The problem I see is: who will create it? That requires having the bad files in the first place and then passing them through the self-encryption process to generate the list. Do you expect the international police to do it? Would they?
Governments could make them, and if governments fail to do that properly, then random safenet users who don’t like certain content could make them too.
Of course it requires having the bad files in the first place; so does scanning for them for law enforcement purposes. If no one knew a chunk was bad, then it wouldn't be considered unsavoury and nodes wouldn't object to hosting it.
That's the point: governments don't make the "worst of the worst" list, as it's sometimes referred to. The international police make it with the aid of an international organisation that takes in reports of illegal activity and images. Governments may make their own lists, but typically these are for political and copyright reasons.
Safe will go unrecognised by these groups for a long time, and we would not want government lists anyhow, as those are very local and usually exist for censorship reasons. So then it's up to the international police. And they do not make specialised lists; they require cloud providers (and governments) to use their one and only list. I highly doubt they would make a "chunked" list, as they might also see it as an attempt to eventually reconstruct something. Not to mention they would not care about Safe for many years unless there were a report of illegal files/data at the x.x.x.x IP address, and then it's not Safe they are interested in but the actual PC and its owner.
Also, they do not store the "worst of the worst" files, since all they need is the hash of each file, so they couldn't make a new list anyhow; they can only add to the existing one. For new reports they hash the file first to check whether it's already on the list, and only if not is the hash added; the file is then deleted without any of the international police ever seeing it.
As for having a node recovery password, as in a password to decrypt the encrypted decryption key for all the chunks on a node, I think the best option would be to prompt the node operator for a recovery password during setup. Even if they choose not to set one, still write a fake encrypted decryption key (just random bits) to disk, for plausible deniability.
I would also suggest that the pass phrase covers all nodes started by the node manager, but that each node still has its own random temporary key for encrypting chunks.
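To make that concrete, here's a minimal sketch of what that setup step could look like (Python, using the third-party `cryptography` package). The function names, the PBKDF2 parameters and the decoy-sizing trick are my own assumptions, not anything from the actual node code:

```python
# Sketch only: a per-node chunk key, optionally recoverable via a pass phrase.
# If no pass phrase is given, a same-sized blob of random bytes is written
# instead, so an observer cannot tell whether a recovery key exists at all.
import base64
import hashlib
import os

from cryptography.fernet import Fernet


def derive_key(passphrase: str, salt: bytes) -> bytes:
    """Derive a Fernet key (urlsafe base64 of 32 bytes) from the pass phrase."""
    raw = hashlib.pbkdf2_hmac("sha256", passphrase.encode(), salt, 600_000)
    return base64.urlsafe_b64encode(raw)


def setup_node_key(passphrase: str | None) -> tuple[bytes, bytes, bytes]:
    """Return (salt, blob_written_to_disk, chunk_key)."""
    salt = os.urandom(16)
    chunk_key = Fernet.generate_key()  # per-node key used to encrypt chunks
    if passphrase:
        blob = Fernet(derive_key(passphrase, salt)).encrypt(chunk_key)
    else:
        # Decoy: same length as a real encrypted key, but pure noise.
        # The chunk key is then unrecoverable after shutdown, which is the point.
        blob = os.urandom(len(Fernet(derive_key("x", salt)).encrypt(chunk_key)))
    return salt, blob, chunk_key


def recover_chunk_key(passphrase: str, salt: bytes, blob: bytes) -> bytes:
    """Decrypt the on-disk blob back into the chunk key after a restart."""
    return Fernet(derive_key(passphrase, salt)).decrypt(blob)
```

With the pass-phrase route, the node manager could derive one key per machine while each node still generates its own random chunk key underneath it.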
I highly doubt they would make a “chunked” list as they might also see it as an attempt to eventually reconstruct something.
What is a “chunked” list in this context?
then it's not Safe they are interested in but the actual PC and its owner
I’m pretty sure that once the owner tells them the software automatically put the chunk there, and that what they did is being done by everyone who uses the software, they would go after the software too.
The list you suggested that node operators could subscribe to. It would be a list of hashes of chunks that make up the illegal files.
Also, as I added to my post above, they don't keep the files anyhow, since they only need the hash, and so they can only add to their list. This prevents them from making different lists specially for third parties. They require anyone authorised to have the list to use it in that form. In other words, they would require Safe node operators to hash the whole file and check it.
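Just to illustrate that last point, this is roughly how such a list gets consumed in practice: you hash the whole file and look the digest up. A node only ever holds self-encrypted chunks, so it has nothing it could feed into a check like this. (Python sketch; the hash algorithm and the one-hex-digest-per-line file format are assumptions on my part.)

```python
# Whole-file hash check against a hypothetical list of known-bad digests.
import hashlib
from pathlib import Path


def load_hash_list(path: str) -> set[str]:
    """Read one hex digest per line into a lookup set."""
    return {line.strip().lower()
            for line in Path(path).read_text().splitlines()
            if line.strip()}


def file_is_listed(file_path: str, hash_list: set[str]) -> bool:
    """Hash the entire file and check membership in the list."""
    digest = hashlib.sha256(Path(file_path).read_bytes()).hexdigest()
    return digest in hash_list
```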
Which is one reason it'll be years. Also, from my own discussions, and from the time the AU government tried to introduce legislation requiring ISPs to filter every packet looking for bad stuff to "protect the children", the discussions and fact-checking done over a few years by knowledgeable people brought a lot of things to light.
The international police would not be involved once the IP address is traced to an ISP (easy to do). It would be passed off to local police to seize the machine and investigate. The only time the international police are involved is in large investigations into CP rings, in order to coordinate local LEA for simultaneous seizure of machines and people all around the world. You read about it in the news with headlines like "100 people arrested for CP production, including lawyers, doctors, politicians", etc. You know, the stuff that grabs attention. These usually start off with undercover LEA working their way into the rings until enough is known to bust them.
This prevents them from making different lists specially for third parties.
That would prevent them from retroactively doing so. Some law enforcement agencies do keep the material I think.
Regardless though, once the network is out, they’ll have the ability to make lists of safe chunk hashes. If they don’t bother dealing with safe chunks, then nodes don’t have anything to worry about from a law enforcement perspective anyway.
For the above reason, fragments should be encrypted at multiple levels (client, node, temporary key) by default, as it can be assumed that any element that could be used to destabilise the network will certainly also be used to try to discredit the SN in the eyes of users. Even if the network is initially more expensive, or does not operate completely on its own, it will be crucial to achieve the highest possible reliability and data safety.
Long-established regulations, e.g. in the EU, suggest that any pretext to question the way SafeNet operates and stores data will be ruthlessly used by regulators to impede (or prevent) network operations, so don't assume that anything will resolve itself.
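To make the multi-level encryption idea concrete, here's a toy sketch of wrapping one chunk under a client key, a node key and a temporary key in turn (Python, third-party `cryptography` package). Key management is deliberately trivial here and this is not how the real network does it:

```python
# Nest three encryption layers around one chunk; only someone holding all
# three keys can get back to the plaintext.
from cryptography.fernet import Fernet

client_key, node_key, temp_key = (Fernet.generate_key() for _ in range(3))

chunk = b"...chunk bytes from self-encryption..."

# Wrap inner-to-outer: whatever sits on the node's disk shows only the outer layer.
wrapped = Fernet(temp_key).encrypt(
    Fernet(node_key).encrypt(
        Fernet(client_key).encrypt(chunk)))

# Unwrap outer-to-inner to recover the original bytes.
restored = Fernet(client_key).decrypt(
    Fernet(node_key).decrypt(
        Fernet(temp_key).decrypt(wrapped)))
assert restored == chunk
```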
That’s fine. It won’t help large files though in terms of data persistence. If only one of those split files has a chunk missing then the rejoined file is still a failure. So if people want to have more secure large files, then they won’t split them.
In thinking about this more it seems the difficulty with this idea is how it could be implemented. The network AFAIK isn’t aware of the file size, so has no way to discretely change the copy number. Bummer.
Iirc, when we had this discussion last time, the verdict was to use an app to increase the replication count of your chunks combined with erasure coding. There are various ways this can be done. You get increased data persistence, but it costs more and takes longer to read/write. This way, if someone really wanted to, they could make 1000 copies of a chunk with ridiculous RS coding ratios.
"Example: In RS (10, 4) code, which is used in Facebook for their HDFS,[6] 10 MB of user data is divided into ten 1MB blocks. Then, four additional 1 MB parity blocks are created to provide redundancy. This can tolerate up to 4 concurrent failures. The storage overhead here is 14/10 = 1.4X.
In the case of a fully replicated system, the 10 MB of user data will have to be replicated 4 times to tolerate up to 4 concurrent failures. The storage overhead in that case will be 50/10 = 5 times"
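For what it's worth, the overhead figures in that quote are easy to sanity-check: tolerating f concurrent failures by replication needs f + 1 full copies, while RS(k, m) stores (k + m) / k times the data. A quick Python check:

```python
# Storage overhead: Reed-Solomon(k data, m parity) vs plain replication.
def rs_overhead(k: int, m: int) -> float:
    """Stored bytes per byte of user data with k data and m parity blocks."""
    return (k + m) / k


def replication_overhead(failures_tolerated: int) -> float:
    """Copies needed so the stated number of losses still leaves one copy."""
    return failures_tolerated + 1


print(rs_overhead(10, 4))       # 1.4 -> 14 MB stored for 10 MB of data
print(replication_overhead(4))  # 5   -> 50 MB stored for 10 MB of data
```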
Yeah, someday maybe nodes could support it in a hybrid method that combines redundancy with erasure codes. Probably still best to manage it at the app layer though… For example, creating a simple parity chunk for every two data chunks in a file offers an improvement. Keep iterating that approach over and over and you get "weaver codes", which may be very compatible with self encryption and the network. You could store all the chunk addresses, data plus parity, in a datamap and retrieve parity chunks if something ever went wrong with a data chunk. If only I had the time to be a real safe dev…
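As a toy illustration of the "parity chunk for every two data chunks" idea (not real weaver coding, just plain XOR parity, and assuming the paired chunks are the same length): lose either chunk of a pair and it can be rebuilt from its partner plus the parity, at a 1.5x storage cost.

```python
# XOR parity over pairs of chunks: store (a, b, parity); any one of the
# three can be lost and reconstructed from the other two.
def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(chunk_a: bytes, chunk_b: bytes) -> bytes:
    """Parity chunk stored alongside the pair in the datamap."""
    return xor_bytes(chunk_a, chunk_b)


def recover(surviving_partner: bytes, parity: bytes) -> bytes:
    """Rebuild the lost chunk of a pair from its partner and the parity."""
    return xor_bytes(surviving_partner, parity)


a, b = b"A" * 32, b"B" * 32
p = make_parity(a, b)
assert recover(b, p) == a  # chunk a lost, rebuilt from b + parity
assert recover(a, p) == b  # chunk b lost, rebuilt from a + parity
```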
That’s fine. It won’t help large files though in terms of data persistence. If only one of those split files has a chunk missing then the rejoined file is still a failure. So if people want to have more secure large files, then they won’t split them.
That’s incorrect: a partial rejoin would still be possible, which is better than nothing in most cases and so would help. A sane client should therefore pre-split files by default, because there is no benefit to not doing so.
If I’m understanding the erasure proposal, that’s uploading slightly different chunks? So instead of 8 copies of 1 chunk, it would be 4 copies each of 2 interchangeable chunks? If so, that’s a bad idea, because it has the storage cost of 8 copies but a much shorter lifetime, assuming node-wiping events from time to time. It might be more complicated to implement higher copy numbers at the network level, but it would be the better solution by far.
You are misunderstanding the issue. I wasn’t implying that the file couldn’t be rejoined. I’m pointing out the inherent fact that large files are more susceptible to errors, all else being equal. If there were a way to increase the copy number as size increased, that would mitigate the increase in errors. So by splitting the file, you’d be opting out of that and hence …
But this is all moot IMO, as I pointed out that the network doesn’t have file-size awareness, so adapting copy number to filesize is a non-starter.
The simplest way is to replicate the fragments held by nodes that were disconnected from the network, e.g. after a power failure. David wrote that replication of these nodes' data will happen anyway (although it may be held off for a while if the outage was short-lived), so restoring a node with its data amounts to replication above the planned number of copies.
Granted, this does not mean that instead of 7 copies in the network there will be, say, 10. But because larger files have more parts stored on more nodes, the additional copies that result from randomly restored nodes give a proportionally better chance of extra copies of the parts of large files.