The problem I see is: who will create it? That requires having the bad files in the first place and then passing them through the self-encryption process to generate the list. Do you expect the international police to do it? Would they?
Governments could make them, and if governments fail to do that properly, then random safenet users who don’t like certain content could make them too.
Of course it requires having the bad files in the first place; so does scanning for them for law enforcement purposes. If no one knew a chunk was bad, then it wouldn't be considered unsavoury and nodes wouldn't object to hosting it.
That's the point: governments don't make the "worst of the worst" list, as it's sometimes referred to. The international police make it with the aid of an international organisation that takes in reports of illegal activity and images. Governments may make their own lists, but typically these are for political and copyright reasons.
Safe will go unrecognised by these groups for a long time, and we would not want government lists anyhow, as those are very local and usually exist for censorship reasons. So then it's up to the international police. And they do not make specialised lists; they require cloud providers (and governments) to use their one and only list. I highly doubt they would make a "chunked" list, as they might also see it as an attempt to eventually reconstruct something. Not to mention they would not care about Safe for many years unless there were a report of illegal files/data at the x.x.x.x IP address, and then it's not Safe they are interested in but the actual PC and its owner.
Also, they do not store the "worst of the worst" files, since all they need is the hash of each file, so they couldn't make a new list anyhow; they can only add to the existing one. For new reports they hash the file first to check whether it's already on the list, and only if not is the hash added; the file is then deleted without any of the international police ever seeing it.
As for having a node recovery password, as in a password to decrypt the encrypted decryption key for all the chunks on a node, I think the best option would be to prompt the node operator for a recovery password during setup. Even if they choose not to set one, still write a fake encrypted decryption key (just random bits) to disk, for plausible deniability.
I would also suggest that the pass phrase covers all nodes started by the node manager, but that each node still has its own random temporary key for encrypting chunks.
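To make that concrete, here's a minimal sketch of what that setup step could look like (Python, using the third-party `cryptography` package). The function names, the PBKDF2 parameters and the decoy-sizing trick are my own assumptions, not anything from the actual node code:

```python
# Sketch only: a per-node chunk key, optionally recoverable via a pass phrase.
# If no pass phrase is given, a same-sized blob of random bytes is written
# instead, so an observer cannot tell whether a recovery key exists at all.
import base64
import hashlib
import os

from cryptography.fernet import Fernet


def derive_key(passphrase: str, salt: bytes) -> bytes:
    """Derive a Fernet key (urlsafe base64 of 32 bytes) from the pass phrase."""
    raw = hashlib.pbkdf2_hmac("sha256", passphrase.encode(), salt, 600_000)
    return base64.urlsafe_b64encode(raw)


def setup_node_key(passphrase: str | None) -> tuple[bytes, bytes, bytes]:
    """Return (salt, blob_written_to_disk, chunk_key)."""
    salt = os.urandom(16)
    chunk_key = Fernet.generate_key()  # per-node key used to encrypt chunks
    if passphrase:
        blob = Fernet(derive_key(passphrase, salt)).encrypt(chunk_key)
    else:
        # Decoy: same length as a real encrypted key, but pure noise.
        # The chunk key is then unrecoverable after shutdown, which is the point.
        blob = os.urandom(len(Fernet(derive_key("x", salt)).encrypt(chunk_key)))
    return salt, blob, chunk_key


def recover_chunk_key(passphrase: str, salt: bytes, blob: bytes) -> bytes:
    """Decrypt the on-disk blob back into the chunk key after a restart."""
    return Fernet(derive_key(passphrase, salt)).decrypt(blob)
```

With the pass-phrase route, the node manager could derive one key per machine while each node still generates its own random chunk key underneath it.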
I highly doubt they would make a “chunked” list as they might also see it as an attempt to eventually reconstruct something.
What is a “chunked” list in this context?
then it's not Safe they are interested in but the actual PC and its owner
I’m pretty sure that once the owner tells them the software automatically put the chunk there, and that what they did is being done by everyone who uses the software, they would go after the software too.
The list you suggested that node operators could subscribe to. It would be a list of hashes of chunks that make up the illegal files.
Also, as I added to my post above, they don't keep the files anyhow, since they only need the hash, and so they can only add to their list. This prevents them from making different lists specially for third parties. They require anyone authorised to have the list to use it in that form. In other words, they would require Safe node operators to hash the whole file and check it.
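Just to illustrate that last point, this is roughly how such a list gets consumed in practice: you hash the whole file and look the digest up. A node only ever holds self-encrypted chunks, so it has nothing it could feed into a check like this. (Python sketch; the hash algorithm and the one-hex-digest-per-line file format are assumptions on my part.)

```python
# Whole-file hash check against a hypothetical list of known-bad digests.
import hashlib
from pathlib import Path


def load_hash_list(path: str) -> set[str]:
    """Read one hex digest per line into a lookup set."""
    return {line.strip().lower()
            for line in Path(path).read_text().splitlines()
            if line.strip()}


def file_is_listed(file_path: str, hash_list: set[str]) -> bool:
    """Hash the entire file and check membership in the list."""
    digest = hashlib.sha256(Path(file_path).read_bytes()).hexdigest()
    return digest in hash_list
```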
Which is one reason it'll be years. Also, from my own discussions, and from the time the AU government tried to introduce legislation requiring ISPs to filter every packet looking for bad stuff to "protect the children", the discussions and fact-checking done over a few years by knowledgeable people brought a lot of things to light.
The international police would not be involved once the IP address is traced to an ISP (easy to do). It would be passed off to local police to seize the machine and investigate. The only time the international police are involved is in large investigations into CP rings, in order to coordinate local LEA for simultaneous seizure of machines and people all around the world. You read about it in the news with headlines like "100 people arrested for CP production, including lawyers, doctors, politicians", etc. You know, the stuff that grabs attention. These usually start off with undercover LEA working their way into the rings until enough is known to bust them.
This prevents them from making different lists specially for third parties.
That would prevent them from retroactively doing so. Some law enforcement agencies do keep the material I think.
Regardless though, once the network is out, they’ll have the ability to make lists of safe chunk hashes. If they don’t bother dealing with safe chunks, then nodes don’t have anything to worry about from a law enforcement perspective anyway.
For the above reason, fragments should be encrypted at multiple levels (client, node, temporary key) by default, as it can be assumed that any element that could be used to destabilise the network will certainly also be used to try to discredit the SN in the eyes of users. Even if the network is initially more expensive, or does not operate completely on its own, it will be crucial to achieve the highest possible reliability and data safety.
Long-established regulations, e.g. in the EU, suggest that any pretext to question the way SafeNet operates and stores data will be ruthlessly used by regulators to impede (or prevent) network operations, so don't assume that anything will resolve itself.
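To make the multi-level encryption idea concrete, here's a toy sketch of wrapping one chunk under a client key, a node key and a temporary key in turn (Python, third-party `cryptography` package). Key management is deliberately trivial here and this is not how the real network does it:

```python
# Nest three encryption layers around one chunk; only someone holding all
# three keys can get back to the plaintext.
from cryptography.fernet import Fernet

client_key, node_key, temp_key = (Fernet.generate_key() for _ in range(3))

chunk = b"...chunk bytes from self-encryption..."

# Wrap inner-to-outer: whatever sits on the node's disk shows only the outer layer.
wrapped = Fernet(temp_key).encrypt(
    Fernet(node_key).encrypt(
        Fernet(client_key).encrypt(chunk)))

# Unwrap outer-to-inner to recover the original bytes.
restored = Fernet(client_key).decrypt(
    Fernet(node_key).decrypt(
        Fernet(temp_key).decrypt(wrapped)))
assert restored == chunk
```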
That’s fine. It won’t help large files though in terms of data persistence. If only one of those split files has a chunk missing then the rejoined file is still a failure. So if people want to have more secure large files, then they won’t split them.
In thinking about this more it seems the difficulty with this idea is how it could be implemented. The network AFAIK isn’t aware of the file size, so has no way to discretely change the copy number. Bummer.
Iirc, when we had this discussion last time, the verdict was to use an app to increase the replication count of your chunks combined with erasure coding. There are various ways this can be done. You get increased data persistence, but it costs more and takes longer to read/write. This way, if someone really wanted to, they could make 1000 copies of a chunk with ridiculous RS coding ratios.
"Example: In RS (10, 4) code, which is used in Facebook for their HDFS,[6] 10 MB of user data is divided into ten 1MB blocks. Then, four additional 1 MB parity blocks are created to provide redundancy. This can tolerate up to 4 concurrent failures. The storage overhead here is 14/10 = 1.4X.
In the case of a fully replicated system, the 10 MB of user data will have to be replicated 4 times to tolerate up to 4 concurrent failures. The storage overhead in that case will be 50/10 = 5 times"
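For what it's worth, the overhead figures in that quote are easy to sanity-check: tolerating f concurrent failures by replication needs f + 1 full copies, while RS(k, m) stores (k + m) / k times the data. A quick Python check:

```python
# Storage overhead: Reed-Solomon(k data, m parity) vs plain replication.
def rs_overhead(k: int, m: int) -> float:
    """Stored bytes per byte of user data with k data and m parity blocks."""
    return (k + m) / k


def replication_overhead(failures_tolerated: int) -> float:
    """Copies needed so the stated number of losses still leaves one copy."""
    return failures_tolerated + 1


print(rs_overhead(10, 4))       # 1.4 -> 14 MB stored for 10 MB of data
print(replication_overhead(4))  # 5   -> 50 MB stored for 10 MB of data
```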
Yeah, someday maybe nodes could support it in a hybrid method that combines redundancy with erasure codes. Probably still best to manage it at the app layer though… For example, creating a simple parity chunk for every two data chunks in a file offers an improvement. Keep iterating that approach over and over and you get "weaver codes", which may be very compatible with self encryption and the network. You could store all the chunk addresses, data plus parity, in a datamap and retrieve parity chunks if something ever went wrong with a data chunk. If only I had the time to be a real safe dev…
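As a toy illustration of the "parity chunk for every two data chunks" idea (not real weaver coding, just plain XOR parity, and assuming the paired chunks are the same length): lose either chunk of a pair and it can be rebuilt from its partner plus the parity, at a 1.5x storage cost.

```python
# XOR parity over pairs of chunks: store (a, b, parity); any one of the
# three can be lost and reconstructed from the other two.
def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(chunk_a: bytes, chunk_b: bytes) -> bytes:
    """Parity chunk stored alongside the pair in the datamap."""
    return xor_bytes(chunk_a, chunk_b)


def recover(surviving_partner: bytes, parity: bytes) -> bytes:
    """Rebuild the lost chunk of a pair from its partner and the parity."""
    return xor_bytes(surviving_partner, parity)


a, b = b"A" * 32, b"B" * 32
p = make_parity(a, b)
assert recover(b, p) == a  # chunk a lost, rebuilt from b + parity
assert recover(a, p) == b  # chunk b lost, rebuilt from a + parity
```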
That’s fine. It won’t help large files though in terms of data persistence. If only one of those split files has a chunk missing then the rejoined file is still a failure. So if people want to have more secure large files, then they won’t split them.
That’s incorrect: a partial rejoin would still be possible, which is better than nothing in most cases and so would help. A sane client should therefore pre-split files by default, because there is no benefit to not doing so.
If I’m understanding the erasure proposal, that’s uploading slightly different chunks? So instead of 8 copies of 1 chunk, it would be 4 copies each of 2 interchangeable chunks? If so, that’s a bad idea, because it has the storage cost of 8 copies but a much shorter lifetime, assuming node-wiping events from time to time. It might be more complicated to implement higher copy numbers at the network level, but it would be the better solution by far.
You are misunderstanding the issue. I wasn’t implying that the file couldn’t be rejoined. I’m pointing out the inherent fact that large files are more susceptible to errors, all else being equal. If there were a way to increase the copy number as size increased, that would mitigate the increase in errors. So by splitting the file, you’d be opting out of that and hence …
But this is all moot IMO, as I pointed out that the network doesn’t have file-size awareness, so adapting copy number to filesize is a non-starter.
The simplest way is to replicate the fragments held by nodes that were disconnected from the network, e.g. after a power failure. David wrote that replication of these nodes' data will happen anyway (although it may be held off for a while if the outage was short-lived), so restoring a node with its data amounts to replication above the planned number of copies.
Granted, this does not mean that instead of 7 copies in the network there will be, say, 10. But because larger files have more parts stored on more nodes, the additional copies that result from randomly restored nodes give a proportionally better chance of extra copies of the parts of large files.