File names are not part of the file. You said they are identical files, so they will be stored once.
The process is that the file is self-encrypted, which produces the chunks to be stored. Two identical files will produce exactly the same chunks. When a chunk is stored, an XOR address is derived from the hash of the chunk and the chunk is stored at that address. Thus two identical chunks will have the same address, and when someone attempts to store the second one, the section that handles that address will not store it because the chunk already exists.
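To make that concrete, here is a rough sketch in Python. It is a toy, not the real thing: `ToyNetwork`, `split_into_chunks` and the 4-byte chunk size are made up for illustration, and the chunks are only split and hashed with SHA-256, not actually self-encrypted. It just shows why the second copy of a file causes no new writes.

```python
import hashlib

CHUNK_SIZE = 4  # hypothetical and tiny on purpose, so the example stays readable


def split_into_chunks(data, size=CHUNK_SIZE):
    """Stand-in for self-encryption: split only, no real encryption."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def chunk_address(chunk):
    """The address is derived from the hash of the chunk content."""
    return hashlib.sha256(chunk).hexdigest()


class ToyNetwork:
    """Hypothetical in-memory stand-in for the sections that hold chunks."""

    def __init__(self):
        self.chunks = {}
        self.writes = 0

    def put(self, chunk):
        addr = chunk_address(chunk)
        if addr not in self.chunks:  # already there: nothing is stored again
            self.chunks[addr] = chunk
            self.writes += 1
        return addr


def store_file(network, data):
    """Store every chunk and return the list of addresses (a toy datamap)."""
    return [network.put(c) for c in split_into_chunks(data)]


net = ToyNetwork()
map_a = store_file(net, b"identical file contents")
writes_after_first = net.writes
map_b = store_file(net, b"identical file contents")  # same bytes, different "name"

print(map_a == map_b)                    # same chunks, same addresses
print(net.writes == writes_after_first)  # second upload wrote nothing new
```

The name you give the file never enters the hash, so it cannot affect the addresses.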
The file name is in the directory structure and points to a datamap of the file. So you can use any name to identify that datamap and it does not change the underlying file at all.
No, and the network does not know either. If you wanted to know whether your file has been stored, run an APP that does the self-encryption for you and returns the datamap without actually asking the network to store anything. Then attempt to read the chunks at the addresses in the datamap; if you can read them, then those chunks exist.
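Roughly, such an APP could look like the sketch below. Everything here is an assumption for illustration: `local_datamap` just splits and hashes (no real self-encryption), and the "network" is a plain dict whose `get` returns `None` for a missing chunk.

```python
import hashlib


def local_datamap(data, chunk_size=4):
    """Compute chunk addresses locally; nothing is sent to the network."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    return [hashlib.sha256(c).hexdigest() for c in chunks]


def file_exists(datamap, network_get):
    """True only if every address in the datamap can be read."""
    return all(network_get(addr) is not None for addr in datamap)


# Hypothetical network contents: one file that somebody fully uploaded.
network = {}
uploaded = b"a file that was fully uploaded"
for i in range(0, len(uploaded), 4):
    chunk = uploaded[i:i + 4]
    network[hashlib.sha256(chunk).hexdigest()] = chunk

found_uploaded = file_exists(local_datamap(uploaded), network.get)
found_other = file_exists(local_datamap(b"something else entirely"), network.get)
print(found_uploaded, found_other)
```

Note that the check only works for a file you already hold, since you need its bytes to build the datamap.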
So individuals/organisations could essentially check if pirated/illegal software/games/music/images etc. were on the network? They wouldn’t know who stored them or how many people stored the same file, but they would be able to tell if it’s on the network?
It’s not necessarily an issue, I’m just curious for my own understanding.
Yes, it can be, since you can self-encrypt files on your own PC without SAFE. But the issue will be this: if I want to upload a 1MB file, am I going to spend time checking whether it exists, or just pay the 3 PUTs to store it? I am sure many will consider it pointless to spend all that time, for every file they upload, on the off-chance it is already stored.
Also, if you increase the size to 4GB for, say, a DVDr image of your favourite vid, then how many are going to wait for a 4GB download just to check if the WHOLE of the DVDr file exists? Remember that it’s quite possible people upload only part of a file, even if it’s only to stuff up people who check for duplicates. They could upload only the first 50 chunks and the last 20 chunks of the whole self-encrypted file.
tl;dr
Yes, it is certainly possible to write an APP that first self-encrypts the file only on your computer to get the datamap, and then attempts to download the chunks.
BUT, as I suggested, there may be plenty of scenarios where such an APP would not be used by people.
I would use such an App if I have medium sized files that I suspected could have been uploaded already.
If I wrote the algo for checking existence of chunks, I would choose chunks from the map nondeterministically.
Whenever NotFound is returned, the upload of that chunk would start, thereby patching the file.
If a chunk starts downloading, the download can be cancelled immediately. Its address is the verification that it is not corrupted.
And of course running a fair amount of these in parallel.
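Something like this sketch is what I have in mind. All names are hypothetical (`ToyNetwork`, `exists`, `put`, `check_and_patch`); `exists` stands in for "start the download and cancel it as soon as it begins", and the chunking is plain split-and-hash rather than real self-encryption.

```python
import hashlib
import random
import threading
from concurrent.futures import ThreadPoolExecutor


class ToyNetwork:
    """Hypothetical thread-safe in-memory chunk store."""

    def __init__(self):
        self._chunks = {}
        self._lock = threading.Lock()

    def exists(self, addr):
        # Stand-in for "download started, so cancel it": presence is enough.
        with self._lock:
            return addr in self._chunks

    def put(self, addr, chunk):
        with self._lock:
            self._chunks.setdefault(addr, chunk)


def chunk_file(data, size=4):
    """Map address -> chunk (toy chunking, no real self-encryption)."""
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    return {hashlib.sha256(c).hexdigest(): c for c in chunks}


def check_and_patch(data, network, workers=8):
    """Check chunks in random order, in parallel; upload any that are missing."""
    by_addr = chunk_file(data)
    addrs = list(by_addr)
    random.shuffle(addrs)  # an uploader cannot predict which chunks get checked first

    def ensure(addr):
        if network.exists(addr):
            return False
        network.put(addr, by_addr[addr])  # the NotFound case: patch the gap
        return True

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(ensure, addrs))


data = b"0123456789abcdefghijklmnopqrstuv"  # 8 distinct 4-byte chunks
net = ToyNetwork()
items = list(chunk_file(data).items())
for addr, chunk in items[:2] + items[-1:]:  # someone uploaded only part of it
    net.put(addr, chunk)

patched = check_and_patch(data, net)
print(patched)
```

A second run over the same file should find nothing left to upload.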
And the chunking+encryption etc. of files is probably quite fast by human perception, if we’re talking everyday files.
So, not sure this would actually be such a hassle once abstracted away.
So, given enough horsepower, I could hashguess the contents of the entire network, then spend a few thousand lifetimes trying to reassemble them into something meaningful.
You would need to have the actual files to do the above. You will most likely not have every file that exists, so you can’t check if they exist.
But with the files you do have, sure.
That’s like saying that with enough grunt you can find every file that is hidden in the digit sequences of Pi.
It is widely believed, though not actually proven, that Pi contains every finite data collection somewhere within its digit sequence. If so, all you need to do to share a file is give the starting digit (position) in Pi’s sequence of digits and the length.
BAN Pi, I say. Ban it, it is the most vile of all things ever. All those vile files are contained in it, just waiting to be found by my little kids, and it will warp their little minds. Won’t someone think of the children and BAN Pi!
You still have to check every chunk. That is the only way to know if any were left out, whether deliberately or because someone aborted their upload (by action or mishap).
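A small sketch of why a spot check is not enough, assuming a partial upload where only the head and tail of the file are present. The helper names and the 4-byte chunk size are hypothetical, and the chunking is plain split-and-hash.

```python
import hashlib


def addresses(data, size=4):
    """Toy datamap: the hash of each fixed-size chunk, in file order."""
    return [hashlib.sha256(data[i:i + size]).hexdigest()
            for i in range(0, len(data), size)]


def missing_chunks(datamap, stored_addresses):
    """Indices of every chunk in the datamap that is not stored."""
    return [i for i, addr in enumerate(datamap) if addr not in stored_addresses]


data = bytes(range(200))                     # 50 distinct 4-byte chunks
datamap = addresses(data)
stored = set(datamap[:10] + datamap[-5:])    # only the first 10 and last 5 were uploaded

# Checking just the first and last chunk looks fine...
spot_check = all(addr in stored for addr in (datamap[0], datamap[-1]))
# ...but walking the whole datamap finds the hole in the middle.
gaps = missing_chunks(datamap, stored)
print(spot_check, len(gaps))
```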
Don’t think so. If the datamaps are not public, trying to find a particular audio/video file is extremely hard, because small variances in codec parameters, embedded metadata or compression generate hundreds of different possibilities even from the same original file. And in each case the chunks to look for would be different.
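A quick illustration of the point, using `zlib` at two compression levels as a stand-in for "same original, different encoder settings". The `datamap` helper and the 64-byte chunk size are assumptions for the sketch, and the chunking is plain split-and-hash rather than real self-encryption.

```python
import hashlib
import zlib


def datamap(data, size=64):
    """Toy datamap: the hash of each fixed-size chunk."""
    return [hashlib.sha256(data[i:i + size]).hexdigest()
            for i in range(0, len(data), size)]


original = b"the same original recording, " * 200

# Two "encodings" of identical content with different settings.
encode_fast = zlib.compress(original, 1)
encode_best = zlib.compress(original, 9)

map_fast = datamap(encode_fast)
map_best = datamap(encode_best)
shared = set(map_fast) & set(map_best)

# Both decode to the same original, yet the chunk addresses to look for differ.
print(len(map_fast), len(map_best), len(shared))
```

Someone hunting for the file would have to guess the exact bytes the uploader produced, not just the content.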
And even if they did, what good does it do them? They have no way to find who uploaded it from a datamap they created by doing their own self-encryption. Unless, of course, the uploader advertised the fact that they did the upload.