What can anyone know about a file?

Let’s say someone has the hash of a particular MutableData or Data. They request it, but they don’t actually have the keys to decrypt it. What can they find out from the returned data?

  1. Something exists at that id

  2. The approximate size of whatever is there

What else?

Not even that, I guess. Files are split into chunks before being uploaded, so requesting a hash might only return a 1 MB piece of a larger file. So they have something like:

  • The knowledge that a certain chunk exists, and its hash.

There is no way to decrypt it without the datamap of the complete file. And as every 1 TB of data on the network leads to at least 1 million chunks, it’s quite hard to learn anything, I guess.

Mutable Data might even return different files at different times. Imagine someone working on a .doc-file and using autosave every 5 minutes.


I mean if it’s under 1MB

Anything above 3KB and less than 3MB in length is split into 3 chunks using self-encryption. Anything greater than 3MB is split into 1MB chunks using self-encryption.

This means that for a file to be a single chunk, it has to be less than 3KB.
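To make those size rules concrete, here is a rough sketch in Python. The function name is made up for illustration, and it assumes binary units (1KB = 1024 bytes) and plain ceiling division for large files; the real self-encryption code may treat the last chunk and the exact boundaries differently.

```python
KB = 1024          # assumption: binary units
MB = 1024 * KB

def estimated_chunk_count(size_bytes: int) -> int:
    """Estimate how many chunks a file of this size produces (hypothetical helper)."""
    if size_bytes < 0:
        raise ValueError("size must be non-negative")
    if size_bytes < 3 * KB:
        return 1                      # too small to split: stored as one piece
    if size_bytes <= 3 * MB:
        return 3                      # always three chunks in this range
    return -(-size_bytes // MB)       # ceiling division into 1MB chunks

if __name__ == "__main__":
    for size in (2 * KB, 500 * KB, 3 * MB + 1, 1024 * 1024 * MB):
        print(size, estimated_chunk_count(size))
```

Under these assumptions, 1 TB of data in large files comes out at roughly a million chunks, which matches the figure mentioned earlier in the thread.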


It has been discussed that somebody could self-encrypt a file independently of the network, use the generated chunks to work out their hash (XOR) addresses, and then check whether those chunks are there.

This would mean that, in theory, someone could find out whether a particular file has already been uploaded by checking the XOR addresses of its chunks.

Some even suggested that this is a way to save the PUT cost for files that have probably been uploaded already (e.g. a popular cat video).
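A minimal sketch of that existence check, in Python. It assumes the chunk address is the SHA3-256 hash of the encrypted chunk (the thread only says "hash"), and `exists` is a hypothetical stand-in for whatever lookup the network client offers; neither name is a real Safe Network API.

```python
import hashlib
from typing import Callable, Iterable

def chunk_address(encrypted_chunk: bytes) -> str:
    """Address of a chunk, assumed here to be the SHA3-256 hash of its content."""
    return hashlib.sha3_256(encrypted_chunk).hexdigest()

def file_probably_uploaded(
    encrypted_chunks: Iterable[bytes],
    exists: Callable[[str], bool],   # hypothetical network lookup by address
) -> bool:
    """True only if every locally generated chunk is already at its address."""
    addresses = [chunk_address(chunk) for chunk in encrypted_chunks]
    return bool(addresses) and all(exists(address) for address in addresses)

if __name__ == "__main__":
    # A set of known addresses plays the role of the network.
    network = {chunk_address(b"chunk-1"), chunk_address(b"chunk-2")}
    print(file_probably_uploaded([b"chunk-1", b"chunk-2"], network.__contains__))
    print(file_probably_uploaded([b"chunk-1", b"chunk-3"], network.__contains__))
```

Note that the check needs the full file in hand first: without it you cannot produce the chunks, so you cannot guess the addresses.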


Apart from that it is pretty much as @anon40790172 said.


If the same chunk is already stored 8 times, where will another copy be stored? Would it become a 9th copy and be overwritten later?
How do you get access to a chunk that is already stored without uploading it again?

The hash of a chunk is its XOR address. So if you try to store the same chunk again, the network will not store anything, since the section in charge of that XOR address (the one holding the 8 copies) will know the chunk is already stored.

There is no issue. Deduplication is automatic. The network will not re-store a chunk that is already stored. If you ask the network to store a chunk that already exists, it will still cost the PUT charge, since you made the network do work trying to store it. Charging anyway also helps anonymity/security and helps the economics, since popular content being uploaded will help pay the farmers who retrieve the content.

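The behaviour described above can be sketched as a tiny content-addressed store in Python. The class and its fields are invented for illustration, and SHA3-256 is an assumption for the hash; the point is only that a repeated PUT is charged but stores nothing new.

```python
import hashlib

class DedupChunkStore:
    """Toy model of one address space: chunks are keyed by their own hash."""

    def __init__(self) -> None:
        self._chunks: dict[str, bytes] = {}
        self.puts_charged = 0            # every PUT attempt costs, stored or not

    def put(self, chunk: bytes) -> str:
        self.puts_charged += 1
        address = hashlib.sha3_256(chunk).hexdigest()   # assumed hash function
        # setdefault keeps the existing chunk if the address is already taken
        self._chunks.setdefault(address, chunk)
        return address

    def get(self, address: str) -> bytes:
        return self._chunks[address]

    def __len__(self) -> int:
        return len(self._chunks)

if __name__ == "__main__":
    store = DedupChunkStore()
    first = store.put(b"popular cat video chunk")
    second = store.put(b"popular cat video chunk")
    print(first == second, len(store), store.puts_charged)
```

Both uploads end up at the same address, so anyone holding the datamap can fetch the chunk regardless of who paid to store it first.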

Wait, is self-encryption somehow repeatable? Most encryption produces a different result every time you run it on the same data. To get a consistent function of the content, you use hashing.

Yes, the password and IV are determined from the hash of other chunks in the file.
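Here is a toy Python sketch of why that makes the output repeatable. Everything in it is an assumption for illustration: SHA3-256 as the hash, the choice of the two preceding chunks as key material, and a SHAKE-256 keystream XOR standing in for the real cipher. It is not the actual self-encryption algorithm, only the idea that key and IV come from the content itself.

```python
import hashlib

def chunk_hashes(chunks: list[bytes]) -> list[bytes]:
    return [hashlib.sha3_256(chunk).digest() for chunk in chunks]

def derive_key_iv(hashes: list[bytes], index: int) -> tuple[bytes, bytes]:
    """Key and IV for chunk `index`, taken from the hashes of two other chunks."""
    n = len(hashes)
    material = hashlib.sha3_256(
        hashes[(index - 1) % n] + hashes[(index - 2) % n]
    ).digest()
    return material[:16], material[16:]

def toy_cipher(data: bytes, key: bytes, iv: bytes) -> bytes:
    """Stand-in cipher: XOR with a SHAKE-256 keystream (its own inverse)."""
    stream = hashlib.shake_256(key + iv).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))

def toy_self_encrypt(chunks: list[bytes]) -> list[bytes]:
    hashes = chunk_hashes(chunks)
    return [
        toy_cipher(chunk, *derive_key_iv(hashes, i))
        for i, chunk in enumerate(chunks)
    ]

if __name__ == "__main__":
    chunks = [b"a" * 10, b"b" * 10, b"c" * 10]
    # No randomness anywhere, so two runs give identical ciphertext.
    print(toy_self_encrypt(chunks) == toy_self_encrypt(chunks))
```

Because nothing random goes in, the same file always yields the same encrypted chunks and therefore the same addresses, which is what makes deduplication possible.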
