This is from a discussion on reddit: MaidSafe is safe for whistleblowers? I said I’d ask here and report back with an answer, which I have now done. Reddit and this post have been updated to reflect everything up to and including reply #18.
The question is, how could an attacker discover the identity of someone who stores something on the network, either in a public share (cf. WikiLeaks), or shared with select others (e.g. a handful of journalists, cf. Snowden)?
What Is The Aim Of The Attack?
The attacker seeks to know the identity of the uploader, and the content of the file uploaded.
Let’s equate the first with the IP of the node used to upload the file. A sophisticated whistleblower could ensure this isn’t enough, but let’s assume someone on SAFE uploads from a machine that can be linked to them.
The second requires the attacker to know that a file originated from the machine identified with the whistleblower.
So an attacker needs to know: 1) The IP address of the machine running the uploading SAFE node, and 2) That a particular plaintext file was uploaded from that machine.
1) Discovery of IP Address.
The IP is, I think, known only to the four nodes directly connected to the uploading node, so the attacker must control at least one of these nodes.
In a large network an attacker cannot guarantee this, but for SAFE to succeed the chance of discovery must be very small. Otherwise whistleblowers will easily be discouraged, or will have to take additional steps (such as shielding their real IP, or ensuring they are not associated with the machine used).
It seems to me that on its own, this is not enough. We also need to ensure that data passing through the four “gateway” nodes a) cannot be decrypted (I’ll take that as read), and b) cannot be linked to a known plaintext. This is part 2). Unless 2) is very hard, it seems very easy for an attacker to discourage whistleblowing by exposing even one significant target.
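To put a rough number on the gateway risk in part 1), here is a back-of-envelope sketch. It assumes the four directly connected nodes are drawn independently and uniformly from the whole network, and that the attacker controls a fixed fraction of all nodes. Both are my simplifying assumptions, not a description of how SAFE actually assigns connections, and the function name is just illustrative.

```python
def gateway_compromise_probability(attacker_fraction: float, gateways: int = 4) -> float:
    """Chance that at least one gateway node belongs to the attacker.

    Assumes each gateway is picked independently and uniformly at random,
    which is a simplification and not SAFE's actual connection logic.
    """
    if not 0.0 <= attacker_fraction <= 1.0:
        raise ValueError("attacker_fraction must be between 0 and 1")
    # P(at least one hostile) = 1 - P(all gateways honest)
    return 1.0 - (1.0 - attacker_fraction) ** gateways


if __name__ == "__main__":
    for fraction in (0.001, 0.01, 0.1):
        p = gateway_compromise_probability(fraction)
        print(f"attacker controls {fraction:.1%} of nodes -> P(exposed IP) = {p:.2%}")
```

Under these assumptions the risk is roughly four times the attacker's share of the network while that share is small, so an attacker with a modest share of nodes still gets a meaningful chance against any single uploader.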
2) Matching Chunks To Plaintext
So I think it comes down to whether or not an attacker who has a given plaintext can use it to identify the chunks of data flowing through one of the gateway nodes, and so identify the IP of the uploading node.
UPDATE: If the attacker has an exact copy of an uploaded file, this attack is feasible. If the file is modified in the slightest, zipped into an archive for example, then the attack becomes infeasible unless the attacker can decrypt the files, having somehow obtained your private key. That of course is a different attack, involving targeting of the client machines directly, since private keys are not shared beyond those machines.
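To illustrate why an exact copy matters, here is a minimal sketch of the matching step. It is not SAFE's actual self-encryption scheme: I am assuming only that a chunk's identifier is a deterministic function of the file content, and I model that with a plain SHA-256 of fixed-size chunks. The chunk size, function names and sample document are all hypothetical.

```python
import hashlib
import zlib

CHUNK_SIZE = 1024  # hypothetical chunk size, chosen only for this sketch


def chunk_ids(data: bytes, chunk_size: int = CHUNK_SIZE) -> set:
    """Content-derived identifiers for each fixed-size chunk (simplified model)."""
    return {
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    }


def match_fraction(known_plaintext: bytes, observed_ids: set) -> float:
    """Fraction of the attacker's expected chunk ids seen at a gateway node."""
    expected = chunk_ids(known_plaintext)
    if not expected:
        return 0.0
    return len(expected & observed_ids) / len(expected)


def sample_document() -> bytes:
    """Stand-in for the leaked file the attacker already holds."""
    return b"".join(f"record {i}\n".encode() for i in range(20000))


if __name__ == "__main__":
    leaked = sample_document()
    # Uploader sends the file unmodified: every chunk id is predictable.
    print("unmodified upload:", match_fraction(leaked, chunk_ids(leaked)))
    # Uploader compresses first: the bytes differ, so the ids do too.
    print("compressed upload:", match_fraction(leaked, chunk_ids(zlib.compress(leaked))))
```

In this model the unmodified upload matches completely and the compressed one not at all, which is the point of the update above: the attacker can only recognise chunks they can recompute from a byte-for-byte identical file.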
SAFE therefore needs to consider making it very hard for a whistleblower to upload an unmodified file to the network without realising that this might expose them under extreme circumstances, a risk that can be avoided with a simple measure. See this post.