Unencrypted Data Question

cretz · June 6, 2016, 4:49pm

Pardon my naivete here, but can someone help me understand something? How, if data is unencrypted, can a node be unaware that it is storing a piece of that unencrypted data? Basically, the encryption/decryption has to be deterministic right or does self encryption even occur at all? Therefore, if I fetch something to reassemble it, I can look at the pre-assembled blocks and see what they look like to know whether I have any of those blocks?

Again, sorry if I don’t have a clear understanding on how unencrypted (i.e. public) data works. The basic question is how to prevent a node from knowing he might be storing a piece of a certain set of unencrypted data. Thanks.

dirvine · June 6, 2016, 4:59pm

Immutable data is encrypted but uses a data map as its keys. So an unencrypted data map (public data) points to encrypted chunks. Hope that makes sense.

There is an option we have not put in place yet so that even nodes with encrypted data cannot use rainbow tables or similar type detection to know what they are. A simple way to think of it is the vault creates a random session key and stores all data + name encrypted with the session key. On disk then you have encrypted → encrypted data.
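A rough sketch of that idea, not the actual vault code: the `Vault` class, the use of HMAC for on-disk names, and the SHA-256 based XOR keystream are all assumptions made for illustration (the keystream is a toy stand-in for a real cipher such as AES).

```python
import hashlib
import hmac
import os


def xor_stream(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher (illustration only): XOR with a SHA-256 keystream."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


class Vault:
    """Hypothetical vault that masks both names and data with a session key."""

    def __init__(self) -> None:
        self.session_key = os.urandom(32)  # fresh random key for every run
        self.disk = {}  # masked name -> masked chunk

    def _disk_name(self, name: bytes) -> bytes:
        # The name stored on disk is a keyed hash, not the network-visible name.
        return hmac.new(self.session_key, name, hashlib.sha256).digest()

    def put(self, name: bytes, chunk: bytes) -> None:
        self.disk[self._disk_name(name)] = xor_stream(self.session_key + name, chunk)

    def get(self, name: bytes):
        blob = self.disk.get(self._disk_name(name))
        return None if blob is None else xor_stream(self.session_key + name, blob)
```

In this sketch a scan of the disk alone matches neither a known chunk name nor a known chunk hash, but the running process still holds the session key, so anything that can read that key can undo the masking.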

cretz · June 6, 2016, 5:06pm

It does make sense, thanks. I still don’t really understand, if everyone can fetch the content, how can they not fingerprint a block to see if they have it. A rainbow of known hashes to not store is kind of what I am talking about. Or are we talking non-deterministic encryption so when block rebalancing occurs, I can’t tell that a block is something I don’t want?

Again, pardon the confusion. I just feel like if anyone can fetch data, then it has to be deterministically available which means I could fingerprint it. (granted an orthogonal passive adversary problem exists of watching node traffic patterns based on fetch patterns, but that is a different problem if there is not artificial noise). The reason I ask is I am curious, at a high level, about public data systems that still give the “storer” plausible deniability and am unable to get my head around how if anyone can get it, anyone can see what they get and surely determine they could be part of the providing side based on patterns.

dirvine · June 6, 2016, 5:10pm

This is the part to look at. There were a few huge threads on this. Everyone cannot see what you get unless they intercept your comms (route) and can determine who you are from a public key (you can use many keys, even throwaway keys). A vault or instance has a different key every login and holds different data.

If you search though, there have been very long topics going through this one. Shout if it doesn’t show up, something like “how are uploads/downloads anonymised”

cretz · June 6, 2016, 5:15pm

Sorry, I think I wasn’t very clear. I am not worried about in-transit security, mitm, or any of that. I am worried that a user can deanonymize public data at rest (even if it is encrypted at rest w/ my or some other key, it is still “public” since anyone can request it over the network) by knowing he has it. So, for instance, say I want to know if I am storing any blocks related to safe://somespecificsite. Could I add some gdb breakpoints in the rust code that is pulling the blocks for reassembly and determine any information to see if I have that on disk somewhere? More important, could I put a tool together that anyone could run to see if they have any piece of safe://somespecificsite?

bluebird · June 6, 2016, 5:24pm

My understanding of this is quite superficial, but:

Isn’t there self-encryption, i.e. at the launcher, which still applies to data in and out of the client’s host even though it has been removed for movement within the client host? So your hostile vault never sees totally unencrypted chunks, just chunks with some layers of encryption removed. And your IP is only known to your close group. If there is any such attack vector there then I’d be interested to learn about it.

EDIT: Oh, I see you’re talking about publicly published data, that anyone can retrieve and presumably decrypt. Hmm, I don’t know, but I’ll leave this comment up as a testament to my ignorance.

cretz · June 6, 2016, 5:31pm

It’s not a hostile vault. It’s a completely willing vault that just wants to know if he has some set of public data. In a tiny network this is easy enough by just watching your traffic patterns while you request it in a patterned approach via a different machine. And it’s not really about IP or deanonymizing (I shouldn’t have used that word). It’s about plausible deniability removal. The simple question: is it impossible for me to write a tool that you could use to see if you have a piece of something when you run the tool on your vault? Sure you may encrypt stuff locally and it may only be a small piece of the data, whatever. Basically, I want to check plausible deniability to avoid people wanting to run something like the my-vault-has-wikileaks-chunks.exe.

I am asking here at a high level, but I can spend a weekend working through the code to attempt to write a tool that will try this. What I’d do is make a safenet site. Then inspect everything I can about the storage and retrieval (even on my own standalone network) to understand everything about storage I can’t glean from the code itself. Then I’d attempt to put together a way for a user to check if they have any of my safenet site. Just curious if this is even possible.

Part of my research here is trying to understand how systems purport to store public data but still not have the “storers” of said data know they have it. It’s not specific to Maidsafe nor am I on a fishing expedition, I am just mired in deep thought on this subject at the moment (distributed, plausible deniable public data). I’m just taking the lazy way out by asking on the forum.

dirvine · June 6, 2016, 5:35pm

IF the data was encrypted via AES or similar from a random session key you possibly could, but it would be a feat and then would mean getting folk to disassemble the vault process (while running, with gdb etc.) to find the session key etc. So yes I would say this can be done, but would be a specialist tool perhaps? It may be possible with effort, if data is moving though it may be more difficult. Sort of like finding a hash in the blockchain points to “bad data” sort of thing, a point to data easily retrievable. I am not sure - brainstorm mode (As usual)

cretz · June 6, 2016, 5:40pm

Yup. I just want to know if I can make a “scary-chunk-hash-watch.exe” and maintain a hash list (or some other ruleset) that could tell a vault that he might have a block of something. Granted I only ask this for public data, obviously private/encrypted data is “zero-knowledge”. I have been wanting to build something on a distributed storage system, but for me to provide the guarantees I want to provide, it has to be readable-by-everyone-but-nobody-knows-if-they-have-it. I have never seen this property ever…I’m not even sure it’s possible. But I’d be willing to settle for non-global-passive-adversary approach. But I’m not even sure that is possible. Even ignoring maidsafe, have y’all ever heard of a system that has public data for everyone but plausible deniability for the decentralized “storers”?..I too am in brainstorm mode…
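A minimal sketch of what such a tool could look like, assuming the vault can read its stored chunks as raw bytes and that the watchlist is keyed by the SHA-256 of each chunk. `scan_vault` and the choice of hash are hypothetical and not taken from any SAFE code.

```python
import hashlib


def scan_vault(stored_chunks, watchlist):
    """Return the hex digests of stored chunks that appear on the watchlist.

    stored_chunks: iterable of chunk bytes as found on disk (assumed readable).
    watchlist: set of hex SHA-256 digests published by whoever maintains the list.
    """
    hits = []
    for chunk in stored_chunks:
        digest = hashlib.sha256(chunk).hexdigest()
        if digest in watchlist:
            hits.append(digest)
    return hits
```

The tool never needs to decrypt or understand a chunk. It only needs the stored bytes to be the same for everyone who holds that chunk.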

bluebird · June 6, 2016, 5:51pm

Suppose you can find out if you’re hosting a chunk of a particular site. What are the consequences? It might not matter, even if feasible.

Plausible deniability implies defending yourself against accusation that you’re hosting the chunk. But the accuser first of all has to monitor your vault from outside, find the “incriminating” chunk and then pin a charge of malfeasance on you. Seems unlikely.
You decide to remove the chunk from your vault. It then gets duplicated somewhere else. You’re only going to be able to remove it completely by controlling most of the vaults on the network. Seems unlikely.

Conclusion: Preventing such knowledge is another, desirable layer of obfuscation, but not something the absence of which would break anyone’s use of the network.

dirvine · June 6, 2016, 5:51pm

Not really, I suppose it would require a degree of dynamism in the data you are trying to get somehow?

A few things can be done in the binary that stores, such as obfuscation.

A mechanism that offers something like that would be for the DataManagers (in SAFE thinking) to store data on a node encrypted so that the node cannot know what it is, while they hold an agreed “key” to decrypt the data and a map of actual versus encrypted names. On request they ask the storer for the encrypted name and decrypt en route, etc.

Then the attack would be getting into the group that is data managers and reporting this back to the storing node. So much more difficult to crack perhaps, but still not 100%. I am sure there are even more like this that can be used. In our C++ impl we did do something similar where you only knew what you held when somebody asked for it (meant using the request as the decryption key for the data held). So it’s likely possible?
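One way to picture the “request as decrypt” idea is the sketch below. It is a guess at the general shape, not the old C++ implementation: `storage_id`, `data_key`, `seal` and `BlindStore` are made-up names, and the XOR keystream is a toy stand-in for a real cipher.

```python
import hashlib


def xor_stream(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher (illustration only): XOR with a SHA-256 keystream."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def storage_id(name: bytes) -> bytes:
    # What the storing node indexes by: a one-way function of the requested name.
    return hashlib.sha256(b"id:" + name).digest()


def data_key(name: bytes) -> bytes:
    # Decryption key, derivable only by someone who knows the requested name.
    return hashlib.sha256(b"key:" + name).digest()


def seal(name: bytes, chunk: bytes):
    """Done by whoever hands the chunk to the storer (e.g. the managing group)."""
    return storage_id(name), xor_stream(data_key(name), chunk)


class BlindStore:
    """Storer that holds only (storage_id -> sealed bytes) at rest."""

    def __init__(self) -> None:
        self.disk = {}

    def put_sealed(self, sid: bytes, sealed: bytes) -> None:
        self.disk[sid] = sealed

    def serve(self, requested_name: bytes):
        sealed = self.disk.get(storage_id(requested_name))
        if sealed is None:
            return None
        # Only now, holding the request, can the node open what it stores.
        return xor_stream(data_key(requested_name), sealed)
```

At rest the node cannot open anything, but a node that is handed a list of names can still compute `storage_id` for each one and look for matches, which fits the “still not 100%” caveat.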

The question then may be is plausible deniability where an autonomous network stores encrypted chunks that you need specialist tools to know (as well as the list of course). It’s a good subject area for sure with many edge cases and probably as many ways around it (if we can see them)

bluebird · June 6, 2016, 6:00pm

There’s encrypted usenet groups such as alt.binaries.archive.encrypted, whose content is unknown to the hosting servers. Not exactly public, though, since the decryption key is distributed to some group of file-pirates or criminals.

EDIT: The problem is quite interesting, in that it seems to be a mirror image to the Byzantine Generals problem, which did turn out to have a solution (an append-only public ledger, i.e., blockchain). While the Byzantine Generals had the problem of acquiring certainty of knowledge (the communication between them over hostile territory), you want to have certainty of non-knowledge.

EDIT1: I can’t prove this but I suspect that plausible deniability is a property of the same kind as anonymity: It is not a binary quantity but a matter of degree, and that the goal, as in the case of creating anonymity, should be to make the search space uneconomically large for the attacker (both third-parties such as the government as well as the owner of the computer hosting a chunk from a forbidden site). In the case of anonymity, it is deemed sufficient when the crowd that an attacker would have to search through to find you becomes big enough to make the task impractical. So, fruitful thinking on the problem of plausible deniability would be to consider ways of making the cost of tracing the origin of a chunk unacceptably large. One way to do that (to think about it) would be to consider what other types of data on SAFEnet are impractical to trace, and why. I don’t have enough of a detailed grasp of the working of the network (the mental picture is too fuzzy) to be more specific at this time.

cretz · June 6, 2016, 6:03pm

I have been doing some research here on plausible deniability and I just don’t think anyone has cracked it.

For Freenet, this SO post claims “It is hard, but not impossible, to determine which files that are stored in your local Freenet Datastore.” and similarly the Plausible Deniability page of the freenet wiki on GitHub says “Of course, the decryption keys, which are contained in links to the files, may be publically posted on some other site”…ug.

And so many systems don’t even offer anonymous storage. I will read up on others like storj, but I highly doubt technology has come far enough to solve the problem of allowing everyone to decrypt whole things but not know if they have a piece locally. It was just something weighing on my mind. Granted I can almost guarantee any public-data system sufficiently large enough with content some people don’t like will contain these kinds of tools (granted the resistance is to punish vaults that refuse to store/serve).

EDIT: Reviewed Storj, they have a concept called “shard graylisting” as explained here. This is exactly the type of tool I was talking about (making available hashes of content the “storer” may not want) and the section of that link explains what I mean well. Seems the basic answer is: “If you make public the ability to decrypt some data (in effect, making the data public) then you cannot prevent someone from knowing or finding out whether they are storing some/any of that data” (at least with modern technology). Here are some IPFS notes on the subject where they even plan to maintain these kinds of hash lists.

cretz · June 6, 2016, 6:04pm

Meh, not worried about punishment or laws or any of that to me personally. This is more about adoption and marketing words I can use to promote my project.

neo · June 7, 2016, 12:11am

This or something to achieve the same results, would be good to prevent people examining their vaults to see if they have any chunks from any particular public files.

The “let’s censor the bad bits” brigade will jump on that possibility and generate behaving vaults that also search for these chunks. Then communicate with each other so that when they find 4 or more copies of a particular chunk they delete it simultaneously from their vaults.

Yea I know it’s unlikely to get enough people to participate and succeed.

But the problem is deeper than that because some places (Australia cough cough) can jail you on the basis of storing just one chunk of an illegal file, even if you didn’t know. Hell they can jail you for years because of something sent to you without your knowledge as happened to at least one person. If it can be decoded then they could sensibly be assumed to be able to.

Again it’s a long shot, but with places like China and Australia, where the police can randomly seize all your computers/storage devices on a “pre-arrest” without even charging you, it is a serious consideration. China worse than Australia obviously, but Australia is catching up too quickly

happybeing · June 7, 2016, 12:30am

Not dismissing your concerns, but I think when governments want to imprison people arbitrarily, they can easily find a way and technology isn’t going to help much.

neo · June 7, 2016, 12:43am

Agreed, that’s why it’s a long shot.

But to counter what you/I just said. Our police are divided into groups and each is looking to legitimize their existence. Currently the technology centred departments are trying to grow and legitimise their existence. In doing so they have applied and had far reaching laws enacted and seen a rise of arrests over technology based offenses. Not government harassment as such, just one or two of their police departments.

A major technology reporter for a big newspaper in Australia was attending a hacker conference in Brisbane and when he left the conference was “pre-arrested” and all his possessions seized. They promptly grilled him for information and copied his laptop disk etc. After a number of hours he was released and his possessions returned. He wrote an article on what happened and how this “pre-arrest” can be used to arbitrarily harass/take your possessions. Under the law they do not have to return the items seized and you have to go to court to attempt to get them returned if the police will not return them.

tl;dr

It may not be a government trying to harass/arrest you, but it can be just one overzealous employee of one police department who claims you have something, the police don’t know but take his expert word. They do not have to prove anything, just single you out.

Why give them the chance, that is my point. It can be made so that even public chunks cannot be decoded by the vault, so in my view it is very good PR to do so and makes it near impossible for the “let’s censor the bad stuff” brigade and authorities to show you stored anything in particular.

Honestly I thought they were going to be stored with a second encryption so that the vault would never know or could know what was stored.

cretz · June 7, 2016, 3:00am

I believe that is the case. My issue was more that even if encrypted locally at rest, you could still theoretically know you have a piece of something if it’s otherwise public. Turns out, based on common knowledge, that if the world can retrieve it, you can know if you have it. Modern crypto does not have a way to keep you from figuring out that you have a piece of public data. Which is fine, we might get there one day.

neo · June 7, 2016, 3:20am

What David suggested is that the data sent to the vault is encrypted AGAIN before it is given to the vault to store with a key that only the data managers know. So public data would be self encrypted then passed towards the vaults to store then each vault does not receive for storage the self-encrypted chunk, but the self-encrypted chunk that has been encrypted again by the data managers

And this prevents anybody finding out the actual public data stored in their vault. The hash of the self-encrypted chunk is meaningless now since the chunk was encrypted again before being passed to the vault for storage

So @dirvine and devs have to decide to do this or something that achieves similar and we don’t get the “ban bad stuff” brigade or anyone else knowing you stored a chunk of some file they object to. In Australia this can be “notes from a legal hacker convention” as has already happened.

cretz · June 7, 2016, 3:43am

Doesn’t matter. If I can hash the result of deterministic encryption (i.e. convergent encryption) then I know I have a piece. Neither this project nor any other would make the foolish claim that you could not know you were storing a piece of public data. If this becomes a point of contention post launch, I’ll provide a POC showing that you can know you have a piece of public data.
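To make the determinism point concrete, here is a small sketch under simplifying assumptions: the key is derived from the chunk’s own SHA-256, and a toy XOR keystream stands in for a real cipher. The actual self-encryption key derivation is not described in this thread and is not what is shown here. `convergent_encrypt` and `fingerprint` are hypothetical names.

```python
import hashlib


def xor_stream(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher (illustration only): XOR with a SHA-256 keystream."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def convergent_encrypt(plain: bytes):
    """Encrypt with a key derived from the content itself (no randomness)."""
    key = hashlib.sha256(plain).digest()
    return key, xor_stream(key, plain)


def fingerprint(plain: bytes) -> str:
    """Hash of the ciphertext that every holder of this chunk would store."""
    _, cipher = convergent_encrypt(plain)
    return hashlib.sha256(cipher).hexdigest()
```

Anyone who can fetch the public content can run the same steps and get the same fingerprint, so a published list of fingerprints is enough to check a store for matches.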
