Surveillance of "chunks of interest"

Until recently I was under the misguided impression (my own fault, for not reading the relevant part of the docs) that all data uploaded to the network was e2e encrypted, with the uploader’s own private key, which would prevent any kind of network-wide chunk de-duplication.

I now understand that this is not the case, and file encryption uses the file’s own hash as key (to put it simply). This means multiple users uploading the exact same file will result in chunk de-duplication. Neat!

However, this also means that an actor in possession of a file can tell whether that file has been uploaded to the network, even if they don’t know who uploaded it.
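To make the concern concrete, here is a minimal sketch. It assumes a plain SHA-256 of the content as a stand-in for the network's real address derivation, which is more involved; `content_address` is a hypothetical helper, not an Autonomi API. It only illustrates the general property that a content-derived address can be recomputed by anyone holding the same bytes.

```python
import hashlib


def content_address(data: bytes) -> str:
    """Hypothetical stand-in: an address derived purely from the content."""
    return hashlib.sha256(data).hexdigest()


# Two independent uploaders with identical bytes get the same address,
# which is what makes de-duplication possible.
alice_upload = b"the exact same file contents"
bob_upload = b"the exact same file contents"
assert content_address(alice_upload) == content_address(bob_upload)

# An observer holding their own copy can compute that address too,
# without knowing anything about who uploaded the file.
observer_copy = b"the exact same file contents"
watched_address = content_address(observer_copy)
print(watched_address == content_address(alice_upload))  # True
```

The same property that gives de-duplication is the one that lets a known file be recognised.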

My question is: is it not theoretically possible that such an actor (I’m thinking state surveillance) can run enough nodes so that they will likely host “chunks of interest” and therefore can trace the IP address of users that hit their surveillance nodes with PUT/GETs of those chunks?

4 Likes

Can that not be obviated by adding random zeroes to start/end of each file you upload that may “attract interest”?

4 Likes

Yes, a single bit change would result in a completely different hash and therefore different chunks.
Pre-encrypting the file with a user’s own key would also work.
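A small sketch of that effect, again assuming SHA-256 over the whole content as a stand-in for the real address derivation (`content_address` is hypothetical, and the 16-byte random prefix is just one illustrative way to pad):

```python
import hashlib
import os


def content_address(data: bytes) -> str:
    # Hypothetical stand-in for a content-derived address (plain SHA-256).
    return hashlib.sha256(data).hexdigest()


original = b"file that may attract interest"

# Flip the lowest bit of the first byte: a one-bit change.
flipped = bytes([original[0] ^ 0x01]) + original[1:]

# Prepend 16 random bytes, in the spirit of padding the file before upload.
salted = os.urandom(16) + original

for label, data in [("original", original), ("flipped", flipped), ("salted", salted)]:
    print(label, content_address(data)[:16])
```

Whoever downloads a padded file has to know to strip the padding again, so this is something an app would normally do on the user's behalf.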

But these are steps that might not be clear to a user who just reads the marketing materials and doesn’t have a deeper understanding of the underlying mechanics. (I myself was under the wrong impression until today.)

It should be made clear to those in need of total privacy and non-traceability that they need to take additional measures to remain fully private.

Or there is something else that I missed, which is what I’m hoping for.

1 Like

Absolutely.
It cannot be stressed enough that Autonomi is not the complete answer for those who have something to hide (for whatever reason), but it is a very major step forward - if used correctly.

Defining “correctly” above is of course another discussion.

5 Likes

I think to be able to trace the IP you have to win the lottery and have control over one of those specific nodes that host the chunk.

4 Likes

It has to be at the start; at the end I am not so sure. I uploaded small files to a test network (18 bytes each, a counter with leading spaces), and for the 100,000 files I uploaded one chunk was the same across all of them, and of course that chunk had the same xor address. The start and middle sections of every file were exactly the same (ASCII spaces) and only the last 6 bytes were different.

Different files can also contain the same chunks. Programs with loads of libraries may share common chunks. If you change a document, some chunks may remain the same.
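A toy illustration of why files that differ only at the end can still share a chunk. The 6-byte chunk size and the plain per-chunk SHA-256 are assumptions made for the example only; the real chunking and encryption scheme is more involved, so this is not a model of the actual network behaviour.

```python
import hashlib

CHUNK_SIZE = 6  # hypothetical toy value, chosen so an 18-byte file has 3 chunks


def chunk_addresses(data: bytes, size: int = CHUNK_SIZE) -> list:
    """Split data into fixed-size chunks and hash each chunk separately."""
    return [
        hashlib.sha256(data[i:i + size]).hexdigest()
        for i in range(0, len(data), size)
    ]


# Two 18-byte files: 12 leading spaces followed by a 6-digit counter.
file_a = b" " * 12 + b"000001"
file_b = b" " * 12 + b"000002"

addrs_a = chunk_addresses(file_a)
addrs_b = chunk_addresses(file_b)

# Distinct chunk addresses that appear in both files.
shared = set(addrs_a) & set(addrs_b)
print(len(addrs_a), len(shared))
```

In this toy the all-spaces chunk is common to both files and only the final chunk differs, which is why padding the end of a file does not necessarily change every chunk.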

To answer the OP
Yes, at the current stage of development the node runner can see the IP address of the uploader of the chunk.

BUT the node runner does not get to choose the xor address their node is located at, so its positioning is random.

As the network grows it will be harder and harder to be lucky enough to have a node at the xor address you want one at.

Some hard figures: at this time, with fewer than 1000 people (actually fewer than 500), we had over 60,000 nodes running. Now let’s say those same people only run 20,000 nodes long term (an average of 20 per person). When beta is ending we expect 10,000 people, so that would be 200,000 nodes.

Now when adoption kicks in we’d expect at least 50,000 people to be using Autonomi within 6 months; any fewer and we haven’t done something right (promotion etc). At that point it’s one million nodes. If you say the ABCs will run 10s of thousands of nodes, then I say that some people won’t be sticking with a small 20 nodes per person; they will run a thousand, as some do now.

At 10,000 nodes out of 1 million, the ABCs have a 1 in 100 chance of getting one node close to one chunk they want to watch.
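The arithmetic behind those figures, as a quick sketch. The 20 nodes per person and the 10,000 agency nodes are the assumptions from the posts above; `network_size` is just a hypothetical helper name.

```python
from fractions import Fraction

NODES_PER_PERSON = 20  # assumed long-term average from the post above


def network_size(people: int, nodes_per_person: int = NODES_PER_PERSON) -> int:
    """Total nodes if every participant runs the average number of nodes."""
    return people * nodes_per_person


beta_end = network_size(10_000)   # expected participants when beta ends
adoption = network_size(50_000)   # expected participants within 6 months

agency_nodes = 10_000             # assumed number of surveillance nodes
agency_share = Fraction(agency_nodes, adoption)

print(beta_end, adoption, f"1 in {agency_share.denominator}")
# 200000 1000000 1 in 100
```

The share only shrinks as honest participants add nodes, which is the "lottery" point.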

Thus as @Erwin says it’ll become a lottery

And as @Southside says, and as I tried to say last night, it’s not Autonomi’s place to solve all the security concerns involving computers/communications.

What you see as a problem is only one thing. Another person wants security for their surveillance cameras to stop people snooping on them, another wants to stop hackers putting code on their computer by sandboxing the client app, another 1000 each want their version of the 1000’s of encryption methods in use, and another wants xyz security.

So where does it stop?

If you go way back (a decade or more) in development, it was always said that it’s not perfect security, which even the ABC security agencies cannot achieve for their agents using billions of $$$. But it is security in numbers, and the more people running nodes the more of a lottery it becomes for any agency to be able to monitor specific chunks.

The way to close the security gap for the 10’s of thousands of different security features desired is to write APPs for them, so that each specific security concern is handled in the APP layer. That is what Autonomi is designed for: to allow these security features to be built in a secure way.

And your example of the student uploading illegal material: ever thought that the easier way to catch them is not spending millions to billions running 1/4 of the nodes of the network, but paying off another student to rat on them? 1000 dollars is cheaper than a huge, highly imperfect surveillance system. No amount of security in Autonomi will protect your student.

And again Autonomi was never designed to do illegal things on it, but to return old style privacy. You will always be subject to old style policing, like they do for the current encrypted networks criminals use now.

Autonomi removes the layers of surveillance that have been built up on the current internet, with its client APP example and by allowing APPs to run that provide specific protections and do not farm, report back home, etc. It is not for illegal activities, and you’ll be caught the old-fashioned way a lot more cheaply than by the ABCs trying to control 1/4 of the nodes.

It cannot protect you from yourself and how you deal with your illegal material onboarding and offboarding. No need to watch chunks and be lucky enough to have a node that will hold the chunk you are trying to watch.

2 Likes

To address this one point. Churning will also be causing GETs on those chunks as new nodes join.

And the potential is always there that the original PUT onto your lottery winning node could be a churn event and it simply came from another node.

So not only do you have to be lucky enough to have one of your nodes running in an expensive cloud setup of 10’s of thousands to millions of nodes, but be lucky enough to have the node remain one of the 5 closest nodes to the chunk address of interest and then on top of that be there when the uploader uploads the chunk and then the uploader chooses your node to upload to.

So the odds of your ABCs’ nodes catching the uploader in the act rely on a set of odds:

  • odds of even getting a node that is one of the 5 closest nodes to the chunk of interest (very low odds, and dependent on money spent to run nodes; very expensive to get close to 1 in 100)
  • odds that it will remain one of the 5 closest nodes at the exact time the uploader is uploading the chunk of interest (probably the same sort of odds as above, maybe 1 in 25 if the above is 1 in 100)
  • odds that the chunk of interest is unique to the file of interest (yes, this one is close to 1 in 1 but still isn’t quite)
  • odds that your node is the one chosen to upload to, which you need for proof. This is always 1 in 5
  • odds that the uploader isn’t using a VPN with a no-logs policy (the norm, since logs cost a ton of money to keep and providers only keep the ones a court orders them to keep). The odds that they are using their ISP IP address are not good if the uploader is uploading illegal stuff, so maybe 1 in 5

So even if the ABCs spend mega $$$ to get to 1 in 100 odds of getting a node close enough to enough chunks of interest, the next is like 1 in 25 and the next is 1 in 5 (ignoring the near-unity odds of uniqueness), with another 1 in 5 for the uploader not using a VPN.

That is 1 in 62500 chance after spending mega dollars. Cheaper to watch the on boarding and off boarding of illegal material and follow the trail that most leave. So unless the ABCs budget can pay the data centre costs to have at least 1/5 of the nodes existing it is not worth wasting money on. Have a think of the costs of doing that even for a small network of one million nodes.
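For anyone who wants to check the multiplication, a short sketch. It assumes the rough estimates from the list above and treats them as independent events, which is a simplification; the uniqueness odds are taken as 1 in 1, as in the post.

```python
from fractions import Fraction
from math import prod

# Rough estimates from the list above, assumed independent for simplicity.
odds = {
    "node is among the 5 closest to the chunk": Fraction(1, 100),
    "still among the 5 closest at upload time": Fraction(1, 25),
    "chunk is unique to the file (near unity)": Fraction(1, 1),
    "node is the one chosen for the upload": Fraction(1, 5),
    "uploader is not behind a VPN": Fraction(1, 5),
}

combined = prod(odds.values())
print(f"1 in {combined.denominator}")  # 1 in 62500
```

Changing any single estimate in the dictionary shows how sensitive the final figure is to each assumption.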

But as always this is the job of APPs: to provide the different security features you and thousands of other people want. That way the 1000’s of different security features can be catered for.

But to expect all that at launch is to push back development many years and cost a lot. Best to have the vehicle for these APPs to be built on out there and 1000’s of app developers can build them in a short time.

Your desire for double encryption is simple

your-flavour-of-encryption "$1" > "$1.ency" && safe files upload "$1.ency"

Put that line in a script file and call it say myupload.bash and call it with myupload.bash my-illegal-file

then wait for someone to rat on you and the police knock on your door.

2 Likes

And again Autonomi was never designed to do illegal things on it, but to return old style privacy. You will always be subject to old style policing, like they do for the current encrypted networks criminals use now.

Sounds contradictory to me.

autonomy /ô-tŏn′ə-mē/

noun

  1. The condition or quality of being autonomous; independence.
  2. Self-government or the right of self-government; self-determination.
  3. Self-government with respect to local or internal affairs.

“granted autonomy to a national minority.”

  1. A self-governing state, community, or group.
  2. The power or right of self-government; self-government, or political independence, of a city or a state.
  3. The sovereignty of reason in the sphere of morals; or man’s power, as possessed of reason, to give law to himself. In this, according to Kant, consist the true nature and only possible proof of liberty.
  4. Self-government; freedom to act or function independently.
  5. The capacity to make an informed, uncoerced decision.

If you use the definition of a word then yea, expect differences. Autonomi was not designed around a word.

Also, where in the definition does it say that something helping you to be autonomous also has to allow you, nay assist you and protect you, when doing illegal things?

Autonomi is a vehicle that allows people to build more secure (& autonomous) apps upon it. It is not an all-in-one criminal program.

EDIT: also if you think “old style policing” has anything to do with the internet then you are young indeed. It is doing the leg work, getting into the on boarding and off boarding of the illegal material, paying off associates to rat on you, and a lot of other things like that.

1 Like

If churn related GET/PUTs are indistinguishable from user download/upload GET/PUTs then that would create plausible deniability and would basically solve the problem.
However I believe a node operator might be able to make that distinction, since a counterpart network node would also answer to GET requests (and potentially other node related APIs), whereas a user app would not.

Let me be clear, I am not suggesting that the core protocol should solve this issue at this stage.

What I am drawing attention to is that we should not mislead users into a false sense of security when marketing the product. Stuff like:

Anything stored on the Network is encrypted in a way that not even a quantum computer can break it

no one can watch or track your use of the Network either

Nodes cannot determine the content, nor origin of data they hold or transmit, even if it is their own

Again, this is not a major issue if we just make it clear to users that, in some circumstances, they may want to add a bit of extra protection to their data by pre-encrypting it with their own private key, and in Discord I was strongly suggesting that this should be a must-have in every autonomi app, making it not only visible but also convenient to users. The only downside is that such data would likely never get de-duped, which is fine because the user paid for all the PUTs.

It’s all about setting expectations and all it takes is a checkbox on an app UI.

1 Like

They can if they were the chosen one to upload to. I said as much when I showed the sort of odds involved in the ABCs being able to see it.

Let me be clear too. I am not saying it shouldn’t be done. But be clear that it’s not the responsibility of Maidsafe or any other Autonomi developer to do it either.

And to be doubly clear, I want Application developers to develop all sorts of great privacy and security Apps for the Autonomi network.

And to reiterate, the Autonomi network is the vehicle for all these privacy and security Apps to be built on.

BTW hiding the uploader from the chunks written is one App away for a talented programmer, and that can happen once the Autonomi network exists to develop on. I even gave you the App as a script; just replace the program with your chosen encrypter. LOL, it’s too easy for App developers to do it for you. And I used the Autonomi-supplied programs as the vehicle to do it.

Kinda was, just a different word.

What happened to the idea of private vs public data, where you encrypt your own data with your own key and only you can decrypt that private data, vs public data which is still encrypted but the keys are publicly accessible? What did I miss here?

That didn’t exist where you supplied a key for your private data.

I am not sure whether private data is encrypted with your key in your account record, though I think it is meant to be. But since account records are not a thing yet (in the releases), that kinda isn’t possible, is it? The coding for accounts is ready, I gather, but they need to get testing of the infrastructure done before testing that.

But as I showed @ktorn the app to do your own encryption is a single script line that can be stored in a file and run. And the advantage of that one is you get to choose the encryption method and algo and supply your key in its cfg.

The reason why it isn’t done in the example client App is that there are seriously 1000’s of encryption methods and programs that people swear by. To use one for everybody is to piss off the 99.9% who wanted their own one of the thousands out there. It’s a no-win scenario.

That’s why it’s a case of getting the vehicle for all the Apps up and running, with some simple examples, and letting the talented people out there program up Apps that people can choose between to provide all the security and encryption for their files, camera feeds, messaging and 100 other things. There is no way we’d get Autonomi if the dev team had to do this just for perception’s sake.

1 Like