SAFE versus passive surveillance

Tonda · September 25, 2015, 2:02am

How well does SAFE stand up against global surveillance? Many, if not all, anonymity projects are vulnerable to an adversary with the ability to watch the entire internet. What, if anything, has SAFE done to prevent such a powerful enemy from circumventing the anonymity SAFE will provide? Are there any proposed solutions if one hasn’t already been applied to its current design? This question has been lurking in my mind for a while now. I’ve searched for similar threads but still haven’t found anything. I felt it might be an unexplored forum topic that could use some elaboration from more informed members or even developers.

smacz · September 25, 2015, 2:13am

I think it would be very helpful to explore how current global surveillance happens. Are you talking about PRISM? Or “Corporate assistance”? Or Backdoor insertion/exploitation? Or a nasty combination of all three?

Tonda · September 25, 2015, 2:20am

Think PRISM but with a greater scope. Imagine this adversary could watch the entire internet passively (no backdoor or exploitation of the SAFE protocol). An entity that can only watch the entire flow of information on the web.

smacz · September 25, 2015, 4:01am

Well, my favorite aspect of the SAFE Network is one that tends to be overlooked, and that is that your IP address gets scrubbed at the first hop.

Secondly, all data is encrypted.

Third, no passwords are sent over the internet (encrypted or plaintext). They are only used as input to a hash that verifies and retrieves your data. (I suggest looking into why it’s necessary to have both a PIN and a password.)

Lastly, in any one account you are able to have 2-3(?) “personas” at any given time, and you can continue creating (cycling through) them. It’s like being able to generate throwaway accounts for the internet.

Tonda · September 25, 2015, 4:19am

All of that is great! Most of the protocol is known to me but I appreciate the feedback. Now to clarify before I settle down for the night.

I’m curious about the defenses against timing analysis and correlation attacks. Since every chunk is 1MB in size, padding should not be necessary. All those connected to SAFE with vaults participate as relays to a degree, which could help to mask personal network resource requests. Are these ideas enough to defeat the aforementioned attacks? Are there other solutions in play or in development?

Help me out here people. These questions are bound to be raised by others who audit the protocol sooner or later. Who knows, maybe some of this might make it into the FAQ.

anon81773980 · September 25, 2015, 4:21am

How exactly does the IP address get scrubbed?

janitor · September 25, 2015, 5:14am

These questions are bound to be raised by others who audit the protocol sooner or later. Who knows, maybe some of this might make it into the FAQ.

Don’t be so concerned, Mr Original Thinker, the question was already asked, and answered, on this forum.

@anon81773980

How exactly does the IP address get scrubbed?

It doesn’t. You just cannot tell which IP does what on SAFE unless you control a large majority of nodes and the target has a very distinct traffic pattern whose flow you can monitor. That is very hard on a large network, and also hard to prove. If countermeasures are added it gets even harder.

zankfrappa · September 25, 2015, 5:30am

None of this is actually strong privacy protection the way a project such as I2P or even Tor works. I think it’s safer to say that there’s some pseudo-anonymity, but that against state-sponsored surveillance all bets are off. Tor is also weak when it comes to a state-level group monitoring traffic flows.

Tonda · September 25, 2015, 5:33am

Lol! You amuse me. If you read the OP, you would know that I stated there was a possibility that this had already been brought up. But, taking a jab in the dark is your specialty. You’re like a mole. I whack you in one thread and you pop up in another. Is there no end to this game? I want a refund…

—Loudspeaker—

Attention, attention, there is shit on the bathroom floor in the biology department. I repeat, there is shit on the floor!

Yo janitor, you’re being summoned. Stop auditing the class and get to it…

No doubt I left it there for you. I’m such a primate.

zankfrappa · September 25, 2015, 5:34am

I think Janitor must be a Stack Overflow mod…

Tonda · September 25, 2015, 5:36am

Lol!!!

Yup. You definitely got me thinking!

smacz · September 25, 2015, 5:43am

I believe it’s in crust where this gets accomplished. Unfortunately, I am not cognizant of exactly how this occurs. However, it is mentioned many times by @dirvine in these forums.

Disclaimer: I am not sure this is how it works, and would welcome any critique or clarification that any devs may have.

My best guess is that since the network operates on XOR space (and not IP addresses like TOR), the first hop is to another IP address chosen for the closeness of its XOR location to the piece of information being retrieved.

This first hop will then be to a vault, and all others after as well. Keep in mind that vaults are stateless, and only retain information for the time it takes for them to perform whatever process they need to do on it.

After the first hop, vault1 queries their DHT for the node closest to the hash of that piece, and sends the request on. Now that request (on vault2) is comprised of:

Hash of data requested
recipient XOR

That continues on and on until the vault is found with the piece of data. At that point, vault9 says “I have this piece of data, let me see who in my hash table is closest to the intended recipient”.

The path is not necessarily the same, because “Every computer has a different view of the network”. In reality, it is most likely going to be a completely different path.

Vault9 will then send it to vault8, who receives, similarly:

data
recipient XOR

and on and on until the recipient is found and the data delivered.
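Here is a rough sketch of the hop-by-hop idea described above. It is not the actual SAFE routing code: the 8-bit node IDs, the `PEERS` table, and the `route` helper are all made up for illustration. Each node only knows a few peers and forwards to whichever known peer is closest, by XOR distance, to the target.

```python
def xor_distance(a: int, b: int) -> int:
    """Kademlia-style distance between two IDs."""
    return a ^ b


def route(start: int, target: int, peers: dict) -> list:
    """Greedy hop-by-hop forwarding toward `target`.

    `peers` maps each node ID to the IDs in its own (partial) routing table.
    Every hop only chooses the next hop; no node plans the whole path.
    """
    path = [start]
    current = start
    while current != target:
        # Pick the known peer whose ID is closest to the target in XOR space.
        best = min(peers[current], key=lambda p: xor_distance(p, target))
        if xor_distance(best, target) >= xor_distance(current, target):
            break  # no known peer is closer, so stop here
        current = best
        path.append(current)
    return path


# Hypothetical tiny network: every node has a different view of it.
PEERS = {
    0x10: [0x81, 0x40],
    0x81: [0x10, 0xC2],
    0x40: [0x10, 0xC2],
    0xC2: [0x81, 0xD7],
    0xD7: [0xC2, 0x40],
}

request_path = route(0x10, 0xD7, PEERS)  # requester -> holder of the data
reply_path = route(0xD7, 0x10, PEERS)    # holder -> requester
print([hex(n) for n in request_path])
print([hex(n) for n in reply_path])
```

In this toy table the request and the reply travel through different intermediate nodes, which is the point made above about each node having its own view of the network.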

So now that you (hopefully) see how the network works in XOR space, one final note about Kademlia that may tie this all together. Here’s a graphic (video starts at correct time for reference). The DHT that’s held by all nodes does show the IP address of other nodes. The routing however, is done on a step-by-step basis.

This simple explanation may have holes that the maidsafe team has confronted and solved, but I am not aware of the details, and only hoped to convey an elementary view of the network…the only type that I am able to provide with my limited knowledge.

smacz · September 25, 2015, 5:55am

Just a couple thoughts:

Since data is split into chunks, there is no one piece of data being retrieved from one specific entity. That would make it incredibly hard to trace an actual file rather than just a chunk.

Chunks can also be de-duplicated, so who knows if that chunk is being requested for file1 or file2. Each may contain data that hashes to the same chunk, but there are two reasons for requesting it.
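To illustrate the de-duplication point, here is a toy sketch. It assumes chunks are addressed by a hash of their content and ignores encryption entirely; the 4-byte chunk size, the use of SHA-256, and the file contents are assumptions picked for readability, not how SAFE actually does it.

```python
import hashlib

CHUNK_SIZE = 4  # toy value for readability, nowhere near a real chunk size


def chunk_addresses(data: bytes) -> list:
    """Split data into fixed-size chunks and return each chunk's content hash."""
    chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]
    return [hashlib.sha256(c).hexdigest() for c in chunks]


# Two different files that happen to start with the same header bytes.
file1 = b"HEADcats!!!!"
file2 = b"HEADdogs????"

addrs1 = chunk_addresses(file1)
addrs2 = chunk_addresses(file2)

# Addresses that both files need: a request for one of these is ambiguous.
shared = set(addrs1) & set(addrs2)
print(len(shared), "chunk address(es) shared between file1 and file2")
```

Someone who only sees a request for the shared address cannot tell from that request alone whether it was made for file1 or file2.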

As far as timing attacks go, the path to retrieve the data, and the path to return the data are not necessarily (nor likely) the same paths.

Also, while a monitor at one end will see incoming data, there is no way to see which vault it came from. Even the recipient doesn’t know any but the last hop it took. And since even one file will be coming from many different vaults, it would be infeasible to correlate any timing attacks (IIRC).

Also, chunks are cached as well, so it may not be the original storage vault who ultimately returns the chunk, but rather an intermediary close to the chunk that has cached it.

@Tonda, to learn more about the network I would invite you to view @dirvine’s whiteboard speech. (even though it’s not great audio and he tends to let his thoughts wander around a bit. In fact, I would absolutely love it if he did another whiteboard explanation for the current state of the project and have time to go through all of the elements of the network that he thought were important.)

smacz · September 25, 2015, 6:02am

As I am mostly unfamiliar with TOR and its functionality, I would request some clarification on this statement if I may.

In light of the numerous reports on TOR vulnerabilities, in what ways and to what extent is the protocol weak in regards to traffic flow or other analyses?

Another question to consider is: What base assumptions are made of the TOR protocol that are not necessarily similar for the SAFE Network?

Tonda · September 25, 2015, 6:08am

Thank you so much. You are quick, helpful, and to the point. 1+ respect for smacz for real. I look forward to more of your scalpel-like responses. I will definitely check out that speech. Thanx again.

19eddyjohn75 · September 25, 2015, 6:44am

This might also help, the FAQ 33 videos

zankfrappa · September 25, 2015, 7:20am

@smacz , here’s a good article from Tor itself which explains their weaknesses – https://blog.torproject.org/blog/one-cell-enough (though the latter half of the article gets in depth on one particular attack they’re partially debunking). Basically when you’re watching traffic end-to-end, even though the middle hops are a black box, if you see what goes in and out of that black box (even if it’s encrypted), you can make some good guesses as to who’s doing what on Tor.
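A minimal sketch of that kind of guess, with entirely made-up numbers: the observer counts packets per second going into the black box from a few candidate users, counts packets per second coming out toward a destination, and checks which input has the same shape as the output. The user names and the use of a plain Pearson correlation are assumptions for illustration; real attacks are more elaborate.

```python
from math import sqrt


def pearson(xs: list, ys: list) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up packets-per-second counts seen leaving the black box.
exit_flow = [0, 12, 30, 2, 0, 25, 27, 1]

# Made-up counts seen entering the black box from three candidate users.
entry_flows = {
    "alice": [1, 11, 29, 3, 0, 26, 26, 2],    # nearly the same shape as exit_flow
    "bob": [10, 9, 11, 10, 12, 9, 10, 11],    # steady background traffic
    "carol": [25, 1, 0, 28, 30, 2, 0, 27],    # bursts at different times
}

scores = {user: pearson(flow, exit_flow) for user, flow in entry_flows.items()}
best_guess = max(scores, key=scores.get)
print(best_guess, scores)
```

Note that nothing here needs the contents of the traffic, only its timing and volume, which is why encryption alone does not stop it.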

Here’s a recent article on a new project which offers better anonymity but at quite a performance cost: 'Dissent,' a New Type of Security Tool, Could Markedly Improve Online Anonymity

One feature of Tor which I’m not sure SAFE has, is that traffic is intentionally sent through random hops. On SAFE it’s not quite random, it’s a DHT lookup.

I2P messages contain additional protections, such as the ability to specify delays on when messages should be sent or specify additional hops and routing instructions. I2P: A scalable framework for anonymous communication - I2P

janitor · September 25, 2015, 9:35am

The part I quoted was your claim that someone (a lesser thinker, of course) may come up with the same or similar question (months later, when they catch up with the foremost thinker of the forum).

So, for the benefits of the community you nominated your unique question for the FAQs.
True team player!

That’s not necessarily true. I can download the top 10 anti-Communist videos to SAFE and read them to learn the patterns and flows (for example, how many chunks are requested and how they are delivered). After a lot of learning it may be possible to narrow down the list of suspects to some large number that would have to be cross-referenced with a bunch of other things from other sources.
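A sketch of how chunk counts alone could narrow things down, assuming fixed 1MB chunks (the size assumed in this thread), no caching, and no cover traffic. The file names, sizes, and the observed number of GETs are invented for the example.

```python
CHUNK_BYTES = 1024 * 1024  # 1MB chunks, the size assumed in this thread


def chunk_count(size_bytes: int) -> int:
    """Number of fixed-size chunks needed for a file (integer ceiling division)."""
    return (size_bytes + CHUNK_BYTES - 1) // CHUNK_BYTES


# Hypothetical public files whose sizes the observer measured in advance.
known_videos = {
    "video_a": 734_003_200,
    "video_b": 157_286_400,
    "video_c": 157_000_000,
}

observed_gets = 150  # hypothetical number of chunk requests seen from one client

candidates = [
    name for name, size in known_videos.items()
    if chunk_count(size) == observed_gets
]
print(candidates)
```

The count rules out one file but still leaves more than one candidate, which matches the point above: this narrows the list, it does not identify anything by itself.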

Chunks: in videos and all compressed docs, if there are repeating deduplicated chunks, those are very likely the same files. Not that it matters in terms of files, really, since you cannot really see what’s inside as they are encrypted on the way to the downloader.

It won’t be easy to detect SAFE users who apply simple measures of protection.

smacz · September 25, 2015, 10:20am

That article focuses mainly on what they describe as “tagging attacks”, so I’ll focus on those for this post. Feel free to expand the surface area if you wish. Also, as an aside, some of the PDFs for the other three attacks that they link to are unavailable. I guess someone didn’t want to renew their DNS lease. Another problem defeated by the SAFE Network…

Sorry for the long post. Don’t worry though, I repeat myself for clarity often enough. I swear it started out a lot smaller. Also, once again, I am not an expert. YMMV

The way we generally explain it is that Tor can try to protect against traffic analysis, where an attacker tries to learn whom to investigate.

For future comments, I want to re-emphasize this point. This is not a discussion about an attacker trying to learn whom to investigate. It is exactly about the “other things”.

However, Tor can’t protect against traffic confirmation (also known as end-to-end correlation), where an attacker tries to confirm a hypothesis by monitoring the right locations in the network and then doing the math. -Emphasis mine

Before we dive in, (breaking my own rule above) I did want to mention that setting up such a tagging attack would be so utterly massive in its scope that it would be relatively infeasible. This is mainly due to the number of farmers required to serve up any meaningful bit of data. But feasibility never stopped the NSA before.

So chunks are served by farmers. Those chunks aren’t even necessarily a unique piece of data. For instance, I don’t know what kind of file to exemplify here, but imagine a type of file that contained a huge (1MB+) header - even when compressed. If that header uses default values and many people store a file of that nature with different content but with the same header, that header chunk will be brought down for many different reasons, without the corresponding exit of that chunk on the target client.

Also, if that chunk is popular, it will be cached on an intermediate node. So now the “server” of that chunk has gone from 4-6 vaults to a ${probability} number of vaults - the probability depending on just how popular that chunk is. That also varies with time. One day a chunk may not be there, another it is. And then the next it’s not again.

On top of all that, think about churn. As nodes enter and exit the network, the same chunk of data is copied to multiple machines. It may be stored offline while a farmer is offline, but there are always 4 live chunks available somewhere. So when a chunk is reduplicated and stored on another vault, the attacker would then have to figure out which new one it went to before it can continue the analysis. That is if they don’t have to start it all over again.

Another (less convincing) part to this is that vaults are not serving specific content. They could be content for any number of applications. For instance, the original Silk Road servers were hacked. If they did this type of attack before hacking it, they could attempt to correlate the content that was put out by these particular servers.

Now what if the servers served several services?[1] They wouldn’t be able to tell which services were requested when, thus instilling plausible deniability. Also, since with the SAFE Network port numbers are randomized uniformly(?), there’s no way to say: “That came out of port 80 for an HTTP request,” or “That came out of port 21, that must be an FTP request.” Data is just data. Nothing more, nothing less. Every single thing would have to be taken into account. Even checks from all of the other managers that are in charge of that particular vault.

What if the known servers only hosted part of the service? Then there’s a good chance that the known nodes would not be contacted for all of the services, and they would miss many correlations of that same service, just because it wasn’t being served by those known machines, even though the request from the client would be indistinguishable from one that would be eventually served by those servers. Enough of a chance, I think, to establish plausible deniability.

The basic idea is that an adversary who controls both the first (entry)
and last (exit) relay that Alice picks can modify the data flow at one
end of the circuit (“tag” it), and detect that modification at the other
end — thus bridging the circuit and confirming that it really is Alice
talking to Bob. This attack has some limitations compared to the above
attacks. First, it involves modifying data, which in most cases will
break the connection; so there’s a lot more risk that he’ll be noticed.
Second, the attack relies on the adversary actually controlling both
relays. The passive variants can be performed by an observer like an ISP
or a telco.

In general I’d say that these limitations hold, and are amplified to some extent. Especially with the difficulty of controlling, hell, even knowing which “relays” to control being orders of magnitude greater.

An interesting question here is: “If a successful GET triggers a reward to the farmer, how does the network know which farmer served up that data, and can that mechanism be exploited?” That is something that I have not researched yet. Anyone else care to expand upon that? If not I guess I’ll just keep digging (I got nothing but time anyways).

[1] How many services can a service server serve when a service server serves services? Several.

janitor · September 25, 2015, 11:36am

An interesting question here is: “If a successful GET triggers a reward to the farmer, how does the network know which farmer served up that data, and can that mechanism be exploited?”

The network knows because it picked one of the replica hosting farmers (vaults) to deliver.
There may be several ways to exploit this. For example, one can seed a lot of content and correlate his earnings with traffic observed on residents’ clients. As you watch a video, the number of GETs may be the same as the number of MBs of sequentially downloaded content in a given time frame on the watched client. This assumes no caching and more.

Again, this too can be made harder by the client (download some random shit at the same time, etc.), and the same farmer won’t get all requests, so it won’t be that easy.

This stuff concerns only a tiny minority (say 0.0001%) of users. There is virtually nothing that users cannot stealthily download today. Unless you are in Iran or N. Korea, why worry?
Governments cannot even bust “illegal” dark web sites on Tor (with 10-20K users), let alone find some “illegal” SAFE downloader hiding in a crowd of 5 million normal SAFE users.

By the time they get their act together and arrest the first SAFE pedos, less illegal users will have to slightly improve their privacy to buy themselves another 2 years of worry-free time. And does anyone expect SAFE devs will not keep improving the software?
