I hadn’t seen the Primer. At a quick glance, it’s better than what I saw before… but it’s far from ideal.
Having a “complex project” argues against making the code the priority. You can “just code” a simple project. On a complex project, you have to understand how everything fits together, and you have to be sure that everybody working on it shares that understanding.
It’s not all about me understanding it. It’s also about me, and others, having reason to believe that you understand it.
When I read SAFE Network documents, I tend to worry that a lot of decisions were made because, to put it in an extreme and over-provocative way, “We can’t think of an attack that breaks this in 5 minutes, so it must be impregnable, and anyway we have code to write”.
There’s so little argument given for why most of the claimed properties actually hold… but the more complexity you have in the system, the less likely it is that they do in fact hold.
I keep getting feelings that Sybil attacks and exit scams are lurking… but nothing ever mentions them or how they’re truly prevented. I see a lot of talk about anonymity and information hiding, but nothing of the form “the adversary can’t infer X from Y because some clear reason”, or “we assume that players A, B, and C aren’t colluding”.
I glanced over the white paper on the consensus protocol when it came out, and I saw a lot of complexity, an overwhelming amount of mechanistic detail, and not a lot of space spent on specific explanations of what attacks it resisted, what attacks it fell to, or why. Not even a list of in-scope and out-of-scope adversaries or attacks.
Maybe there’s really no way to fix this, but in case it’s useful, here are some random specific issues that jumped to mind as I skimmed (much of) the Primer. Maybe there are answers to all of these. Probably there are answers to most of them. But I’m not sure “the project is too complicated” excuses the lack of explanations.
- Seniority seems like a very bad way to assign trust… and there’s no clear explanation of why trust is necessary or what trust is granted. What can a voting node do that makes it necessary to have it trustworthy? What options were considered for limiting trust? I see that the network uses measurements like “total this or that type of chunk count”, and security relies on them. I assume the voting has something to do with setting those. Which means it’s a serious issue and deserves a better answer than “we trust older nodes because”.
- Sections seem awfully small. On the other hand, there doesn’t seem to be any explanation of how even the small sections guard against, for example, partition attacks. Or of what happens with partition attacks further up the hierarchy, for that matter. So how was the section size chosen, and what does happen if somebody tries to partition things? And the split-merge system must be incredibly complicated and attackable… where’s the detailed analysis that shows it will survive adversarial behavior?
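To make the section-size worry concrete, here’s a back-of-the-envelope model. This is my framing, not anything from the Primer: assume section members are drawn uniformly at random from the whole network (the real network derives addresses from hashes and has node aging, so this is only a first approximation). Then the chance that an attacker holding some fraction of all nodes captures a voting quorum of one given section is a hypergeometric tail. All the numbers below are made up for illustration; I don’t know the real section size or quorum.

```python
# Back-of-the-envelope model (mine, not the Primer's): a section of
# `section` nodes drawn uniformly at random from `total` nodes, of
# which `bad` belong to one attacker.
from math import comb

def capture_prob(total: int, bad: int, section: int, quorum: int) -> float:
    """P(attacker controls at least `quorum` of the `section` members
    of one uniformly drawn section) -- a hypergeometric tail sum."""
    return sum(
        comb(bad, k) * comb(total - bad, section - k)
        for k in range(quorum, section + 1)
    ) / comb(total, section)

# Illustrative numbers only: 100k nodes, 5% attacker, sections of 8,
# quorum of 5.
p = capture_prob(total=100_000, bad=5_000, section=8, quorum=5)
```

The per-section number comes out small, but the attacker gets a shot at every section simultaneously and can churn identities to re-roll the dice, so the quantity that matters is that probability multiplied out over sections and join events — and node aging makes the assignment non-uniform anyway. That’s exactly the kind of analysis I’d want to see written down.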
- The proxy system really has the feel of something thrown in without a lot of analysis. Even a three-layer proxy system like Tor has some real vulnerabilities. What’s the adversary model here? What are the risks? What other mitigations were considered?
- We’re told “The group of Vaults to which the user is connected might know a little about what the user is doing on the network…”. What, exactly, do they know, and what can they do with it? Has anybody thought hard, from an adversarial perspective, about what the exposures are here? Have they written down those thoughts, so that some later change doesn’t violate an important assumption? Whatever information is disclosed to vaults, what’s the reason they need to know it?
- Followed by “but they can only identify the user by their XOR address and not their IP. In this way, complete anonymity is assured.” Boy, does that get my spider sense tingling. I’ve been sometimes watching, sometimes actively working on, Internet anonymity since 2000… and I would never write a phrase like “complete anonymity is assured” without backing it up with a mathematical, or at least quasi-mathematical, proof.
For example, what underlies the assumption that nobody can discover bindings between IP addresses and “XOR addresses”? Or between “XOR addresses” and other identifying information, perhaps in the actual data the user exchanges with the network? What does “anonymity” even mean here? The properties you’re really looking for are of the form “Alice can’t tie information X about Bob to information Y about Bob”, where X or Y may or may not be the name “Bob”, and the connection between X and Y may be made by inference through some intermediate piece of information. Well, what X’s and Y’s are we talking about?
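For what it’s worth, that kind of property can be stated mechanically. In this toy model of mine (not the Primer’s), every observed binding between two pieces of information is an edge, and “Alice can tie X to Y” just means X and Y end up connected through some chain of bindings. The specific strings below are invented examples of bindings an observer might collect.

```python
# Toy linkability model (my illustration, not from the Primer):
# information items are nodes, observed bindings are undirected edges,
# and "Alice can tie X to Y" means X and Y are graph-connected.
from collections import defaultdict

def linkable(bindings, x, y):
    """True iff a chain of observed bindings connects x to y."""
    graph = defaultdict(set)
    for a, b in bindings:
        graph[a].add(b)
        graph[b].add(a)
    seen, stack = {x}, [x]
    while stack:
        node = stack.pop()
        if node == y:
            return True
        for nxt in graph[node] - seen:
            seen.add(nxt)
            stack.append(nxt)
    return False

# Invented example: one proxy leak plus one reused identity is enough
# to connect an IP to a public profile through the XOR address.
observed = [
    ("IP 203.0.113.7", "XOR addr 0x3f..."),   # hypothetical proxy leak
    ("XOR addr 0x3f...", "profile 'bob92'"),  # hypothetical identity reuse
]
```

The point of writing it this way is that an anonymity claim then has a checkable shape: you enumerate which bindings each adversary can observe, and argue that the X–Y pairs you care about stay disconnected. “Complete anonymity is assured” skips both steps.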
- The next part about “Multilayered encryption” is also kind of worrying. “Several extra layers are active when people use direct messaging or create a public profile”. Well, great… but what is each of those layers for? Throwing in more layers doesn’t help unless you understand what value each of them brings and how they interact.
- In the same vein, “The network is meant to be as ‘zero knowledge’ as possible” is scary. Either you’re zero knowledge under some set of identified assumptions, or you’re not. And if you make a sweeping statement like “Farmers cannot possibly figure out what chunks from which file they are storing”, then you have to provide an argument for why that’s true. “Cannot possibly” is an incredibly extreme claim, and making it without proof sounds like snake oil.
- There’s no real explanation of how farming works or how it’s protected from any particular attack or class of attacks. And when I go try to read the Safecoin RFC, it assumes I know a ton of details about entities that aren’t even mentioned in the Primer… and on a skim it seems incomplete even if I did know them. And it looks like Safecoin is just an account with a “Client Manager”, which is presumably trusted because it is, or is made up of, old nodes.
- Centralized, hardcoded bootstrap nodes are a point of attack, but could of course be fixed easily.
… and on maybe less adversary-oriented issues…
- The chunking protocol is pretty basic. Why was it chosen? What problem does chunking solve here? What does “most likely” really mean in “most likely on machines distributed around the world”? How do you ensure that that’s “likely”? How are the machines chosen? Why chunking and not some other random approach, say fountain codes or whatever? And what does the “self encryption” protect against, anyway?
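For contrast, here’s roughly what I understand “self encryption” to mean, as a stdlib-only toy. This is my sketch of the general content-derived-key idea, not MaidSafe’s actual algorithm — the chunk size and the SHA-256 XOR keystream are stand-ins for whatever real primitives they use. Each chunk is encrypted under a key derived from a neighbouring chunk’s plaintext hash, so the “data map” of hashes is what you need to decrypt.

```python
# Toy sketch of content-derived ("self") encryption, stdlib only.
# My illustration of the idea, NOT MaidSafe's real algorithm.
import hashlib

CHUNK = 1024  # illustrative chunk size

def _keystream(key: bytes, length: int) -> bytes:
    """Derive a pseudorandom pad from `key` by hashing a counter."""
    out, counter = b"", 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def self_encrypt(data: bytes):
    """Return (encrypted chunks, data map of plaintext chunk hashes)."""
    parts = [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]
    hashes = [hashlib.sha256(p).digest() for p in parts]
    encrypted = []
    for i, part in enumerate(parts):
        key = hashes[(i + 1) % len(parts)]  # neighbour's hash as the key
        pad = _keystream(key, len(part))
        encrypted.append(bytes(a ^ b for a, b in zip(part, pad)))
    return encrypted, hashes

def self_decrypt(encrypted, hashes) -> bytes:
    """Invert self_encrypt given the data map (XOR is symmetric)."""
    parts = []
    for i, blob in enumerate(encrypted):
        pad = _keystream(hashes[(i + 1) % len(hashes)], len(blob))
        parts.append(bytes(a ^ b for a, b in zip(blob, pad)))
    return b"".join(parts)
```

One consequence worth noticing: any scheme of this shape is convergent — identical plaintext produces identical ciphertext — which is what enables deduplication, but it also means a farmer can confirm a guess about a chunk’s content by chunk-and-hashing a candidate file. Which is precisely why “cannot possibly” needs an argument rather than an assertion.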
- There’s no explanation of how any of this interacts with the “SAFE Network fundamental” that all data are immutable and undeletable. And that’s especially of interest because I assume that particular rule is actually aspirational or metaphorical. You physically can’t keep everything forever, and there’s no real reason to want to.