Analysing the google attack

mav · January 4, 2018, 11:33pm

Google attack stats to control one section

The google attack test should report how much work it takes to control a single section. This post uses a test which:

establishes an honest network of a certain size and average age
then repeatedly
- add attacking vaults
- add 1 normal vault for every 10 attacking vaults
- remove 1 normal vault for every 10 attacking vaults
this attack pattern continues until the attacker controls a single section.
report how many attacking vaults are required to control a single section

This test better represents the target of the attacker to reach their motive (ie to control consensus). The test includes ageing and elders etc.

Results

Given an initial network size (col 1), how many vaults need to be added by an attacker until they control their first section (col 2-6)? And what percentage of the network does this represent (col 7, average of 2-6 as percent)?

Unexpectedly, the larger the network the smaller the proportion needed for success. Of course it’s still more total vaults to attack a larger network, but the proportion decreases as the network gets larger.

Netsize   Test 1       2       3       4       5 | Avg Percent
     1K     2599    1573    1964    2043    2340 | 67.4
    10K    11449   19945   15004   19601   11616 | 60.0
   100K   125776   98073  137955  103118   97975 | 52.7
     1M   893224  983631  864215  974953  724229 | 46.9
    10M        - 7026273 5017128 6996670 7921890 | 40.0

The simulation is deterministic so these tests are repeatable (see commit edf7aff). The simulations take optional flags for seed and netsize, eg $ ./google_attack -netsize=1000 -seed=2 should give 1573 attacking vaults (row 1 test 2)

One caveat is the ‘disallow rule’ preventing multiple vaults aged 1 had to be disabled because it prevents small networks from growing.

What does it mean?

These results should be taken with a large amount of skepticism since there’s so many unknown factors. And the ageing mechanism is still being fine tuned by the maidsafe team.

The answer to ‘how many vaults are required to control consensus’ can not be any more precise than ‘it depends how big the network is’. Even when the network size is known, it constantly changes throughout the attack so the amount of resources required is very hard to know in advance. The table above at least gives some idea of the magnitudes at play as the network grows.

I’m surprised how difficult it is to make correct assumptions about this attack; most of my intuitions about it have been shown incorrect by the simulations.

It always needs to be remembered that controlling a single section doesn’t necessarily give the attacker much benefit (as happybeing pointed out above).

Also from a marketing perspective, it may be desirable to use percentage figures from before the attack. So for row 1 test 1 in the table above, 2599/3599 = 72% is the proportion of attacking nodes on the network after the attack. But it’s more impressive and marketable to use the proportion of attacking nodes required before the attack, ie 2599/1000 = 260% of the network required to perform a google attack. That’s almost triple the current network size! 260% is much more impressive than 72%, even though it’s actually the same thing.

Impact of Ageing

A typical attacked section has an age distribution like the table below (in this case taken from netsize=100K seed=4), with detail for the elders (oldest 8 vaults).

Attacked Section Age Distribution

Age Attacker
7   false
5   true
5   true
5   false
5   true
5   false
5   true
5   true
5   false
5   false
5   false
Age Attackers NonAttackers
4   4         6
3   15        5
2   5         2
1   1         0

Resource bottlenecks

Yes this is an interesting question.

The limits on how many vaults can be thrown at a network… trying to wander through the variables… they must store data, but the more vaults there are the less data each vault has to store. They must supply bandwidth for chunks, so at some point that becomes a bottleneck depending on the size of their pipe. They must be timely in their responses, so latency and bandwidth both matter there, probably also cpu for signature generation and verification. Additional labour and skill to modify the vault code for coordination between the attacker vaults… All these things cost money so ultimately budget will limit the total number of possible vaults any one entity can run and the duration they can run for. Maybe I missed some aspects?

Amended thoughts on google attack

I don’t think it’s possible to fully model a google attack because it depends on human behaviours of non-attacking participants (which are difficult to predict and model). Despite the imperfections there can be some attempt by big farmers to categorise themselves as a potential attacker vs merely a large-scale participant. Bystanders will probably only know of a google attack after it happens.

A successul google attack seems pretty far-fetched to me… but bitcoin mining ended up more centralized than people first imagined so I’m not too keen on making predictions!

Where to from here?

I’d like to look into the difficulty of an attacker controlling all copies of a chunk, ie not just controlling a single section but also controlling sections with redundant copies of a chunk (see RFC-0023 Naming Immutable Data Types although routing as currently coded only keeps chunks in a single section so I’m not sure what the plan here is). However, if there was some sort of ‘cache scavenging’ mechanism to recover chunks from temporary caches anywhere on the network, this would negate the attack.

Data loss seems to be Peter Todd’s main concern of a google attack, so for that to happen all copies of the chunk must be controlled by the attacker (which the simulation in this post models as currently implemented, but not as currently designed, pending implementation of backup and possibly also sacrificial chunks).

neo · January 4, 2018, 11:45pm

Excellent work again @mav and seems to be a more realistic simulation.

Considering work is being done on how nodes are added, (due to your finding the bugs) then the dev team can use these simulations to assist in making it even harder for an attacker.

I think that is enough to start with

Once the network is of a decent size (1M nodes) then the attacker has to have very large resources indeed to get control of one section.

And I’d agree with your amended thoughts.

dirvine · January 5, 2018, 5:27am

@mav you do really great work, I like you are at the edge of the team and independent. That is important for sure, but please do post a btc (maid) address here. I think we need to be thanking you at least, but probably do a bit more than that. In any case I cannot thank you enough, these snippets are hugely helpful and extremely honest (brilliant). I am sure @nicklambert will sort out a few coins for you now though.

In terms of the post, I have not gone though it all yet in too much detail (working on a niggle/itch I have in our algorithm for secured off line chain, not a worry but there is an interesting take on it). Anyway I wonder if you can run this test (and also if @bart can do likewise in the sim) with infants created with age 4 and again restricted to one per section (all restrictions like this only happen when there is a complete group (i.e. group size elders all of age > 4)? Just for a small test if that is OK?

Thanks again chap, SAFE needs folk like you, the whole community is better for it.

tfa · January 5, 2018, 9:15am

Before adding this attack, could you just publish a working version of the simulation? To understand what I mean see this post.

digipl · January 5, 2018, 9:30am

Wonderful work, as usual. The numbers agree quite well with my rough mental calculations and reaffirm me that an attack of this kind is in practice, except in a small network, unrealizable.

In this type of attack there is also a fundamental factor that is the network effect created by the attacker himself. Malignant nodes need time to reach the necessary age, which is helping the network to grow and attract new users. The attacker enters a vicious circle that can end up helping the network without managing to control a group.

My impression is that more than a brute force attack we should consider the possibility of targeted attacks mixing a significant number of malignant nodes and waiting for certain matches to perform DDOS attacks against nodes of a specific group.

An exciting world …

mav · January 5, 2018, 10:20am

I appreciate this but my approach to donations is outlined here: https://iancoleman.io/no-donations/

Since my projects are often the efforts of many people, most of which don’t appear in the obvious places like code or issues, donating to projects I work on poses significant operational difficulties.

In this case the ‘many people’ are you and the amazing maidsafe team who have given everyone such a fascinating project to explore, and I feel unable to accept donations when so many people are deserving.

I’ll look into this when I next get a chance. Does ‘infants created with age 4’ mean initializing vaults with age 4 instead of age 1? Maybe some description of the intention of the test will help clarify what this means.

There are only a couple of changes that should ‘fix’ the simulation, but these are not yet formalised in any docs anywhere

relocate oldest possible instead of youngest possible, see choose_for_relocation
temporarily disable (or modify) the disallow rule section.rs L198

I know this is just an indulgence, but could you outline your approach to these rough mental calculations? I’m always curious about the way people approach Fermi Problems. It’s a great skill to have.

I don’t think targeting specific sections is possible since a) vaults are relocated to a ‘random’ part of the network when they join and b) vaults must relocate at least 3 times to become an adult, and often much more than that to become an elder. The relocations are essentially random and cannot be targeted.

However, relocations are always to neighbouring sections so this does allow some better degree of targeting than purely random relocation. I’m not sure how much of an improvement it adds to the attack. Very interesting idea though!

tfa · January 5, 2018, 10:36am

Thanks, I will try that.

The simulation already does this because nodes_by_age returns nodes by descending age. The name is misleading, this function should be renamed in something like nodes_by_age_descending.

happybeing · January 5, 2018, 10:40am

@mav Warms my heart to see this and the responses Ian. Well done and thank you man… again!

Viv · January 5, 2018, 11:34am

this is an interesting point too and one that comes up often in discussions as to what should be valid age triggers to allow just a gradual network growth rather than requiring churn to facilitate growth apart from startup conditions of requiring a complete grp and so on before this rule comes into play. Infants starting at age 4 idea was exactly new nodes starting at age 4 rather than 1, thereby not being relocated immediately at next churn event in joining a new grp and so on.

Data from this is very useful indeed and with the simulations can certainly help the guys structure the requirements here accordingly. Huge thanks for that I’m sure you’ll be hearing from them in this thread once they’re back next week (few of the guys still on their end of the year holidays)

dirvine · January 5, 2018, 11:37am

Yes the thought is to still have short lived peers not cause hassle (we don’t record them, if they go off line we just delete their id etc.). Beginning at age 4 just means they need to hang around for approx 2^4 churn events (it will be more than that due to oldest first relocate). So we should still be ignoring people who just try a vault for a short period and give up (or try to mess with the network via switching off/on etc.) but do less work as a section by relocating the infants at all.

Hope this helps, thanks again Ian, love the donations approach BTW, very admirable, I am going to copy that approach myself. Makes a lot of sense.

nicklambert · January 5, 2018, 1:43pm

Hey @mav. Thanks for all your efforts. I’ll get some BTC donations sent over to the EFF and FSF when we can do so in relative safety (re Meltdown and Spectre). While personal monetary donations are not for you (kudos for that) we would be happy to get some liquid donations your way. If this is of interest and your comfortable divulging your address please DM me. Cheers!

Jabba · January 5, 2018, 6:09pm

Ditto! You’ve earned an enormous amount of respect since you joined the community @mav. I hope I get to shake your hand and buy you a drink some day.

digipl · January 6, 2018, 11:19am

Great Skill??? Only too much time…

My logic tells me that:

.-If an attacker has the resources, and patience, to generate,for a certain time, more vaults than the sum of the entire network will end, inevitably, controlling a group. Node ageing delays the process but, in a permissionless network, it can not avoid it.

.-The proportion to control a group decreases, if the network is larger, because the attacker increases the chances of achieving it. The average percent begins with the consensus ratio and will be decreasing, as the network grows, until get near, but below, TotalVaults/2 or 50% with the actual parameters.

.-So the key is the consensus ratio but increasing this ratio implies lower network speed, more messages between nodes, more possibility of not reaching consensus and other atacks. Difficult choice…

This rule always worry me.
How can the network grow with this rule?
Will not it also generate enormous frustration to new farmers who can be stopped for a long time, without being able to work for the network?

I join the praises to your work. It is a pleasure that someone with such talent help in this development.

neo · January 6, 2018, 11:46am

I think this is a misreading of what the figures were saying.

To control the first (just one) section, the attacker has to have more resources if the network is larger.

For a small network (1K) the attacker only needs around 2500 nodes.

As the network grows the attacker needs massively more attacking nodes to and at 1 million nodes the attacker needs 893224 Nodes.

So the larger the network the larger the number of attacking nodes needed to gain control of the first (just one) section.

While the proportion is interesting and shows a reducing %age we still see the number of attacking nodes becoming too many for any one entity on this planet when the network goes global with millions and millions of nodes.

And of course this is an area where the dev team can make it even harder for an attacker to get a foot hold with a network past the testing stage.

digipl · January 6, 2018, 12:21pm

My English is, probably, worse than I thought.

In the post I am talking only about proportions, not global numbers. Of course, if the network is larger, you need many more nodes but the percentage decreases from 67.4%, in a network of 10K nodes, to 46.9% in a network of 1M nodes.

These numbers match my calculations where an attacker in a large network will need less than TotalVaults/2 to achieve his goal.

Although, as I said before, these calculations do not take into account the effect of attracting new users that the attacker generates, which is why I have always doubted their effectiveness.

mav · January 7, 2018, 1:01am

The change for this set of tests is to set initial vault age to 4 instead of 1. See branch age_4_new_vaults, particularly commit 0cb051a ‘Initialize age of vault to 4 instead of 1’.

Control Test - Age Distribution

This is with initial age 1, for comparison with the following tests.

100K vaults (500K joins interleaved with 400K departures)

No disallow rule

How Many Vaults Of Each Age?

age vaults
  1 2168
  2 11760
  3 39989
  4 23706
  5 10920
  6 5524
  7 3002
  8 1565
  9 778
 10 409
 11 142
 12 32
 13 4
 14 1

100000 total vaults
1584 total sections

Test 1 - Age Distribution

Vaults start at age 4 instead of age 1

100K vaults (500K joins interleaved with 400K departures)

No disallow rule

How Many Vaults Of Each Age?

age vaults
  4 93026
  5 3445
  6 1693
  7 914
  8 507
  9 258
 10 115
 11 33
 12 8
 14 1
   
100000 total vaults
512 total sections

Test 2 - Age Distribution with disallow rule

Same as above but with only one vault aged 4 allowed at a time, any other vaults aged 4 trying to join are disallowed

How Many Vaults Of Each Age?

age vaults
  4 10
  5 1

11 total vaults
1 total sections

Test 3 - Google attack

How many vaults does an attacker need to add to obtain control of a single section?

100K vaults (500K joins interleaved with 400K departures), then 10 attacking vaults join + 1 normal vault joins + 1 normal vault departs until a section is attacked.

No disallow rule

StartAge NetSize  Test 1       2       3       4       5 | Avg Pct
       1   100K   125776   98073  137955  103118   97975 | 52.7
       4   100K   115985   83910   94416  100508  119250 | 50.5

Tentative Interpretation

My thoughts on changing the starting age to 4…

The disallow rule still prevents the growth of small networks. In reality it can be ‘massaged away’ by vaults choosing to depart these ‘stagnant’ sections and thus possibly initiating churn events. But the rule means any section in the disallow state (no matter the network size) requires a vault depart if ageing progress is to be made (possibly many departures). Disallow rule is not an effective strategy in my opinion, creating deadlocks for no benefit.

Maybe I don’t understand it clearly yet.

Changing the minimum age to 4 discards a lot of network events (network events trigger an opportunity for a vault to relocate and thus increase in age):

Event ending in | % of all events
       ......10 | 25
       .....100 | 12.5
       ....1000 |  6.25
       ...10000 |  3.125
       ..100000 |  1.0625

So discarding events less than ...10000 is 43.75% of all events (plus another 50% of events which end in 1, so 93.75% of events are discarded instead of just ~50%). This will slow down the rate of ageing. I can’t decide yet whether this is a good thing or not.

The rate of relocating is unpredictable since it depends on human behaviour to trigger network events. But the section desires a certain rate of relocating (not too fast since it adds work to the section to bring new vaults up to date, but not too slow since this affects security). The initial age impacts the rate of relocation, but I think age is the wrong abstraction to use for this mechanism.

For example, the rate of vault relocations could be controlled by varying the initial age of vaults. A section that wants to slow down the rate of relocation could have a higher initial age, and if that section that wants to speed up the rate of relocation it could change to a lower initial age. But this is a poor abstraction, since age is meant to represent ‘credibility’, not ‘relocation rate’.

So with that in mind, maybe age needs to be only used for authority (ie determining elders), and a different mechanism used by the section to determine relocation rate. I feel that age is responsible for too many things and it leads to a very messy / difficult-to-tweak-for-security-and-economics algorithm.

I think the two concepts of ‘credibility’ and ‘relocation rate’ can possibly be put together in the ‘age variable’ once the design is complete, but I find it hard to think about the design if I mix ‘security via authority via eldership’ and ‘security via nontargeting via relocation’ in the same mechanism.

The concepts at play are still not fully crystallised in my mind so here’s my brain dump of concepts floating about

difficulty of joining vs growing vs gaining authority vs consensus - it should be easy to join, easy to grow, hard to gain authority, extremely hard to control consensus.
security vs authority vs consensus - when is authority and security the same? when is it different? vaults must be empowered but not control consensus.
‘section work’ vs ‘vault work’ - how much work and by who, and how does this work affect security and authority? Vaults should do work, but the section as a whole should not. Scaling up the amount of network activity (ie end-user requests) should affect the work of a vault O(1) much more than the work of a section O(n).
dynamic supply and dynamic demand of resources and how the network structure affects this.
network performance as network size increases (eg file upload speed) and whether a network can be ‘too large’, and how ageing and disallowing affects this in both good and bad ways. I guess this is ‘elasticity’.
incentives for various classes of vaults, eg low-age vaults have different incentives to elders and thus will behave differently.
rewards for participation and how age / class affects this, and how the farm rate interacts indirectly with ageing / disallowing.
responsibility hierarchy; why do different vaults have different responsibilities? Is a hierarchy the best way to achieve this?

The google attack result is interesting, making it slightly easier to attack when the initial age is 4. Still a strong result but it would be interesting to further investigate why the attack becomes easier. I’m guessing it’s because there’s less diversity of ages so attackers move into the upper age brackets faster and thus have more chances to win the tiebreaker and become an elder.

I’m sure maidsafe have some solid thinking around this but I’m pretty happy with it being behind-the-scenes. I’m trying to be careful because I know more talk (ie from me) on the topic does not necessarily help with the goal to improve and finalize the algorithm. I just find it fascinating so can’t help exploring it.

I’ll choose to interpret this as liquid = beer! This works well for me, I’ll gladly have a beer whenever we find ourselves in the same part of the world

This is a concern of mine too. The lure of ‘just get involved’ is a powerful one, but if this is taken away I feel the project falls behind on the overall goals.

If I have data on the network and I want it to stay safe, I must be able to contribute to the safety of that and not have it left entirely in other hands. If I can’t start a vault (because resource requirement is too high) I have no way to protect my data other than hope others will do it. I admit this is quite an idealistic / unrealistic approach but it’s important to me and I’m sure to many.

dirvine · January 7, 2018, 2:12am

Starting with this comment, please feel part of this process, you are and its really great to have people outside the office looking at this. It is very refreshing. Now, thank you for testing the initial age thing here, nice to see this.

Interesting thought.

Just on this, as you say churn is a human affected thing in the main. What we can use though is the fact that that as age increases then that peer will be 50% less likely to churn. Just a note.

Absolutely

Yes this does need more thought. Separation of concerns is always good, we are possibly coupling here and should not be.

It may be worth sharing some thoughts with you here. Here are some points, in no order really.

Age is a measure of increasing work done by a peer.
Work done should be something that has taken place over time and not able to be done faster by a larger peer etc. (no faster cpu/hashing etc.).
Older peers are more trusted than younger ones
A peer that has been doing work for X is very likely to be able to do 2X (this is about the only thing that came out of a study into gnutella IIRC). The point was on line time probability increases the longer a node is on line, so a node on for 1 hour likely to be on for another 1 hour period and so on
Age uses a mod division to work out effort instead of counting events. Agreeing a count is to hard in SAFE type networks as its just extra state, so instead we use mod.
Initially age was going to use the hash of hop messages through the section, but we considered that could be gamed by a bad actor sending messages that would give a hash to relocate their own nodes. Not as simple as it sounds to do that, but it was considered a risk none the less.
Having infants not count as churn events, but using node lost or similar slows down ageing, but also prevents targeting to a great extent.

Looking at 6. again is probably something we should do, but ensure there are no targeting capabilities introduced.
Anyway I am dumping some thoughts here, no particular reason, just to help you see some thinking so far. I will try and do more of this in this thread where possible.

I don’t feel that making relocation orthogonal to churn events is a hard problem, just some subtitles around it. I will try and give this a bit more thought tonight. I have a load of paper here on another subject I will clear up first though, but I will get back to this later or tomorrow for sure.

Thanks again Ian, don’t worry about chatting publicly about all of these points, its the very best way to go most of the time. We do keep some stuff until we have good results, but that is because Engineering discussions can appear negative to non engineers, its just the logic sounds tough at times, but I think this community are strong enough for all of that. Its great for everyone to see the small things that matter a lot like orthogonality of code modules and algorithms/ I hate coupling so this is indeed a good discussion and very valid points about age/churn being coupled and trying to do too much. We can unweave that part quite easily with just a little more poking.

Cheers Ian, nice one again.

tfa · January 7, 2018, 1:56pm

I tried the four possible combinations with:

youngest/oldest priority for relocation
allow/disallow more than one node aged 1 per section

In all cases the initial section never splits. Here are the results with 3000 iterations:

Oldest first + disallow more than one node aged 1:

Age:	1	2	3	4	5	6	7	8
Count:	1	15	3	4	3	2	1	2

Oldest first + allow more than one node aged 1:

Age:	1	2	3	4	5	6	7	8
Count:	2446	35	14	10	5	1	2	1

Youngest first:

Age:	1	2	3
Count:	5	2428	105

(last result is independent on whether or not more than one node aged 1 is allowed)

I also tried many other things:

enlarge the age range when choosing the relocated node
allow a larger number of aged 1 nodes in a section
instead of choosing one of the youngest or oldest nodes, choose a node in the most common age (to balance the age distribution in a section). It didn’t work but this an idea to keep in mind.
modify the drop probability function (I know this is an external factor, outside of the vaults control, but this was my last desperate attempt).

But in all cases the simulation never takes off and I am beginning to suspect a bug in Maidsafe simulation. As he has rewritten the simulation from scratch with another programming language (Go), @mav doesn’t have the problem.

@mav, what you do is a very important because it will allow to confirm results generated from the same set of rules but programmed by 2 independent people/team, using different algorithms and different languages. When Maidsafe makes its simulation work and if the results are similar then we can be confident on the future network behavior.

ktorn · January 7, 2018, 5:53pm

Great work @mav

Sounds like a great use case for adding a VDP file listing your preferred project dependencies

The current PoC implementation (python) won’t be useful yet (bitcoin fees too high) but I’m planning to release another one (written in Rust) later this year that will process payments in LN/BTC, and maybe BCH or LTC (not decided yet).

dirvine · January 7, 2018, 6:57pm

I am not sure why, in this case, but perhaps you are applying the rules for a complete group when there is none. So if there is no group with all elders > 4 then it is not complete, we allow any number of infants etc at that time. It could be your code si being to aggressive and applying the disallow etc. before a group is formed?

Just a thought, I am a bit flu ridden right now so accept errors

Not sure, @bart is off this week (mostly) but back tomorrow, he can confirm, but I can tell you the simulation is not released by us at all yet and is undergoing big changes with the design meetings as we move along. SO I would not be surprised if there is unpublished parts of the code there. It will all be fine soon, but mid test/impl we will make a ton of tweaks and it will inconvenience you a bit (sorry). I am sure there is a branch or off line code you need anyway.

Topic		Replies	Views
Group Size security calculator Blog Posts	14	4278	September 19, 2017
Consensus 28/32 or 8/8? Development	169	8801	June 19, 2018
Churn, number of copies, and perhaps an idea Features	19	1860	May 12, 2015
Network growth monitoring Features	1	636	January 18, 2016
Data Density Attack Development	77	5184	February 17, 2018

Analysing the google attack

Google attack stats to control one section

Results

What does it mean?

Impact of Ageing

Resource bottlenecks

Amended thoughts on google attack

Where to from here?

Control Test - Age Distribution

Test 1 - Age Distribution

Test 2 - Age Distribution with disallow rule

Test 3 - Google attack

Tentative Interpretation

Oldest first + disallow more than one node aged 1:

Oldest first + allow more than one node aged 1:

Youngest first:

Related topics