Data Density Attack
There’s an attack that is expensive but may cause serious and permanent problems to the network so it’s worth discussing and seeing what it’s about.
Goal
The goal of the attacker is to cause significant disruption to the network. There is no direct gain to the attacker. This makes the threat hopefully not too high.
Description
The crux of the problem is data on the network is expected to be evenly balanced across vault names and data names. Network operations like joining and relocating have this balanced-vaults and balanced-data assumption built in (well, kind of!).
If an attacker is uploading single chunks at a time (ie many 1MB files) they can choose the content of those files / chunks to result in names targeted to a specific part of the network.
This means that one part of the network will have much more data than the rest of the network and the vaults in that part of the network will have much more work to do than other vaults.
This data imbalance is probably ok (not ideal but still tolerable). But if there is a merge between the data-heavy section and a ‘normal’ section, the ‘normal’ vaults need a lot of spare space to be able to facilitate the merge.
Any vaults without enough resources to merge will be booted off the network, meaning a high chance of cascading merges.
Furthermore, this presents a problem to the current understanding of ‘balance’ on the network. Difference in section prefix length is assumed to be unhealthy, but I think difference in data density is even less healthy. It’s probably normal for prefixes to become uneven because there will be times where data is unevenly distributed, so the structure of the network needs to compensate for that.
If one part of the network is very heavy on data because it was subject to a data density attack, then it makes sense for there to be more vaults in that part of the network to manage the load. This naturally leads to a difference in section prefix length.
My proposal is to remove ‘similarity of prefix length’ as a metric for network health and replace it with ‘similarity of data density’ per vault. The reason for desiring similarity in data density is to avoid merges of uneven data loads that may cause vaults to be kicked off the network and possibly cascade the merge.
This may have an impact on the current design work for relocating / joining / disallowing / killing vaults.
Expense
How expensive is this attack? It’s pretty expensive since the imbalance of data needs to be significant compared to all other data entering the network. The more ‘normal’ data there is stored on the network the more data the attacker requires to build up a significant imbalance.
It may take a lot of time and computational energy to generate chunks with a specific prefix. This effort becomes exponentially more difficult as the network grows (or the attack made more targeted), so smaller networks are more at risk than larger networks.
It costs safecoin to store those chunks on the network, but this should ideally be a decreasing cost as the network grows.
Ideally the attacker would also have vaults in the right part of the network to trigger a merge, which is very difficult to achieve (akin to the google attack, but easier since sibling sections are still meaningful to be part of).
The effect of the attack lasts a long time after the attack stops since chunks remain permanently. Eventually the effect is diluted by the additional upload of broadly named data, but this takes time.
The effect, if well enough targeted, would not be diluted by splits, and may actually be made worse in some cases.
Ideas
The concept of Balanced Network should aim for Resource Balance rather than the more abstract Naming Balance (I say this in the context of maidsafe/datachain_sim which measures MaxPrefixLenDiff). In this attack it specifically considers Storage Balance, but I think it probably also applies to Bandwidth Balance and Latency Balance etc.
The assumption that data will be evenly distributed and thus the assumption that vault names should be evenly distributed could be quite damaging.
This attack has impacts on the network understanding of spare space and requirements for how much spare space is needed before enacting incentives to bring more capacity online. Ideally the spare capacity is even across the network, but there may be times where capacity shouldn’t be evenly distributed.
This attack would indicate that random relocation is not good, and targeted / assisted relocation is necessary. I still think random relocation is overall preferable, but this attack is another point to add to the argument against it.
Concluding Thoughts
I’m not especially concerned about this attack in the real world, but cascading merges seem to be a big risk to overall network health, and it does seem to have implications for some of the current design philosophy being developed.