Data Density Attack

Your suggestion should certainly help make a data-density attack harder. However, if the protocol used by the vaults to decide on the final name of a given chunk is fairly deterministic, an attacker can just accommodate that when creating chunks and still be able to target a single section.

I’m also not quite clear on how this suggestion would thwart the farming reward attack. Unless we try to hide the real chunk name from the vaults storing chunks (and hence also the contents, for ImmutableData, since the name can easily be deduced from the contents), wouldn’t it still be easy to generate GETs for all the data your vault stores?

2 Likes

That’s not easy, because the condition is dynamic and only known by the data managers (invariant wasn’t an appropriate name). Certainly much harder than offline generation.

My proposal is the following:

  • client asks for ID1

  • vault managing ID1 stores the corresponding ID2 and asks for it

  • vault managing ID2 returns the data (possibly directly to client)

The second vault gets the reward on ID2 but doesn’t know that it would have to issue a GET on ID1 to fetch the data.

The two vaults should be far enough apart that, in the event of a merge, the two IDs don’t risk being stored in the same vault.
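The three-step flow above could be sketched like this (a toy model, with both vault roles collapsed into one object and all names hypothetical):

```python
import hashlib
import os

def h(data: bytes) -> str:
    """Name of an ImmutableData chunk: SHA-256 of its content."""
    return hashlib.sha256(data).hexdigest()

class Network:
    """Toy model of the proposal: one dict per vault role."""
    def __init__(self):
        self.indirection = {}  # vault managing ID1: ID1 -> ID2
        self.storage = {}      # vault managing ID2: ID2 -> data

    def put(self, data: bytes) -> str:
        id1 = h(data)
        id2 = os.urandom(32).hex()   # unpredictable to the client
        self.indirection[id1] = id2  # first vault records ID1 -> ID2
        self.storage[id2] = data     # second vault only ever sees ID2
        return id1

    def get(self, id1: str) -> bytes:
        id2 = self.indirection[id1]  # hop 1: client asks for ID1
        return self.storage[id2]     # hop 2: data looked up (and possibly
                                     # returned directly to client) by ID2

net = Network()
name = net.put(b"some chunk")
assert net.get(name) == b"some chunk"
```

The key property is that the second vault holds `storage[id2]` without ever seeing `id1`, so it can’t trivially self-issue rewarded GETs against the name the client uses.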

2 Likes

What I understood back then is that the chunk ID stored locally would be different (a hash of…) from the real ID that represents an address in XOR space. It was the data manager that was responsible for mapping the two together. The local farming computer had no idea which real chunk ID it contained, so you couldn’t just spam the network with GET requests to exploit the reward system, since the ID you would see wouldn’t be the real one.

I don’t know if that is still the case or relevant anymore.

5 Likes

We’re definitely at risk of going a bit off topic! Anyway, to make sure I understand properly, let’s say the client starts by wanting to store the string Laphroaig as an ImmutableData chunk. (yes, it’s whisky o’ clock here! :smile:) The name of this chunk (ID1) is the SHA256 of Laphroaig which is e85...

When the vaults covering e85.. receive the PUT request, they generate some pseudo-random new ID2 (let’s say 6ff..) for this chunk and they keep a record e85.. → 6ff... They forward the actual chunk over to the vaults at 6ff.., which then store Laphroaig under the key 6ff..

However, the vaults storing the data - the ones which will be rewarded via farming - can regenerate ID1 by hashing Laphroaig. If we want to avoid this, I imagine we’d need to do something like having the first vaults encrypt the data before sending it on to the second ones. But that means we have to pass the data back through these first vaults on a GET so it can be decrypted again for the client.
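The leak, and the encrypt-before-forwarding fix, in miniature (the XOR pad here is purely illustrative, not a proposal for a real cipher):

```python
import hashlib
import secrets

def name(data: bytes) -> str:
    """Self-naming: an ImmutableData chunk's ID is the hash of its content."""
    return hashlib.sha256(data).hexdigest()

content = b"Laphroaig"
id1 = name(content)                  # the real name ("e85.." in the post)

# Stored in plaintext under ID2, the second vault can recover ID1
# simply by re-hashing what it holds:
assert name(content) == id1

# If the first vaults encrypt before forwarding (toy one-time pad),
# re-hashing the stored bytes no longer yields ID1 -- at the cost of
# routing GETs back through the first vaults for decryption:
key = secrets.token_bytes(len(content))
ciphertext = bytes(a ^ b for a, b in zip(content, key))
assert name(ciphertext) != id1
```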

As for the data-density attack, if we can find a way to make the generation of ID2 unguessable by the client, even if it has a single malicious colluding vault in that section, then I think we’ve got a fix for the attack. I’m just not sure myself that there is such a way. We can certainly make it harder for an attacker, maybe so that only a fraction of his PUTs actually end up in the targeted section, and that along with the safecoin cost might be enough to deter the attacker. Definitely worth further investigation though; maybe there’s a decent approach here I’m missing!

3 Likes

If there was delayed creation of ID2 I think it could be achieved; the chunk is stored ‘normally’ using ID1 until a new block in the data chain is found for that section. Since the content of that block cannot be known in advance it can be used as a secure random number source. The randomness is used to derive a verifiable-but-random ID2 which can be where the chunk is finally stored. Both sections for ID1 and ID2 can verify the move is valid using the latest block.
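Sketching that derivation (the exact hash construction is hypothetical; any preimage-resistant combination of the block hash and ID1 would do):

```python
import hashlib

def derive_id2(block_hash: bytes, id1: bytes) -> str:
    """Verifiable-but-random relocation target: anyone holding the new
    block can recompute it, but nobody can predict it before the block
    exists."""
    return hashlib.sha256(block_hash + id1).hexdigest()

id1 = hashlib.sha256(b"Laphroaig").digest()      # chunk stored 'normally' first
block = hashlib.sha256(b"next data-chain block").digest()  # unknowable in advance

id2 = derive_id2(block, id1)

# Both the ID1 and ID2 sections can verify the move with the same computation:
assert derive_id2(block, id1) == id2
# A different block would have sent the chunk somewhere else entirely:
assert derive_id2(b"\x00" * 32, id1) != id2
```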

7 Likes

Nice idea! Using the next block rather than the current one should make it as good as impossible for an attacker to predict or influence. There are a couple of drawbacks though, I guess.

It’s a shame to have to build in this indirection for every chunk stored, since it comes with some overhead (latency in chunk ops, bandwidth, code complexity). Most chunks wouldn’t need it if most clients are honest. If we only reserve this mechanism for when a section is getting swamped, then it might be simpler to just take an entire tranche of chunks within a single address range and pass those to another random section.

That probably wouldn’t require the random relocation target to be hidden from the attacker since as long as the same target isn’t chosen every time, the attacker’s clumps of chunks will get dispersed evenly across all the sections. That means we wouldn’t have to delay the transfer until a new block is added to the chain, we could start the transfer as soon as it’s needed.

The other less severe drawback I can see is that it could be quite a large pile of chunks that get relocated when a churn event (i.e. new block) happens. Given that a section could already be working relatively hard to accommodate the churn event, it’d be good if we could avoid adding to that workload.

Having said all that, I still think using the next block as a source of random data is a great idea.

7 Likes

Seems like it would be impossible to launch such an attack against the datachain prior-chunk indirection fix you describe. However, it also makes me wonder if just using SHA3-512 hash would provide such a significant increase in difficulty to the attacker that double SHA256 indirection would be unnecessary.
I guess you could argue that eventually a single SHA3-512 could be just as susceptible to the density attack though…

1 Like

An address book is an index - really

I am not talking domain names here.

But I am talking of databases, tokens, and any other sort of APP that wants to KNOW the address of the data object without referencing an index (address book)

  • database - has its optimised indexing structure. Your “address book” then means another indexing layer on top of that making the database 3 to 10 times slower.
  • tokens - Instead of using the token number to access the token, the APP now has to index into the “address book” (primary index) to get the token’s actual address. Tokens may not be produced in order, so this “address book” has to be re-indexed or shuffled each time a new random token is generated
  • Other APP that deterministically determines address - Now has to maintain an “Address Book” (=== INDEX) in order to access the data.

This means that all those programs (ie most) will now need an extra layer of indexing/shuffling in order to process MDs as is intuitive to do. This requires extra MDs to store the index (address book), extra processing in the network and extra cost for EVERY App that wants to access specific (numeric/binary) addresses.

MDs are MORE than domain names

Exactly. Hide the process completely from the APPlication and then this allows for deterministically determining the MD address (numeric/binary)

ONLY if the user requests timed deletion. Of course the network does not use time, does it :slight_smile: You were seeing if we actually read your posts, weren’t you :slight_smile:

As to @tfa’s idea of indirection: the indirection is only needed if the section is being loaded down. So while it’s an extra part of the code, you could do this

  • If the section is fine storage-wise then simply store the MD as normal.
  • If the section is getting loaded down (beyond a certain percentage) then set indirection and pass off the MD (or chunk) to be stored elsewhere.
  • Record the indirection link. When the user requests the MD (or chunk), the first section is queried; if it holds the MD/chunk then it simply returns it, otherwise it passes the request on to the section that now holds it.
  • It is possible to have the indirection recursive (with a limit obviously).
  • If this causes a recursive “loop” (hits the limit) then the MD is not storable at this time.

This allows for dynamic control where the indirection only occurs in situations where the section is loaded down.
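The bullets above could look roughly like this (the 90% threshold, the limit of 3, and the routing rule are all made-up numbers for illustration):

```python
import os

MAX_INDIRECTIONS = 3  # hypothetical recursion limit

class Section:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.chunks = {}   # id -> data stored in this section
        self.links = {}    # id -> id relocated to another section

    def loaded(self) -> bool:
        # Hypothetical threshold: start indirecting once 90% full.
        return len(self.chunks) >= 0.9 * self.capacity

def put(sections, sec, cid, data, depth=0):
    if not sec.loaded():
        sec.chunks[cid] = data            # store the MD/chunk as normal
        return True
    if depth >= MAX_INDIRECTIONS:
        return False                      # hit the limit: not storable now
    # Pick a new address (e.g. derived from a datachain block hash)
    # that routes to some other section.
    new_id = os.urandom(32).hex()
    others = [s for s in sections if s is not sec]
    target = others[int(new_id, 16) % len(others)]
    if put(sections, target, new_id, data, depth + 1):
        sec.links[cid] = new_id           # record the indirection link
        return True
    return False

def get(sections, sec, cid):
    if cid in sec.chunks:
        return sec.chunks[cid]            # held locally: simply return it
    new_id = sec.links[cid]               # otherwise pass the request on
    others = [s for s in sections if s is not sec]
    target = others[int(new_id, 16) % len(others)]
    return get(sections, target, new_id)
```

A GET only pays the extra hop when indirection actually happened, which is the point: the cost appears only when the section is under load.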

REMEMBER that when a section’s spare space is low, the cost of “PUTs” automatically rises, so in a loaded-down situation the cost will rise anyway (under the current model) when storing in that section

@mav I am wondering whether, given that the storing cost is controlled by the spare space, this attack would end up with the attacker eventually being charged one coin for each PUT once the spare space is quite low. And this is under the current model for charging, without any of the suggestions above.
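To illustrate the shape of that deterrent (the formula here is invented for illustration; the real RFC pricing function differs):

```python
def put_cost(spare_fraction: float) -> float:
    """Toy pricing curve: cost per PUT grows as a section's spare
    space shrinks, capped at 1 safecoin per PUT."""
    assert 0.0 < spare_fraction <= 1.0
    return min(1.0, 0.001 / spare_fraction)

# Plenty of room: PUTs are cheap.  Nearly full: 1 coin per PUT,
# so an attacker filling the last stretch pays the maximum rate.
assert put_cost(0.5) < 0.01
assert put_cost(0.001) == 1.0
```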

6 Likes

Or, the self encryptor could generate a random ID1 instead of deriving it from the content. This way vault 2 cannot guess ID1, and the data could be returned directly from vault 2 to the client without transiting via vault 1. A checksum should probably also be generated to check that the data hasn’t been tampered with.
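Roughly (all names hypothetical; note this forfeits content-addressed naming, which a later reply points out matters for deduplication):

```python
import hashlib
import secrets

content = b"Laphroaig"

# Random name: vault 2 cannot re-derive it by hashing the content
# (with a content-derived name it always could):
id1 = secrets.token_bytes(32).hex()
assert id1 != hashlib.sha256(content).hexdigest()

# So the client keeps a checksum to detect tampering on GET:
checksum = hashlib.sha256(content).hexdigest()
returned = content   # what vault 2 sends straight back to the client
assert hashlib.sha256(returned).hexdigest() == checksum
```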

The self encryptor runs on your PC and is open-sourced, so it can be manipulated.

1 Like

While I’m often guilty of this myself, seems like the dynamic double indirection approach, although effective, would veer away from KISS rather significantly.

Seems like this hits the nail on the head. Safecoin to the rescue. Just for fun I’m now going to give another rally cry in the hope that someone else will jump on the SHA3-512 bandwagon, or convince me to jump off it, or push me off it… :grin:

3 Likes

Not sure it’s that significant.

The loaded section simply generates a new address based on whatever (datachain block hash?) and sends the MD/chunk off to the new section like any other MD/chunk. So the code is one subroutine/function, invoked only if the section is loaded. Still very simple, just an extra step.

Obviously the request now carries an indirection counter for detecting too many indirections.

Yes, sometimes we get caught up in trying to figure out clever ways to prevent attacks when the simpler answer is hitting the attacker in the back pocket.

2 Likes

Busted again! :smile: OK - I’m definitely not pursuing this here - that’d certainly drag the thread way off course!

+1

The key difference from @adam’s suggestion above is that he’s talking about the amount stored rather than the free space available. Free space isn’t as good an indicator of malicious storing, since it could equally be the result of a random cluster of particularly high-capacity vaults. Also, the amount stored can be measured more accurately.

2 Likes

Yes, but it doesn’t matter, because the attacker cannot control ID2. It can still control ID1, but we cannot do anything about that with offline generation, and it cannot overload a vault with indirection objects because they are very small.

1 Like

Well, I was going off the current model suggested for safecoin (RFC).

I do realise that is not good enough, since “free space” can be manipulated. But whatever method the SAFEcoin system uses to determine when to increase prices, we know it will relate somehow to the section filling up.

2 Likes

Yes, I think I’m seeing this as the top approach so far (although as I mentioned above, I prefer using data stored as a metric for this). However, I usually favour options which stand on maths alone rather than ones which rely on human nature. With this approach we’re effectively saying that because it’s irrational to spend the required amount of resources on attacking this way, it won’t happen. That’s definitely a reasonable assumption, but if there’s an equally simple solution which means that an attacker can’t succeed, that’d get my vote :slight_smile:

There’s certainly a load of room in that bandwagon!

1 Like

Not just irrational: the attacker also needs the coins to do it. If the section has, say, 10TB left for new storage and the cost is now 1 safecoin per “PUT” because of the critical storage space, the attacker needs 10,000 coins at this stage to finish filling up the section.

What happens when the section is considered full (does not accept new storage)?

I assume it rejects any requests for storing so the attack stops.

But if you introduce the indirection then this helps both ways, because the price rises before indirection kicks in, and the attacker is being drained of resources during the lead-up to indirection kicking in.

So both aspects will help to ward off such an attack.

3 Likes

That would be a problem for deduplication of ImmutableData.

4 Likes

Yes, I do agree. In this specific case I think I’m coming down (so far) on the side of the safecoin deterrent approach, although I’m certainly not convinced yet.

For example, say we take this approach and allow an attacked section to charge a massively inflated price for PUTs, but that it does have the desired effect of stopping the attacker from being able to cause a catastrophic cascade merge. What we’ve also done is incentivise someone to write a replacement for self-encryption which rather than deduplicating chunks, instead splits and encrypts a file in such a way that the resulting chunks land in low-cost sections, or at least that no chunks will land in sections which will reject them.

Losing the deduplication is an unintended consequence which hurts the network, and is probably impossible to quantify. I think it’s probably better to try and find a way which means a section can’t be pushed to its limit before relying on this approach which allows a section to run at its limit, albeit temporarily.

5 Likes

I have no intuition on the time it takes to brute-force/generate these chunks. Can anyone here provide some kind of guesstimate? Seems like this just puts a constraint on the minimum number of sections, so that section prefixes are long enough to make this impossible at launch given current technology.
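For a rough guesstimate, the cost of landing a chunk in a section whose prefix is b bits is about 2^b hash attempts (the hash function and numbers below are illustrative):

```python
import hashlib
import itertools

def attempts_to_hit_prefix(prefix_bits: int) -> int:
    """Brute-force candidate chunks until one's SHA-256 name starts
    with `prefix_bits` zero bits -- expected ~2**prefix_bits tries."""
    for i in itertools.count(1):
        digest = hashlib.sha256(i.to_bytes(8, "big")).digest()
        if int.from_bytes(digest, "big") >> (256 - prefix_bits) == 0:
            return i

# With ~1024 sections a prefix is ~10 bits, so each targeted chunk
# costs only ~1000 hashes -- microseconds on a single core.  Even a
# 30-bit prefix (~10^9 sections) is only ~2**30 hashes, i.e. seconds.
n = attempts_to_hit_prefix(10)
```

So on its own the prefix length is a very weak barrier; it only doubles the attacker’s per-chunk work each time the section count doubles.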

This is a really good point. Rather than some kind of crafty adversary, this thread is also concerned with the situation where the network is just very very popular and storage is filling up faster than people bring new hard disks online. (remote possibility yes, but probably as remote as brute forcing enough section prefixes right?) Maybe we could also think of this as the “wd-samsung attack” ie. not enough hdd/ssd manufacturing capacity. Probably a good problem to have… :wink:

1 Like