Deduplication Hash Collision

After reading a bit about MaidSafe, I was, like everyone else, excited. One feature I found particularly interesting was deduplication. Obviously an important feature, right? Well, it implements this (or will; I don’t know if it’s programmed in yet) with hashes. But a hash function can’t be one-to-one, since there are more possible inputs than outputs, so collisions must exist.
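For concreteness, hash-based deduplication boils down to using a chunk’s hash as its storage key. Here is a minimal Python sketch of the general idea (a toy model, not MaidSafe’s actual implementation):

```python
import hashlib

# Toy content-addressed store: chunks are keyed by their SHA-512 digest,
# so two identical chunks are stored exactly once.
store = {}

def put(chunk: bytes) -> str:
    key = hashlib.sha512(chunk).hexdigest()
    store.setdefault(key, chunk)  # if the key already exists, nothing new is stored
    return key

k1 = put(b"hello world")
k2 = put(b"hello world")  # identical content -> identical key, deduplicated
assert k1 == k2 and len(store) == 1
```

A collision would mean two *different* chunks mapping to the same key, so the second uploader would silently get back the first uploader’s data.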

This leads to my question: is anything being done to address this problem? I don’t want to upload something and come back later to corrupted data, however low that probability may be. After all, billions of people are supposed to use this eventually.

If hash collisions were practical to find, someone could use them to steal bitcoins (for the sake of example). Think of all the hashing power that goes into Bitcoin: even with all of it, collisions aren’t a problem and won’t be one anytime soon. In other words, don’t worry about it.
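To put rough numbers on that (the network hash rate below is an assumed round figure, not a measured one): by the birthday bound, finding even one SHA-256 collision takes on the order of 2^128 hash evaluations.

```python
# Back-of-envelope: time for the entire Bitcoin network to find one
# SHA-256 collision. The hash rate is an assumed round figure.
hash_rate = 10 ** 20        # hashes per second, order of magnitude
birthday_bound = 2 ** 128   # ~evaluations for a 50% chance of a 256-bit collision

seconds = birthday_bound / hash_rate
years = seconds / (60 * 60 * 24 * 365)
print(f"{years:.1e} years")  # ~1.1e11 years, far longer than the age of the universe
```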

There are more possible 512-bit hashes (2^512, about 10^154) than there are atoms in the observable universe (around 10^80). We really don’t have to worry about that.

Edit: Think of it like this: if you and I each pick one random grain of sand on this planet, what are the odds we pick exactly the same grain? A 512-bit hash collision is many, many times more unlikely than that.

Edit2: Actually, it would be more like repeatedly picking random grains of sand and, through incredible luck, landing on the same grain again and again. Probably even more unlikely than that.
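Putting rough numbers on the analogy (the grain count is a commonly cited estimate, not an exact figure):

```python
import math

grains_of_sand = 7.5e18            # rough estimate of grains of sand on Earth
p_same_grain = 1 / grains_of_sand  # you and I pick the same grain
p_collision = 2.0 ** -512          # two fixed inputs share a 512-bit hash

print(math.log10(p_same_grain))  # ≈ -18.9, i.e. about 1 in 10^19
print(math.log10(p_collision))   # ≈ -154.1, i.e. about 1 in 10^154
# You'd have to hit the same grain roughly 8 times in a row
# ((10^19)^8 ≈ 10^152) to approach a single 512-bit collision.
```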

Alright.

I was about to reply:

If I uploaded 2^513 different files, at least two of them would have to share a 512-bit hash, by the pigeonhole principle.

That’s about 10^54 times bigger than a googol.
Honestly, it’s so weird to think that almost every file we could ever upload gets an effectively unique hash.

Also, log2(the number of megabytes on the internet) ≈ 50. So even hashing every megabyte on the internet would use only 2^50 of the 2^512 possible values, leaving the hash space 2^462 times bigger than needed.
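Those figures are easy to sanity-check with Python’s arbitrary-precision integers:

```python
googol = 10 ** 100
hashes = 2 ** 512

print(len(str(hashes)))               # 155 digits: 2^512 ≈ 1.34 * 10^154
print(hashes // googol)               # ≈ 1.34 * 10^54, i.e. about 10^54 googols
print(hashes // 2 ** 50 == 2 ** 462)  # True: 2^462x headroom over ~2^50 MB
```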

OK ignore me. The randomness is sufficiently random. (and if it isn’t, I told you so :stuck_out_tongue: )

But you didn’t quantify the likelihood of a collision, so on its own the statement doesn’t say much.
There is always some chance for two hashes to collide. That doesn’t mean we should “worry” about it.
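For anyone who does want a number, the standard birthday-bound approximation gives one. Here is a small sketch (the 2^80 chunk count is a deliberately extreme assumption):

```python
import math

# Birthday-bound approximation: with n random chunks and b-bit hashes,
# P(at least one collision) ≈ n^2 / 2^(b+1) while that value is small.
def collision_probability(n: int, bits: int = 512) -> float:
    return n * n / 2.0 ** (bits + 1)

p = collision_probability(2 ** 80)  # ~10^24 chunks stored network-wide
print(p)             # ≈ 5.5e-107
print(math.log2(p))  # ≈ -353, i.e. about 2^-353
```

Even under that extreme assumption the probability is around 10^-107, vastly smaller than the chance of ordinary hardware silently corrupting your data first.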
