This is my first post on these forums, although I’ve done a fair amount of reading here recently. I’ll start with a question that I haven’t seen come up yet (at least not from this angle). If it has already been discussed in detail, I apologize in advance for the redundancy.
That being said, I’m curious how MaidSafe’s de-duplication process will actually work. If I understand correctly, data will be divided into ‘chunks’, and identical chunks will be shared across the SAFE Network rather than stored as many separate copies. Assuming that’s right, how identical is “identical”? Does it mean data will only be de-duplicated if it is 100% bit-for-bit identical?
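To make my own understanding concrete, here’s a toy Python sketch of what I *assume* “identical” means: each chunk is identified by a cryptographic hash of its bytes (SHA-256 here, which is my assumption), so only exact byte-for-byte copies map to the same stored chunk. `ToyStore`, `put`, and the subscriber counter are hypothetical names I made up, not MaidSafe’s actual API.

```python
import hashlib

def chunk_id(data: bytes) -> str:
    # Assumed scheme: a chunk is named purely by the hash of its contents.
    return hashlib.sha256(data).hexdigest()

class ToyStore:
    """Hypothetical content-addressed store: one entry per distinct chunk."""

    def __init__(self):
        self.chunks = {}       # chunk id -> chunk bytes
        self.subscribers = {}  # chunk id -> number of uploads referencing it

    def put(self, data: bytes) -> str:
        cid = chunk_id(data)
        if cid not in self.chunks:
            self.chunks[cid] = data  # first upload: actually store it
        # Every upload, duplicate or not, bumps the subscriber count.
        self.subscribers[cid] = self.subscribers.get(cid, 0) + 1
        return cid

store = ToyStore()
original = b"A Day in the Life"
a = store.put(original)
b = store.put(original)  # exact copy: de-duplicated
flipped = bytes([original[0] ^ 0x01]) + original[1:]  # flip a single bit
c = store.put(flipped)

print(a == b)                # True
print(a == c)                # False
print(len(store.chunks))     # 2
print(store.subscribers[a])  # 2
```

If it works anything like this, flipping even one bit produces a completely different chunk id, so the two versions would not be de-duplicated at all.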
In a blog post, Nick Lambert gave this example: “So, if 10,000 copies of The Beatles ‘A Day in the Life’ are uploaded, the network will identify that they are identical and update the subscriber count to 10,000, but keep only 4 copies of each file.” [Source: “The Economics of Safecoin”, MaidSafe, safenetwork on Medium]
Following this example of a song, one might naturally wonder about the many versions of a given song that may exist. An obvious example would be different releases (CDs, remasters, compilations, regional editions, etc.), each of which might have slight variations that one person prefers over another. Or perhaps someone wants to have every available version of this particular song. Even when the source is the same, one person may simply have ripped a track more accurately than another. Similarly, what if people have the same song from the same release (perhaps even originating from the same shared digital source) but have edited its metadata differently, or normalized its audio levels relative to other tracks in a playlist?
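If chunking works roughly like a fixed-size split (purely my assumption; I don’t know MaidSafe’s actual chunk sizes or how boundaries are chosen), then these different kinds of edits would share very different amounts of data with the original. Here’s a toy sketch with tiny 4-byte chunks; `CHUNK_SIZE`, `chunk_ids`, and `shared_chunks` are hypothetical names, and the “metadata” and “audio” are just placeholder bytes.

```python
import hashlib

CHUNK_SIZE = 4  # hypothetical, tiny so the effect is easy to see

def chunk_ids(data: bytes) -> list[str]:
    # Assumed scheme: split into fixed-size chunks and hash each one.
    return [hashlib.sha256(data[i:i + CHUNK_SIZE]).hexdigest()
            for i in range(0, len(data), CHUNK_SIZE)]

def shared_chunks(x: bytes, y: bytes) -> int:
    # Number of distinct chunk ids the two files have in common.
    return len(set(chunk_ids(x)) & set(chunk_ids(y)))

audio = b"aaaabbbbccccdddd"
track = b"TAG1" + audio                   # 4 bytes of "metadata" + "audio"
retagged = b"TAG2" + audio                # same-length metadata edit
shifted = b"TAG01" + audio                # metadata one byte longer
normalized = b"TAG1" + b"AAAABBBBCCCCDDDD"  # every "sample" changed

print(shared_chunks(track, retagged))    # 4
print(shared_chunks(track, shifted))     # 0
print(shared_chunks(track, normalized))  # 1
```

So under this assumption, a same-length tag edit would leave most chunks de-duplicated, while a tag that grows by a single byte (shifting every later chunk boundary) or a normalized track would share almost nothing. Real chunking could behave quite differently, which is part of what I’m asking.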
How would such differences be handled? Could data be “de-duplicated” even if it is not an exact match?
Thanks in advance to anyone who might be able to help me understand this process better. And, again, if I’ve repeated an already-discussed issue, perhaps my post can be de-duplicated too.