@antoine, in enterprise storage systems, which are relatively expensive (per unit of capacity), deduplication can save money.
In Big Data, even when it's used by enterprises, deduplication is not popular because it takes resources both to deduplicate and to "revert" deduplication (which is required whenever you want to access data that was deduplicated). In that context, enterprise-style deduplication is considered "expensive".
In loosely coupled systems, such as MaidSafe, deduplication would be extremely expensive.
Now, since MaidSafe has deduplication, how come this enterprise-style approach isn't used there?
Deduplication that’s used in enterprises attempts to go after very high-hanging fruit (because the storage is so expensive). So apart from the simple things (like 2 identical files in different directories), it can go down to block level (4KB, for example).
So if one file is 14 KB and you open it, do a Save as..., and append 4 KB to that new copy, you end up with two very similar files. With a 4 KB block size, the first file needs 4 data blocks (4 * 4 KB = 16 KB of block space for 14 KB of data), and the second file can share its first 3 blocks with the original. Only 2 new blocks are needed for the "non-deduplicable" part, because the remaining 2 KB of original data plus the appended 4 KB are spread across the 4th and 5th blocks of the second file. There are other approaches too, and most involve a lot of data churning…
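To make that arithmetic concrete, here's a rough Python sketch of fixed-block deduplication. The 4 KB block size and SHA-256 block hashes are just assumptions for illustration, not any particular vendor's implementation:

```python
import hashlib
import os

BLOCK = 4 * 1024  # 4 KB blocks, as in the example above

def block_hashes(data: bytes) -> list[str]:
    """Split data into fixed 4 KB blocks and hash each block."""
    return [hashlib.sha256(data[i:i + BLOCK]).hexdigest()
            for i in range(0, len(data), BLOCK)]

original = os.urandom(14 * 1024)             # the 14 KB file -> 4 blocks
appended = original + os.urandom(4 * 1024)   # "Save as..." copy + 4 KB -> 5 blocks

h1, h2 = block_hashes(original), block_hashes(appended)
shared = sum(a == b for a, b in zip(h1, h2))
print(len(h1), len(h2), shared)  # 4 5 3 -> the copy shares 3 blocks, needs 2 new ones
```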
MaidSafe works with very cheap storage and there’s no need to employ such complicated approaches.
In the same scenario above, both files (without any deduplication) would fit within a single chunk.
If you had a 4 GB ISO file, appended 4 KB to a copy of it, and uploaded both to MaidSafe, only the last chunk or two would differ, while the rest would be identical. MaidSafe would look at chunk hashes rather than 4 KB blocks.
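Here's a simplified sketch of that chunk-hash comparison. This is not MaidSafe's actual self-encryption; I'm just assuming ~1 MB content-addressed chunks named by plain SHA-256, and a 16 MB stand-in file instead of 4 GB:

```python
import hashlib
import os

CHUNK = 1024 * 1024  # assumed ~1 MB chunks, just for illustration

def chunk_names(data: bytes) -> list[str]:
    """Name each fixed-size chunk by the hash of its content."""
    return [hashlib.sha256(data[i:i + CHUNK]).hexdigest()
            for i in range(0, len(data), CHUNK)]

iso = os.urandom(16 * CHUNK + 300 * 1024)  # stand-in for the ISO (partial final chunk)
iso_copy = iso + os.urandom(4 * 1024)      # copy with 4 KB appended at the end

a, b = chunk_names(iso), chunk_names(iso_copy)
identical = sum(x == y for x, y in zip(a, b))
print(len(a), len(b), identical)  # 17 17 16 -> only the final chunk differs
```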
If you had a 4 GB compressed video and added one frame in the middle, I think all MaidSafe chunks after that frame would differ (every byte after the insertion shifts, so the chunk boundaries no longer line up), and in that case only about 50% would be deduplicated, even if you added just one letter to a single frame in the middle of the video file.
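A quick sketch of why a mid-file insertion is so much worse than an append: every fixed-boundary chunk after the insertion point gets a new hash. Same assumptions as above, with an 8 MB stand-in file:

```python
import hashlib
import os

CHUNK = 1024 * 1024  # same assumed 1 MB chunk size as above

def chunk_names(data: bytes) -> list[str]:
    return [hashlib.sha256(data[i:i + CHUNK]).hexdigest()
            for i in range(0, len(data), CHUNK)]

video = os.urandom(8 * CHUNK)                # stand-in for the compressed video
mid = len(video) // 2
edited = video[:mid] + b"x" + video[mid:]    # insert a single byte in the middle

a, b = chunk_names(video), chunk_names(edited)
identical = sum(x == y for x, y in zip(a, b))
print(identical, "of", len(a), "chunks still match")  # 4 of 8 chunks still match
```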
What Gartner says is that you can save 80% or more of capacity in absurd scenarios like VDI, where you create a golden VMware desktop image, derive 20 VMware images from it for different departments and apps, and then serve those to 500 people. In the end you can save 80% and use much less than 10 GB * 500, but that's because enterprise hardware and software is specifically built to save space and deduplicate that exact, frequently seen scenario.
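A back-of-the-envelope on that VDI scenario, using only the illustrative numbers above:

```python
desktops = 500
image_gb = 10
raw_gb = desktops * image_gb        # 5,000 GB if every desktop kept its own full copy
savings = 0.80                      # the ~80% figure attributed to Gartner
stored_gb = raw_gb * (1 - savings)  # roughly 1,000 GB actually stored after dedup
print(raw_gb, stored_gb)            # 5000 1000.0
```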