How is Data Stored and Retrieved?

dyamanaka · June 9, 2014, 1:14am

When a file is uploaded, the SAFE Network shatters it into 1Mb chunks, makes 4 copies of each chunk, and encrypts them before planting them in vaults throughout the Network. Because of this process, no vault can read any chunk stored with it. Even if a vault somehow managed to decrypt a 1Mb chunk, it would only have a fragment of the original file and still not know who it belongs to.

Example #1: A user uploads a 10Mb file.
The file is split into 10 chunks (1Mb each), and 4 copies are made of each chunk. This means there are 40 chunk copies spread out to 40 vaults. When the user requests that file, they call on all 40 vaults, but only the fastest of the 4 vaults holding each 1Mb chunk is used to complete the retrieval. The speed at which the user can retrieve the complete file is limited by the fastest copy of the slowest 1Mb chunk arriving at their location.

Example #2: A user uploads a 1Gb file.
The file is split into 1000 chunks (1Mb each), and 4 copies are made of each chunk. This means there are 4000 chunk copies spread out to 4000 vaults. When the user requests that file, they call on all 4000 vaults, but only the fastest of the 4 vaults holding each 1Mb chunk is used to complete the retrieval. Again, the speed at which the user can retrieve the complete file is limited only by the fastest copy of the slowest 1Mb chunk arriving at their location.
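The arithmetic in the two examples can be checked with a few lines of Python. This is only a sketch of the numbers in this post, not MaidSafe code: the function name `storage_plan` is made up for illustration, and it assumes 1Gb = 1000Mb as the example does.

```python
import math

CHUNK_SIZE_MB = 1  # default chunk size described in this post
COPIES = 4         # copies kept of each chunk, as described in this post


def storage_plan(file_size_mb, chunk_size_mb=CHUNK_SIZE_MB, copies=COPIES):
    """Return (number of chunks, number of stored chunk copies)."""
    # A partially filled last chunk still counts as one chunk.
    chunks = math.ceil(file_size_mb / chunk_size_mb)
    return chunks, chunks * copies


print(storage_plan(10))    # (10, 40)
print(storage_plan(1000))  # (1000, 4000)
```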

Instead of calling a whole 10Mb file from only 4 Vaults, you call 40 chunk copies (1Mb each) from 40 Vaults. This makes a BIG difference in retrieval speed.
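To make "the fastest copy of the slowest chunk" concrete, here is a small Python sketch. It assumes every copy of every chunk is requested at the same moment and that the arrival times are already known. The function name `retrieval_time` and the timings are invented for illustration.

```python
def retrieval_time(arrival_times):
    """arrival_times holds one list per chunk: the arrival time (seconds) of each copy.

    Each chunk is ready when its fastest copy arrives; the file is ready
    when the slowest of those chunks is ready.
    """
    return max(min(copies) for copies in arrival_times)


# Three chunks, four copies each (made-up timings).
example = [
    [0.8, 1.2, 0.5, 2.0],  # fastest copy arrives at 0.5
    [1.1, 0.9, 3.0, 1.4],  # fastest copy arrives at 0.9
    [0.4, 0.6, 0.7, 0.3],  # fastest copy arrives at 0.3
]
print(retrieval_time(example))  # 0.9
```

A single slow vault does not hold up the download as long as one of the other 3 copies of that chunk arrives quickly.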

Q: What happens in the unlikely event that all 4 vaults which share the same 1Mb chunk are down?

A: The Network solves this problem by duplicating the chunks to a new vault whenever a vault goes offline.
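As a rough illustration of that idea, and not the Network's actual algorithm, the Python sketch below tops a chunk back up to 4 copies whenever one of its vaults goes offline. The names `handle_vault_offline`, `holders`, and `online_vaults` are hypothetical, and picking the replacement vault at random is an assumption made only to keep the example short.

```python
import random

TARGET_COPIES = 4  # copies per chunk, as described in this post


def handle_vault_offline(offline_vault, holders, online_vaults, rng=random):
    """Re-duplicate every chunk the offline vault was holding.

    holders maps chunk id -> set of vault ids that store a copy (updated in place).
    online_vaults is the list of vault ids currently reachable.
    """
    for vaults in holders.values():
        if offline_vault not in vaults:
            continue
        vaults.discard(offline_vault)
        # Vaults that are online and do not already hold this chunk.
        candidates = [v for v in online_vaults
                      if v != offline_vault and v not in vaults]
        # At least one surviving copy is needed to duplicate from.
        while vaults and candidates and len(vaults) < TARGET_COPIES:
            new_vault = rng.choice(candidates)
            candidates.remove(new_vault)
            vaults.add(new_vault)
    return holders


holders = {"chunk-1": {"a", "b", "c", "d"}}
handle_vault_offline("d", holders, ["a", "b", "c", "e", "f"])
print(len(holders["chunk-1"]))  # 4
```

Because a lost copy is replaced as soon as one vault drops out, all 4 vaults for the same chunk would have to go down at nearly the same time for the chunk to be lost.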

fergish · June 9, 2014, 2:07am

This is great!
The only missing step I see is the encryption/obfuscation.

dyamanaka · June 9, 2014, 2:09am

True, but I wrote this for the general public. MaidSafe Wiki docs can go over the specifics on encryption and obfuscation.

chadrickm · June 9, 2014, 2:11am

I say add it anyway… The FAQs are not just for the general public. @david can you wikify this as well? Great post…

fergish · June 9, 2014, 2:14am

Doesn’t need to give details, but general public should be reminded/told that nothing goes out unencrypted.

dyamanaka · June 9, 2014, 2:20am

I added a link to the docs explaining the encryption process.

urrtag · June 9, 2014, 7:00am

Or how the reduplication of encrypted data works.

Thank you.

sanderbelou · October 7, 2014, 2:28am

Some files are smaller than 1Mb. Even smaller than 4Kb. How do these files get handled?

dyamanaka · October 7, 2014, 4:04am

AFAIK, files less than 1Mb should be encrypted and obfuscated as normal, without the shattering process. The API breaks down files larger than 1Mb to improve retrievability from the network.

Smaller files do lack the “jigsaw puzzle” layer. That may be something the devs can look into once we get underway with Testnet2 and Testnet3.

I recall a discussion on the main dev list. The client can be adjusted to break down files into smaller chunk sizes such as 1Kb. But the default is 1Mb.
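A minimal Python sketch of that behaviour, as I understand it: files at or below the chunk size stay in one piece, and larger files are cut into fixed-size chunks. The function `split_into_chunks` is hypothetical (it is not the client API), it treats 1Mb as 1,000,000 bytes, and it leaves out encryption and obfuscation entirely.

```python
def split_into_chunks(data: bytes, chunk_size: int = 1_000_000):
    """Split data into chunk_size pieces; small files are returned as a single piece."""
    if len(data) <= chunk_size:
        return [data]  # no shattering for small files
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


print(len(split_into_chunks(b"x" * 2_500_000)))               # 3
print(len(split_into_chunks(b"x" * 4096)))                    # 1
print(len(split_into_chunks(b"x" * 4096, chunk_size=1000)))   # 5
```

With a smaller chunk size such as 1Kb, even a 4Kb file is broken into several pieces.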

sanderbelou · November 29, 2014, 9:52pm

excellent. thanx 4 the answer.
yes! more than enough people have sensitive info in files of >1Mb…
Testnet3 is on the way… i can’t hardly wait
