Preventing data loss using parity checks

It's been a while since we discussed this; it must be over a year now.

This would be an easy way to ensure all parts of critical files remain available, even during large outages.

The biggest issue I see is that we don't want this at the core level; it should be an optional feature applied to selected files. There's also the resource problem: big files require a lot of CPU and time to check (the cost grows steeply as file size increases), let alone to correct errors. That makes whole-file parity unsuitable for media files, since it could require downloading the whole file's worth of chunks before playback can start. Useless for movie watching on most phones.

One suggestion was to put the par data on the chunk itself: each chunk is split into 8 sections, with 6 data parts and 2 par parts. Checking then happens per chunk, with no need to download enough chunks to recreate the whole file (maybe GBs). Doing it on a per-chunk basis also means that if a data transmission error occurs, the chunk doesn't necessarily need to be downloaded again.
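To make the proposed layout concrete, here's a small sketch of the per-chunk arithmetic (constant names are mine, not SAFE code): with 6 data parts and 2 par parts, any 2 of the 8 parts can be lost and the chunk still rebuilt, at a fixed one-third storage overhead.

```python
CHUNK_SIZE = 1024 * 1024              # 1 MB chunk, per the proposal
DATA_PARTS = 6                        # sections carrying real data
PAR_PARTS = 2                         # sections carrying parity
TOTAL_PARTS = DATA_PARTS + PAR_PARTS  # 8 sections per chunk

part_size = CHUNK_SIZE // TOTAL_PARTS   # 128 KB per part
data_bytes = DATA_PARTS * part_size     # 768 KB of actual data per chunk
overhead = PAR_PARTS / DATA_PARTS       # 2/6, i.e. ~33% extra storage

print(part_size, data_bytes, round(overhead, 3))
```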

And it would be suitable for media files, since it's done one chunk at a time. PAR checking of 1MB is very quick.

So each part is 128KB, and a chunk stores 6x128KB of data plus 2x128KB of par parts.
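For anyone curious how 2 par parts can rebuild any 2 lost parts, here's a minimal RAID-6-style sketch, not the PAR2 format itself but the same family of arithmetic: P is a plain XOR parity, Q is an XOR weighted by powers of a generator in GF(256). The function names and the toy 4-byte parts are mine, for illustration; real parts would be the 128KB described above.

```python
# GF(256) log/antilog tables, reducing polynomial 0x11d
GF_EXP = [0] * 512
GF_LOG = [0] * 256
x = 1
for i in range(255):
    GF_EXP[i] = x
    GF_LOG[x] = i
    x <<= 1
    if x & 0x100:
        x ^= 0x11d
for i in range(255, 512):
    GF_EXP[i] = GF_EXP[i - 255]

def gf_mul(a, b):
    if a == 0 or b == 0:
        return 0
    return GF_EXP[GF_LOG[a] + GF_LOG[b]]

def gf_div(a, b):
    if a == 0:
        return 0
    return GF_EXP[(GF_LOG[a] - GF_LOG[b]) % 255]

def encode(data_parts):
    """Compute the two par parts over 6 equal-size data parts:
    P = XOR of all parts, Q = XOR of parts weighted by g^i (g = 2)."""
    n = len(data_parts[0])
    p, q = bytearray(n), bytearray(n)
    for i, part in enumerate(data_parts):
        coef = GF_EXP[i]                    # g^i
        for k, byte in enumerate(part):
            p[k] ^= byte
            q[k] ^= gf_mul(coef, byte)
    return bytes(p), bytes(q)

def recover_two(data_parts, i, j, p, q):
    """Rebuild missing data parts i and j (i < j) from the surviving
    data parts plus P and Q. Entries i and j of data_parts are ignored."""
    n = len(p)
    px = bytearray(p)   # becomes d_i ^ d_j
    qx = bytearray(q)   # becomes g^i*d_i ^ g^j*d_j
    for m, part in enumerate(data_parts):
        if m in (i, j):
            continue
        coef = GF_EXP[m]
        for k, byte in enumerate(part):
            px[k] ^= byte
            qx[k] ^= gf_mul(coef, byte)
    gi, gj = GF_EXP[i], GF_EXP[j]
    denom = gi ^ gj
    di, dj = bytearray(n), bytearray(n)
    for k in range(n):
        # solve: di ^ dj = px,  gi*di ^ gj*dj = qx
        dj[k] = gf_div(gf_mul(gi, px[k]) ^ qx[k], denom)
        di[k] = px[k] ^ dj[k]
    return bytes(di), bytes(dj)

# demo: toy 4-byte parts instead of 128 KB; lose parts 1 and 4, rebuild them
parts = [bytes([m * 17]) * 4 for m in range(6)]
p, q = encode(parts)
d1, d4 = recover_two(parts, 1, 4, p, q)
assert d1 == parts[1] and d4 == parts[4]
```

Losing one data part plus one par part, or both par parts, is even easier (P alone rebuilds a single data part, and the par parts can always be recomputed), so the chunk survives the loss of any 2 of its 8 parts.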

Again, even on a per-chunk basis, I think it needs to be optional, and really a layer above the SAFE code.

For extra reading on this
EDIT: one discussion that progresses into PAR “files”

Another one

I still can't find the topic that specifically discusses this.

EDIT: thanks to @mav who found the topic
