Oh, well. I’m more interested in what would do best for the Safe Network than in what Storj is doing, and that sounds like a good match for the title of this thread. Inspirations are great; mindless copying is stupid.
I thought a bit more and I think a “concat first 4 blocks => chunk, rest of blocks => distributed parity” type of scheme would be the best of both worlds.
In the simplest implementation, clients would generate chunks as they do now (with encryption and so on), but then they would slice each chunk into 4 pieces, store the hashes of both the chunk and the pieces in the data map, and then PUT the chunks as usual.
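Here’s a rough sketch of that client-side step. Everything below is hypothetical – the `DataMapEntry` type is made up, and SHA-256 via the `sha2` crate is just a stand-in for whatever hash the network actually uses:

```rust
// Client-side sketch: slice an already self-encrypted chunk into 4 primary
// blocks and record the hashes that would go into the data map.
// Assumptions: the `sha2` crate as a stand-in hash, hypothetical types.
use sha2::{Digest, Sha256};

const PRIMARY_BLOCKS: usize = 4;

struct DataMapEntry {
    chunk_hash: Vec<u8>,        // hash of the whole (encrypted) chunk
    block_hashes: Vec<Vec<u8>>, // hashes of the 4 primary blocks
}

fn slice_chunk(chunk: &[u8]) -> (Vec<Vec<u8>>, DataMapEntry) {
    assert!(chunk.len() % PRIMARY_BLOCKS == 0, "pad the chunk first");
    let block_len = chunk.len() / PRIMARY_BLOCKS;
    let blocks: Vec<Vec<u8>> = chunk.chunks(block_len).map(|b| b.to_vec()).collect();

    let entry = DataMapEntry {
        chunk_hash: Sha256::digest(chunk).to_vec(),
        block_hashes: blocks.iter().map(|b| Sha256::digest(b).to_vec()).collect(),
    };
    (blocks, entry)
}
```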
The vault responsible for the chunk (based on its XOR distance, like now) would slice it into the same 4 blocks to match the client’s data map, generate 12 additional parity blocks, put the block hashes into a “minimap”, and send the 16 blocks to the sections responsible for storing them (based on their hashes), together with the minimap, so those vaults can restore a block if it gets lost.
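And a sketch of that vault-side step, assuming a systematic Reed-Solomon codec such as the `reed-solomon-erasure` crate (the actual codec is an open choice; `Minimap` and `encode_chunk` are made-up names):

```rust
// Vault-side sketch: re-slice the chunk into the same 4 primary blocks,
// add 12 parity blocks, and build the per-chunk "minimap" of block hashes.
// Assumptions: the `reed-solomon-erasure` and `sha2` crates, made-up types.
use reed_solomon_erasure::galois_8::ReedSolomon;
use sha2::{Digest, Sha256};

const DATA_SHARDS: usize = 4;
const PARITY_SHARDS: usize = 12;

struct Minimap {
    block_hashes: Vec<Vec<u8>>, // 16 entries: 4 primary + 12 parity
}

fn encode_chunk(chunk: &[u8]) -> Result<(Vec<Vec<u8>>, Minimap), reed_solomon_erasure::Error> {
    let block_len = chunk.len() / DATA_SHARDS; // chunk assumed padded to a multiple of 4
    // The first 4 shards are the unencoded slices the client already hashed.
    let mut shards: Vec<Vec<u8>> = chunk.chunks(block_len).map(|b| b.to_vec()).collect();
    // Append 12 zeroed parity shards for the codec to fill in.
    shards.resize(DATA_SHARDS + PARITY_SHARDS, vec![0u8; block_len]);

    let rs = ReedSolomon::new(DATA_SHARDS, PARITY_SHARDS)?;
    rs.encode(&mut shards)?;

    let minimap = Minimap {
        block_hashes: shards.iter().map(|s| Sha256::digest(s).to_vec()).collect(),
    };
    // Each block would then be sent to the section closest to its hash,
    // together with a copy of the minimap.
    Ok((shards, minimap))
}
```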
Clients would never have to see any of the parity blocks or the chunk minimaps – they would just request the 4 primary (unencoded) blocks whose hashes they stored in the data map, concatenate them, and everything is the same as now from that point on. As you can see, low-end clients would not be affected in any way.*
* High-end clients could request the minimaps so they could request more blocks than necessary to achieve a lower latency; this is purely optional and should not be part of the official client library.
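For completeness, the normal (low-end) retrieval path is nothing more than this; `fetch_block` is a hypothetical stand-in for the network GET by block hash:

```rust
// Retrieval sketch: request the 4 primary blocks listed in the data map and
// concatenate them in order; the result is the same encrypted chunk as today.
// `fetch_block` is a hypothetical stand-in for the network request.
fn get_chunk(primary_hashes: &[Vec<u8>], fetch_block: impl Fn(&[u8]) -> Vec<u8>) -> Vec<u8> {
    primary_hashes
        .iter()
        .flat_map(|hash| fetch_block(hash))
        .collect()
}
```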
The huge win is unprecedented redundancy (any 12 of the 16 blocks can be lost at the same time, combined with Safe’s “instant loss detection” feature) at a comparatively low (4x) storage cost, without adding complexity to (potentially low-end) clients. Only vaults, which are higher-end nodes by definition, would ever need to worry about the more computationally demanding erasure-code decoding.
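To make the “any 12 of the 16” part concrete, here’s a toy round trip (again assuming the `reed-solomon-erasure` crate): encode 4 data blocks into 16, throw away 12 of them, and rebuild everything from the 4 that survive.

```rust
// Toy demonstration of the redundancy claim: with 4 data + 12 parity shards,
// any 12 of the 16 blocks can be lost and the rest can be reconstructed.
use reed_solomon_erasure::galois_8::ReedSolomon;

fn main() -> Result<(), reed_solomon_erasure::Error> {
    let rs = ReedSolomon::new(4, 12)?;

    // 16 shards of equal length: 4 data shards followed by 12 (initially
    // zeroed) parity shards, which encode() fills in.
    let mut shards: Vec<Vec<u8>> = vec![vec![0u8; 512]; 16];
    for (i, shard) in shards.iter_mut().take(4).enumerate() {
        shard.fill(i as u8 + 1); // dummy data
    }
    rs.encode(&mut shards)?;

    // Simulate losing 12 of the 16 blocks: keep only shards 2, 7, 11 and 15.
    let mut received: Vec<Option<Vec<u8>>> = shards.iter().cloned().map(Some).collect();
    for (i, slot) in received.iter_mut().enumerate() {
        if ![2, 7, 11, 15].contains(&i) {
            *slot = None;
        }
    }

    // Rebuild the missing 12 blocks from the 4 survivors.
    rs.reconstruct(&mut received)?;
    assert!(received.iter().zip(&shards).all(|(r, s)| r.as_deref() == Some(s.as_slice())));
    Ok(())
}
```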
I’m not sure I understand this. I wouldn’t think FEC requires more nodes to stay up than replication; quite the contrary, at least in the schemes I was considering.
Could it be we’re thinking about using FEC in fundamentally different ways?
Possible implementation:
- M-of-N with M=4 and N=16 – 2 MB chunks, 16 x 512 KB blocks, any 12 of which can be lost at the same time
- this requires 2 requests per 1 MB of data, up from the current 1 request (assuming 1 MB chunks now); this is the primary trade-off for the higher comparative redundancy (see the arithmetic sketch after this list)
- chunks become “minimaps” that point to the 16 blocks that make them up; the vault that “stores” a chunk stores only its minimap
- the first 4 blocks are the primary blocks: just 4 unencoded slices of the (already encrypted) chunk data; these are the blocks that the data maps point to (that is, clients neither know about nor request any of the parity blocks)
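And the arithmetic behind those numbers, just to have it spelled out in one place (nothing network-related, only the constants from the list above):

```rust
// Back-of-the-envelope numbers for the M=4, N=16 parameters (sizes in KB).
fn main() {
    let chunk_kb = 2 * 1024;  // 2 MB chunk
    let data_blocks = 4;      // primary (unencoded) blocks
    let total_blocks = 16;    // primary + parity blocks

    let block_kb = chunk_kb / data_blocks;             // 512 KB per block
    let stored_kb = block_kb * total_blocks;           // 8192 KB stored per chunk
    let overhead = stored_kb as f64 / chunk_kb as f64; // 4.0x storage cost
    let requests_per_mb = data_blocks as f64 / (chunk_kb as f64 / 1024.0); // 2 GETs per MB
    let tolerated_losses = total_blocks - data_blocks; // any 12 blocks can be lost

    println!("block size: {} KB", block_kb);
    println!("storage overhead: {:.1}x", overhead);
    println!("GET requests per MB retrieved: {}", requests_per_mb);
    println!("simultaneous block losses tolerated: {}", tolerated_losses);
}
```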