Are Erasure Codes (Storj) better than Replication for the SAFE network?

I’m really appreciating the detailed conversation in this topic.

It will be interesting to see how this is balanced with the following goal in section 2.6 of the whitepaper: “any distributed system intended for high performance applications must continuously and aggressively optimize for low latency not only on an individual process scale but also for the system’s entire architecture.”


Regarding the CPU cost of erasure coding relative to network factors such as bandwidth and multi-hop latency, here are some figures from Backblaze's Java implementation of the same (Reed-Solomon) algorithm:

“A Vault splits data into 17 shards, and has to calculate 3 parity shards from that, so that’s the configuration we use for performance measurements. Running in a single thread on Storage Pod hardware, our library can process incoming data at 149 megabytes per second. (This test was run on a single processor core, on a Pod with an Intel Xeon E5-1620 v2, clocked at 3.70GHz, on data not already in cache memory.)” (source)

Seems pretty fast, but it will be great to see the details of the implementation within Storj. If it's faster than pure redundancy for the end user, that's the main thing.
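To make the 17-data / 3-parity configuration a bit more concrete, here's a minimal sketch of the erasure-coding idea, not the Backblaze or Storj implementation. It uses the simplest possible scheme (17 data shards plus one XOR parity shard, which can rebuild any single lost shard); real Reed-Solomon generalises this with Galois-field arithmetic to produce several independent parity shards, e.g. 3 in the configuration quoted above.

```python
# Minimal erasure-coding illustration: 17 data shards plus ONE XOR parity
# shard. Real Reed-Solomon uses Galois-field math to tolerate multiple
# losses; the recovery principle (rebuild missing shards from the rest)
# is the same.

DATA_SHARDS = 17

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def encode(data):
    """Split data into 17 equal shards and append one XOR parity shard."""
    shard_len = -(-len(data) // DATA_SHARDS)            # ceiling division
    padded = data.ljust(shard_len * DATA_SHARDS, b"\0")
    shards = [padded[i * shard_len:(i + 1) * shard_len]
              for i in range(DATA_SHARDS)]
    parity = shards[0]
    for s in shards[1:]:
        parity = xor_bytes(parity, s)
    return shards + [parity]

def recover(shards):
    """Rebuild a single missing shard (marked None) from the others."""
    missing = [i for i, s in enumerate(shards) if s is None]
    assert len(missing) == 1, "single XOR parity tolerates only one loss"
    rebuilt = None
    for s in shards:
        if s is None:
            continue
        rebuilt = s if rebuilt is None else xor_bytes(rebuilt, s)
    shards[missing[0]] = rebuilt
    return shards

if __name__ == "__main__":
    data = b"hello safe network" * 100
    pieces = encode(data)
    pieces[5] = None                          # lose one shard
    restored = recover(pieces)
    assert b"".join(restored[:DATA_SHARDS]).rstrip(b"\0") == data
    print("recovered one lost shard from the remaining shards")
```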


To respond to my own earlier question about the difference between client-side and network-side encoding, and to expand a bit on what @JoeSmithJr has already said: erasure coding seems to have some benefits when used at the network level compared with only at the client level.

Network-level coding allows the network itself to coordinate restoring lost data in a potentially more efficient way (I'm mainly interested in bandwidth efficiency), which it can't do with client-only erasure coding.
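As a rough back-of-envelope sketch of the repair traffic involved (this is an illustrative model, not the Storj or SAFE repair protocol; the chunk size, replica count, and the assumption that a repairing node fetches any k pieces and re-encodes are all made up for the example):

```python
# Illustrative repair-traffic model (assumptions, not any real protocol):
#   - replication: repairing one lost replica means copying one full chunk
#   - Reed-Solomon (k data / m parity): the repairing node downloads any
#     k pieces (k * chunk/k = one full chunk) and uploads one rebuilt
#     piece of size chunk/k

CHUNK_MB = 1.0          # hypothetical chunk size
K, M = 17, 3            # 17 data + 3 parity pieces, as in the quote above
REPLICAS = 4            # hypothetical replication factor

piece_mb = CHUNK_MB / K

replication_repair = CHUNK_MB          # fetch and store one full copy
rs_repair_download = K * piece_mb      # equals one full chunk
rs_repair_upload = piece_mb            # the single rebuilt piece

print(f"replication repair traffic : {replication_repair:.2f} MB")
print(f"RS repair download         : {rs_repair_download:.2f} MB")
print(f"RS repair upload           : {rs_repair_upload:.2f} MB")
```

Under this naive model the download side of an erasure-code repair is about the same as copying a replica; smarter repair schemes can reduce it, which is exactly the kind of coordination only the network level can do.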

The other benefit is that there are more 'levers' for tuning performance than with pure redundancy, both for client downloads and for network coordination.
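To illustrate what those levers look like, here's a small sketch comparing the retrievability of a few (k, n) configurations against plain replication, assuming each piece or replica is independently available with some probability p. The availability figure and the configurations are hypothetical, and real availability isn't independent, so treat it purely as a shape-of-the-tradeoff illustration.

```python
# P(object retrievable) = P(at least k of n independent pieces available).
# Replication with R copies is the special case k = 1, n = R.
# All parameters below are made up for illustration.

from math import comb

def retrievable(n, k, p):
    """Probability that at least k of n independent pieces are available."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

p = 0.95  # hypothetical availability of any single piece / replica

configs = [
    ("3x replication",  3,  1),   # any 1 of 3 copies
    ("RS 4/8",          8,  4),
    ("RS 10/20",       20, 10),
    ("RS 20/40",       40, 20),
]

for name, n, k in configs:
    print(f"{name:>16}: expansion {n / k:.1f}x, "
          f"P(retrievable) = {retrievable(n, k, p):.8f}")
```

The point is that the three 2x-expansion configurations differ only in how finely the object is split, yet give quite different durability, a knob pure redundancy doesn't have.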

I still haven’t fully digested the implications of erasure codes at the network level, but it’s very interesting and I’ll continue to ruminate on it.


From 3.4.2 - “Erasure codes enable an enormous performance benefit, which is the ability to avoid waiting for “long-tail” response times”

Surely this also applies to pure redundancy when the fastest response is used. Waiting for the fastest k of n erasure-coded pieces doesn’t seem obviously better than taking the fastest 1 of R redundant chunks.

I think the ability of erasure codes to deliver this particular benefit is being overstated.
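A quick simulation of that comparison, with heavy caveats: it only models per-node response latency with a made-up heavy-tailed distribution, and ignores piece size and bandwidth (erasure-coded pieces are 1/k the size of a replicated chunk), so it's a sketch of the order-statistics argument rather than a full comparison.

```python
# Long-tail sketch: replication waits for the single fastest of R replica
# holders; erasure coding waits for the k-th fastest of n piece holders.
# The latency distribution and the parameters are made up for illustration.

import random
random.seed(1)

TRIALS = 100_000

def node_latency():
    # heavy-tailed per-node response time, in arbitrary units
    return random.lognormvariate(mu=0.0, sigma=1.0)

def replicated(replicas):
    # whole chunk from the fastest of `replicas` holders
    return min(node_latency() for _ in range(replicas))

def erasure(n, k):
    # need the k fastest of n piece holders
    return sorted(node_latency() for _ in range(n))[k - 1]

rep = sorted(replicated(4) for _ in range(TRIALS))
ec = sorted(erasure(20, 10) for _ in range(TRIALS))

for label, xs in (("4x replication", rep), ("RS 10/20", ec)):
    p50, p99 = xs[TRIALS // 2], xs[int(TRIALS * 0.99)]
    print(f"{label:>15}: p50 = {p50:.2f}, p99 = {p99:.2f}")
```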

But the expansion factor benefits tabulated at the start of 3.4 are very cool, although I’m not sure expansion factor is the aspect that needs the focus (in the short and/or long term).


Some previous conversations on this forum about erasure codes (with further links within each):

Aug 2015 - Mojette Transform and SAFE
July 2017 - Preventing data loss using parity checks
