Are Erasure Codes (Storj) better than Replication for the SAFE network?

That’s a bit of a stretch. It’s basically n of p where you set the ratio you wish. So p can be 10 and n be 20, then any 10 parts will give the file. It’s a debate between replication and (error coding or) erasure codes, striping etc. Functionally they are similar, the debates are on efficiency really. So the number of replicants verses the n/p ratio is the argument of the past. I have always favored replication as it is simpler and faster. n/p ratios had a bigger argument when there were disc space efficiency issues, as with raid 5 and so on. I feel this is confused in distributed computing though where the number of stripes can be as many as we wish (replicants). As I say though it can be argued for a very long time.

Talk of AWS resiliency is not about erasure codes really, neither is data transmission losses (this is what reliable protocols are for). These are kinda misleading I think, it is about the number of copies you need to get (however they are encoded) to retrieve a file plus taking into account the probabilities the machines they are stored on will be able to give those chunks. It really is that simple. AWS resilience will be matched by many projects on launch when their software is proven over time to be resilient, then the underlying algorithms can be considered resilient, but not before.

10-16 GB would mean they set replication at somewhere from 10-16? So which was it?
2-3 is unclear if it means 2 or 3? with erasure coding that number should be known up front, otherwise n and p have not been selected.

So seems strange to me to have ranges of numbers like that? However, if replication was 16 then it meant that the network needs 16 copies as some may vanish (up to 15) and still have the data (not at AWS resilience, beyond that to 100%). If 15 machines could disappear and you have an erasure code where you need 2 or 3 parts, then those could have existed in those 15 lost machines. So this all makes little sense to me? if you only need 2 copies of 3 say, you could store the whole file on 2 machines with the exact same resilience. This is a confusing way to frame erasure codes in a decentralised network of nodes.

So erasure codes can be used, I agree (both these and replication do work), but these figures do not make sense.

17 Likes