So say we have 3 chunks: A, B and C. Consider then an erasure code where any 2 of the 3 is enough. For now, set aside the cost of getting the data (replicated chunks will be the most efficient there) and the total data stored for X resilience (where X is the percentage you can lose), where erasure codes look to be more efficient (though I will explain below why they may not be).
Okay, just to make sure I understand what you’re saying, let’s say you have some chunk you’d like to store, X. You have two redundancy schemes, replication and erasure codes. In the replication scenario, R_r(X) = A = B = C. In the erasure code scenario, R_ec(X) = (A, B, C) where A, B, and C are all unique. In the replication scenario, you only need one of A, B, or C (X, X, or X) to recover X, and in the erasure code scenario, you need any two of A, B, and C.
I’m puzzled that you say replicated chunks are the most efficient for getting the data. In the replication scenario, A, B, C, and X are all the same size. In the erasure code scenario, A, B, and C are each 1/2 the size of X. When doing a retrieval, you only need 2 of the erasure code copies, which is still overall the same number of bytes of data transfer as replication, but you get the benefit of being able to recover it from two sources in parallel.
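To make that concrete, here’s a toy sketch of a 2-of-3 code (my own illustration using simple XOR parity, not how any real network actually encodes chunks): split X into halves A and B, and store C = A XOR B. Each piece is half the size of X, and any two of the three pieces are enough to get X back.

```python
# Toy 2-of-3 erasure code: split X into halves A, B and add parity C = A ^ B.
# Any 2 of the 3 pieces reconstruct X; each piece is half the size of X.
# (Illustrative only -- real systems use Reed-Solomon or similar, not plain XOR.)

def xor_bytes(p: bytes, q: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(p, q))

def encode(x: bytes) -> tuple[bytes, bytes, bytes]:
    if len(x) % 2:                      # pad to an even length for simplicity
        x += b"\x00"
    half = len(x) // 2
    a, b = x[:half], x[half:]
    c = xor_bytes(a, b)                 # parity piece
    return a, b, c

def decode(a=None, b=None, c=None) -> bytes:
    # Reconstruct X from any two of the three pieces.
    if a is not None and b is not None:
        return a + b
    if a is not None and c is not None:
        return a + xor_bytes(a, c)      # b = a ^ c
    if b is not None and c is not None:
        return xor_bytes(b, c) + b      # a = b ^ c
    raise ValueError("need at least 2 of the 3 pieces")

x = b"some chunk to store!"
a, b, c = encode(x)
assert decode(a=a, c=c).rstrip(b"\x00") == x   # piece B lost, X still recoverable
assert len(a) == len(x) // 2                   # each piece is half the size of X
```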
In terms of resilience, or durability, in this 1-of-3 vs 2-of-3 scenario, yes, replication is more efficient, but there’s a tradeoff. The total amount of data in the replication scenario is 3 times the original, or what we’ve been calling an expansion factor of 3. In the erasure code scenario, the total amount of data is 1.5 times the original, or half as much. Erasure coding is twice as efficient in terms of bandwidth and disk space for put operations. If we were going to use the same amount of bandwidth and disk space, erasure coding once again would win in terms of resiliency.
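One rough way to see the “same budget, better resiliency” point, under my own simplifying assumption that each stored piece survives independently with probability p over some window (the 10-of-30 and 20-of-30 parameters below are illustrative picks of mine, not settled numbers):

```python
# Back-of-envelope durability comparison at the same expansion factor.
# Assumption (mine, not from the discussion): each stored piece survives
# independently with probability p over some time window.
from math import comb

def survives_replication(copies: int, p: float) -> float:
    # Data survives if at least one full copy survives.
    return 1 - (1 - p) ** copies

def survives_erasure(n: int, k: int, p: float) -> float:
    # Data survives if at least k of the n pieces survive (binomial tail).
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

p = 0.9
# 3 full copies -> expansion factor 3
print(f"replication, 3 copies (3x):  {survives_replication(3, p):.6f}")
# 10-of-30 erasure code -> also expansion factor 30/10 = 3
print(f"erasure 10-of-30 (3x):       {survives_erasure(30, 10, p):.6f}")
# 20-of-30 erasure code -> expansion factor 1.5, half the storage/bandwidth
print(f"erasure 20-of-30 (1.5x):     {survives_erasure(30, 20, p):.6f}")
```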
So in a decentralised network, you need to know that when the network loses a chunk (of any kind), it has Y time to create a copy of that chunk. If replicated, that is easy, as there is more than one copy of each chunk (we have that set to 8 right now, but 4 is probably way beyond what is required).
Sure, in an erasure code scenario, let’s say you have 50 total pieces and you need any 20 of them. You don’t really need to repair until you drop to a certain threshold. Maybe 30? When the redundancy drops to 30, you can download the chunk once (20 pieces of size X/20, so 1x the size of X), then store the missing 20 pieces (each 1/20th the size of X, so again 1x the total bandwidth transfer). We’ve recovered significant redundancy with very little bandwidth transfer, because 20 of the repairs were batched.
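As a sanity check on that arithmetic, a small sketch of the accounting (using the 50 / 20 / 30 numbers from the example above; sizes are in units of X):

```python
# Bandwidth accounting for the batched-repair example above:
# 50 total pieces, any 20 reconstruct the chunk, repair when only 30 remain.
# Sizes are in units of X (the original chunk size); each piece is X/k.

def batched_repair_bandwidth(n: int, k: int, threshold: int) -> dict:
    piece_size = 1 / k                       # each piece is X/k
    missing = n - threshold                  # pieces lost by the time we repair
    download = k * piece_size                # fetch any k pieces to rebuild X -> 1x
    upload = missing * piece_size            # re-store the missing pieces
    return {
        "pieces_repaired": missing,
        "download_x": download,
        "upload_x": upload,
        "bandwidth_per_repaired_piece_x": (download + upload) / missing,
    }

print(batched_repair_bandwidth(n=50, k=20, threshold=30))
# -> 20 pieces repaired for ~2x X of total transfer, i.e. ~0.1x X per repaired piece,
#    versus 1x X per lost copy when re-replicating full copies one at a time.
```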
However, in an erasure code system, A, B and C are all different chunks. So if you lose any one of them, you need to recreate it. If you lose A, for example, the network would need to make a new copy of A, and to do that nodes would need to get copies of B and C to recreate A (thanks @draw).
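In terms of the toy XOR sketch from earlier (again, just an illustration of mine, not the real encoding), repairing a lost piece looks like this: to recreate A you have to pull down both B and C.

```python
# Repairing a lost piece in the toy 2-of-3 XOR scheme from the earlier sketch:
# if A is lost, the repairing node must fetch BOTH B and C to recreate it.

def xor_bytes(p: bytes, q: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(p, q))

x = b"example chunk!"        # even length, so the halves line up
a, b = x[:7], x[7:]          # the two data pieces
c = xor_bytes(a, b)          # the parity piece

# Piece A is lost. To recreate it, the repairing node downloads B and C
# (together the same number of bytes as X), then stores the rebuilt A.
rebuilt_a = xor_bytes(b, c)
assert rebuilt_a == a
```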
Not right away. This is especially true if you’re using a 30/80 erasure code.
This is a lot of work, and if B or C also goes down then we are screwed. So erasure codes used like this mean that instead of having 4 exact copies, we may only have 3 different pieces, where any 2 can recreate the file.
Yeah, erasure codes where you do 2 out of 3 are pitifully low numbers. This 2/3 example would never work, I agree.
This part could be very costly IMO. So the balance between using a number of replicated chunks and using erasure codes, where space efficiency is definitely improved, is upset by the network’s ability to maintain the chunks it should. I hope I am making sense, but it is an interesting aspect to consider IMO.
The primary reason for us to use erasure codes is not disk space efficiency, but overall bandwidth efficiency. Erasure codes where you repair immediately if any piece is lost would be bad, obviously, so we don’t do that. We only repair when we get to a repair threshold, where we start to consider that, hey, maybe the data is at risk. We can still tolerate 10 losses or something, but that’s too close for comfort, so we’ll repair the missing 30.

Overall, this means that we spend significantly less bandwidth. With replication, you have to spend 1x the overall size of the chunk every time you lose a copy.
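To put rough numbers on that, here’s a simplified model of my own (losses happen one at a time, repairs are instantaneous, and I’m reusing the 50 / 20 / 30 parameters from earlier) comparing cumulative repair bandwidth:

```python
# Rough comparison of cumulative repair bandwidth as pieces are lost one at a time.
# Assumption (mine): losses arrive one per "tick" and repairs happen instantly.

def replication_repair_bw(losses: int) -> float:
    # Every lost copy is re-replicated immediately by transferring the full chunk (1x X).
    return float(losses)

def erasure_repair_bw(losses: int, n: int = 50, k: int = 20, threshold: int = 30) -> float:
    # Pieces (each X/k) are only repaired when the count drops to `threshold`:
    # each batch costs 1x X to rebuild the chunk plus (n - threshold)/k X to re-store pieces.
    batch_size = n - threshold
    batches = losses // batch_size
    return batches * (1 + batch_size / k)

for losses in (20, 40, 100):
    print(losses, replication_repair_bw(losses), erasure_repair_bw(losses))
# e.g. after 100 lost pieces: replication has moved ~100x X,
# while the batched erasure scheme has moved ~10x X (5 batches of ~2x X each).
```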