I'm wondering whether I have the flow correct in my head.
From Update May 26, 2022:
If the client contacts all seven section elders asking for a 1MB chunk ‘A’, and each elder in turn requests chunk A from the four adults holding it and returns them to the client, then potentially that’s 28MB (4x7) passing across the network for a single 1MB GET request.
Is this the correct diagram for this process?
Currently, we are using a halfway-house system where the client contacts three elders at random and those elders contact the four adults holding chunk A - so we potentially have 12MB (3x4) of data flowing across the network for a 1MB request - better, but still not great. And reducing contacts to three elders comes at a cost: if we have only three elders interacting with the adults we no longer have a supermajority to decide if one of the adults is dysfunctional, making dysfunction tracking more complex.
Is this the correct diagram for this halfway-house process?
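To check that I'm reading the numbers in the update the same way, here is a minimal sketch of the arithmetic for the two full-chunk schemes. The function name and parameters are my own, and it assumes the worst case described above: every contacted elder pulls the whole chunk from every adult holding it.

```python
def replicated_get_traffic_mb(elders_contacted, adults_per_chunk, chunk_mb=1):
    """Worst-case MB moved for one GET when each contacted elder
    fetches the full chunk from each adult holding it.
    (Hypothetical helper, just for the back-of-envelope sums.)"""
    return elders_contacted * adults_per_chunk * chunk_mb


# All seven elders, four adults holding the chunk
print(replicated_get_traffic_mb(7, 4))  # 28

# Halfway-house: three elders chosen at random, same four adults
print(replicated_get_traffic_mb(3, 4))  # 12
```

So both quoted figures are simply elders contacted times adults holding the chunk times the chunk size.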
A solution we are looking at now is information dispersal algorithms (IDA, also known as erasure coding). This could help us significantly reduce the volume of data transfer per GET, by having adults pass a share of the data to the elders rather than the whole thing. The elders then pass these data shares back to the client, who reassembles them and, voila, chunk A is recreated. Potentially this could reduce the flows on a GET to just 1.4MB for a 1MB chunk.
The change is that adults will be able to split chunk A into seven pieces using an IDA and pass just one piece back to the elders on request, rather than the whole chunk, meaning that much less data is exchanged.
Once it has collected five out of the seven pieces, the client can reassemble chunk A.
Does this diagram correctly reflect this process?
Edit: no, it can’t be correct, since each elder only receives 4 shares (one from each adult), but we need 5 shares to reconstruct the data, and ideally all 7 would arrive. Do elders contact more than 4 adults? Is it 7 adults (to get 7 shares)?
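A rough sketch of the share arithmetic may help narrow down which flow the 1.4MB figure implies. It assumes an ideal 5-of-7 erasure code in which each share is exactly one fifth of the chunk, with no metadata or padding overhead; the function names are mine, not anything from the actual codebase.

```python
from fractions import Fraction


def ida_share_mb(chunk_mb, shares_needed):
    """Size of one share for an ideal k-of-n code: chunk size / k.
    (Assumption: no per-share overhead.)"""
    return Fraction(chunk_mb, shares_needed)


def ida_traffic_mb(chunk_mb, shares_needed, shares_sent):
    """Total MB moved if `shares_sent` shares cross the network."""
    return shares_sent * ida_share_mb(chunk_mb, shares_needed)


# One share of a 1MB chunk under 5-of-7
print(float(ida_share_mb(1, 5)))          # 0.2

# Seven shares in total, e.g. one per elder
print(float(ida_traffic_mb(1, 5, 7)))     # 1.4

# Seven elders each receiving a share from four adults (my diagram)
print(float(ida_traffic_mb(1, 5, 7 * 4)))  # 5.6
```

If that assumption holds, the quoted 1.4MB matches seven shares of 0.2MB in total, which looks more like one share per elder than four shares per elder. That is only my reading of the numbers, so I'd still like confirmation of the actual flow.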
I'm just trying to understand the flow of data and the amount of duplication and network messaging, and I'm still not sure whether I've got the flow right.