Just to parrot and expand upon what @dirvine said, seven is quite a magic number, but magic for good reason. Let’s look at some examples of what happens when two (2) Elders are lost. To be as conservative as possible, we need to assume that the two lost Elders are “good” Elders and that the maximum possible number of adversaries remains. Where a third of the group size (the complement of the 2/3 rule) is not a whole number, we round to the nearest integer for the assumed number of malicious nodes in the Elder group (2.33 malicious → 2, 2.67 malicious → 3). The percentages below are the share of good Elders in the group that remains.
Here goes:
- 6 Elder group size, 6/3 = 2 malicious nodes → 4 Elders remain, 2 malicious nodes survive. This yields: 2 out of 4 consensus = 50% < 2/3 … FAILED with no simple majority.
- 7 Elder group size, 7/3 = 2.33 malicious nodes → 5 Elders remain, 2 malicious nodes survive. This yields: 3 out of 5 consensus = 60% < 2/3 … GOOD with a simple majority close to 2/3.
- 8 Elder group size, 8/3 = 2.67 malicious nodes → 6 Elders remain, 3 malicious nodes survive. This yields: 3 out of 6 consensus = 50% < 2/3 … FAILED with no simple majority.
- 9 Elder group size, 9/3 = 3 malicious nodes → 7 Elders remain, 3 malicious nodes survive. This yields: 4 out of 7 consensus = 57.1% < 2/3, a MODEST but VALID simple majority.
- 10 Elder group size, 10/3 = 3.33 malicious nodes → 8 Elders remain, 3 malicious nodes survive. This yields: 5 out of 8 consensus = 62.5% < 2/3, but a STRONG simple majority very close to 2/3.
- 11 Elder group size, 11/3 = 3.67 malicious nodes → 9 Elders remain, 4 malicious nodes survive. This yields: 5 out of 9 consensus = 55.6% < 2/3, the LEAST but VALID simple majority.
- 12 Elder group size, 12/3 = 4 malicious nodes → 10 Elders remain, 4 malicious nodes survive. This yields: 6 out of 10 consensus = 60% < 2/3, the same GOOD strength as an Elder group that started with 7, but with more communication overhead required to support it.
When you look at these examples, group sizes of 7, 9, 10, and 12 stand out as “magic” numbers with good properties. The examples show that:
- An initial group size of 7 offers the lowest communication overhead and can withstand up to 2 lost elders while maintaining a simple majority.
- An initial group size of 10 or 12 offers a modest increase in communication overhead, but could handle a loss of up to 3 Elders and still maintain a decent simple majority (10 with 3 malicious nodes → 4/7 = 57.1%; 12 with 4 malicious nodes → 5/9 = 55.6%).
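The arithmetic in the examples above can be reproduced with a few lines of Python. This is only a sketch of the back-of-the-envelope model used in this post (round a third of the group to the nearest integer, then remove good Elders only); the function names `assumed_malicious` and `degraded_group` are made up for illustration and are not part of any network code.

```python
from fractions import Fraction


def assumed_malicious(group_size: int) -> int:
    """Return group_size / 3 rounded to the nearest integer.

    Thirds never land exactly on .5, so plain integer arithmetic avoids
    any tie-breaking question.
    """
    return (2 * group_size + 3) // 6


def degraded_group(group_size: int, lost_good: int = 2) -> tuple[int, int]:
    """Return (honest, remaining) after `lost_good` good Elders drop out."""
    remaining = group_size - lost_good
    honest = remaining - assumed_malicious(group_size)
    return honest, remaining


if __name__ == "__main__":
    for n in range(6, 13):
        honest, remaining = degraded_group(n)
        share = Fraction(honest, remaining)
        verdict = "simple majority" if share > Fraction(1, 2) else "FAILED"
        print(f"{n:>2} Elders -> {honest}/{remaining} = {float(share):.1%} ({verdict})")
```

Calling `degraded_group(10, lost_good=3)` covers the three-lost-Elders case mentioned for a group of 10.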
Now, that being said, I question the validity of using standard rounding rules for determining the theoretical number of malicious nodes in the Elder group. For example, 7/3 malicious nodes = 2.33. To me, this means that there is a nonzero chance that there could be 3 malicious nodes in the Elder group, and consensus would FAIL if two good Elders are lost. (I suppose one could argue that it had already failed if we require 5 out of 7 consensus in the original group… maybe one of the bad guys plays nice and always votes like a good guy until it detects a weakened state.) From a different perspective, accommodating only 2 out of 7 malicious Elders (2/7 = 28.6%) means that the network isn’t living up to its theoretical maximum possible resilience of 33.33% malicious node handling capability. This “living up to its theoretical potential” perspective is probably a better way to view it…
If we presume a worst-case scenario and always round up the malicious node count, then the outlook changes. Under that scenario an Elder group size of 12 is optimal: it is the smallest size (so the lowest communication overhead) that can handle up to 3 good Elders being eliminated (12 with 4 malicious → 9 with 4 malicious, thus 5/9 = 55.6% consensus and a valid simple majority in the degraded state).
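To compare the two rounding assumptions side by side, here is a rough sketch that counts how many good Elders a group can lose before the good nodes stop holding a strict majority of what remains. The helper names (`malicious_count`, `max_tolerable_loss`) and the `round_up` flag are hypothetical and only encode the simplified model from this post.

```python
def malicious_count(group_size: int, round_up: bool) -> int:
    """Assumed malicious Elders: a third of the group, rounded up or to nearest."""
    if round_up:
        return -(-group_size // 3)  # ceiling division
    return (2 * group_size + 3) // 6  # nearest integer (thirds never tie at .5)


def max_tolerable_loss(group_size: int, round_up: bool) -> int:
    """Largest number of good Elders that can vanish while the good nodes
    still hold a strict majority of the remaining group.

    Assumes the full group starts out with a good majority.
    """
    malicious = malicious_count(group_size, round_up)
    lost = 0
    # Keep removing one more good Elder while a strict majority would survive.
    while 2 * (group_size - malicious - (lost + 1)) > group_size - (lost + 1):
        lost += 1
    return lost


if __name__ == "__main__":
    print("size  nearest  round-up")
    for n in range(6, 13):
        print(f"{n:>4}  {max_tolerable_loss(n, False):>7}  {max_tolerable_loss(n, True):>8}")
```

Under the round-up assumption, 12 is the first size in this range that tolerates three lost Elders, while 7 tolerates none.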
TL;DR: Initial Elder group sizes of 7, 9, 10, and 12 are the only good options if you want to keep the Elder counts as low as possible. A size of 7 will work well to prove things out with the lowest communication overhead. Larger sizes may be better in the long term if you care about maximum malicious node handling capability and stability during Elder churn.
Final Edit: A second look at this, considering what percentage of the network has to fail at once to cause this loss of Elders and consensus strength. This shows some interesting trade-offs. Consider the natural progression in malicious node handling capability, the number of lost Elders the section can handle, the percentage of instant network destruction this implies, and the resulting degraded consensus level.
- 7 initial w/ 2 malicious (28.6% ‘baddies’), lose 2 → 28.6% network failure → 3/5 = 60.0% consensus
- 9 initial w/ 3 malicious (33.3% ‘baddies’), lose 2 → 22.2% network failure → 4/7 = 57.1% consensus
- 12 initial w/ 4 malicious, lose 3 → 25.0% network failure → 5/9 = 55.6% consensus
- 15 initial w/ 5 malicious, lose 4 → 26.7% network failure → 6/11 = 54.5% consensus
- 18 initial w/ 6 malicious, lose 5 → 27.8% network failure → 7/13 = 53.8% consensus
- 21 initial w/ 7 malicious, lose 6 → 28.6% network failure → 8/15 = 53.3% consensus
- 24 initial w/ 8 malicious, lose 7 → 29.2% network failure → 9/17 = 52.9% consensus
- 27 initial w/ 9 malicious, lose 8 → 29.6% network failure → 10/19 = 52.6% consensus
- 30 initial w/ 10 malicious, lose 9 → 30.0% network failure → 11/21 = 52.4% consensus
- …
- 300 initial w/ 100 malicious, lose 99 → 33% network failure → 101/201 = 50.2% consensus
The choice of 7 looks really great here. You get 28.6% malicious node handling capability and can withstand an instantaneous failure of 28.6% of the nodes. You would need 21 Elders for 33% malicious nodes and 28.6% instant failure tolerance. The number of nodes required for 33% malicious handling and 33% failure tolerance is staggering, and less secure, since we nearly lose consensus when degraded to 101/201 = 50.2%. Based on this quick look, 7, 12, and 21 are the best contenders imo. It may be a toss-up on communication overhead between group sizes 7 and 12 if there is a fair amount of Elder churn. I really like the fact that a group size of 12 ensures that the network can handle up to the theoretical limit of 33.33% malicious nodes, and 25% instant failure handling is essentially the same as 28.6% if ~1/4 of the planet goes offline. We’ll probably need a restart at that point anyhow.
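For anyone who wants to extend the progression listed above to other sizes, here is a small sketch. It assumes the same simplified model: `malicious` bad Elders stay put, and the section loses as many good Elders as it can while the good nodes keep a strict majority, which works out to `group_size - 2 * malicious - 1`. The function name `worst_case_row` is made up for this example.

```python
def worst_case_row(group_size: int, malicious: int) -> tuple[int, int, int]:
    """Return (lost, honest, remaining) at the edge of losing simple majority.

    `lost` is the most good Elders that can disappear while the good nodes
    still outnumber the malicious ones in the remaining group.
    """
    lost = group_size - 2 * malicious - 1
    remaining = group_size - lost   # always 2 * malicious + 1
    honest = remaining - malicious  # always malicious + 1
    return lost, honest, remaining


if __name__ == "__main__":
    # 7 uses the rounded-down count of 2; the rest are exact multiples of 3.
    cases = [(7, 2)] + [(n, n // 3) for n in range(9, 31, 3)] + [(300, 100)]
    for n, m in cases:
        lost, honest, remaining = worst_case_row(n, m)
        print(
            f"{n:>3} initial w/ {m:>3} malicious, lose {lost:>2} "
            f"-> {lost / n:.1%} network failure "
            f"-> {honest}/{remaining} = {honest / remaining:.1%} consensus"
        )
```

Because the degraded group always ends up at `malicious + 1` good out of `2 * malicious + 1`, the consensus share drifts toward 50% as the group grows, which is the trade-off the list illustrates.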