A key area of focus at the moment is membership: how elders keep track of the adults and other elders in their section, so they can handle new joins, splits, churn and promotions. This functionality is handled by the sn_membership crate. The algorithms in this crate are currently undergoing rigorous testing before being integrated into the network. This is not a new feature as such, but rather a forensic tightening up of the halfway-house we’ve had thus far and a formalisation of the algorithms. Most of the team are currently working on some aspect of membership.
General progress
The sn_membership crate is being handled by @davidrusu. Almost all tests are passing now, so it’s mostly tidying up.
@bochaco is looking at flows within membership: what is the order of events when a new node joins or the oldest adult is promoted?
One aspect covered by membership handling is elder handovers, making sure that newly promoted elders have all the right information, keys and so forth. @anselme has got this to a good place now and it’s pretty much ready to go.
Away from membership, @chriso has been working through niggles with nightly testing and, having finished CLI documentation, is now writing up NRS.
On the data front, @yogesh is focused on testing data replication, and @joshuef has updated qp2p to the latest version of quinn.
Meanwhile @danda is plugging away at DBCs. Integrating Ring CT is pretty much done, and mints are the next step, although more work is needed to get mints working properly with Ring CTs.
@happybeing has made a lovely wee PR updating node logging. This should help save space on any nodes started and make logs easier to follow with commands like tail -f.
Membership
Membership operations cover new nodes joining the section, managing capacity, ejecting misbehaving nodes or reducing their node age, promoting adults to elders and ensuring new elders are properly equipped.
As a reminder, sections have seven elders and (unless future testing shows otherwise) 60 - 100 adults. Adults churn frequently, with dropouts, new joiners and older nodes arriving after being relocated from other sections. The elders need to keep track of section membership so they know when to allow new adults to join, and also when elders churn and an adult is promoted. They retain a list of all current adults and elders in their section.
Section membership is constrained by max section size. When we have such constraints in a distributed system, we often need to resort to consensus to decide between competing options. In our case of membership, elders need to decide which of the (many) nodes waiting to join a section should be allowed in.
We were not using a consensus algorithm up to this point, and the current processes for managing membership sometimes get tripped up when unexpected events occur.
The new sn_membership crate provides a leaderless BFT consensus algorithm that performs well in an eventually synchronous network model. Following a merciless testing regime, it is now ready to be integrated into the network.
sn_membership works together with anti-entropy (AE) and distributed key generation (DKG) to manage the section membership. Here are some flows.
Node joining
A joining node interacts with an elder, exchanging JoinRequest messages until it’s provisionally accepted. It then receives the resource proof challenge and returns its answer to the elder.
Under the old system, once it had passed the test the node would be in, but this was a security risk and could cause blockages (see below). Now the elder sends a proposal to add the node to the other elders of the section using the sn_membership protocol. sn_membership completes once we have Super-Majority over Super-Majority, that is, once a super-majority of elders see that a super-majority of elders have accepted this proposal. Once sn_membership has started, it’s guaranteed to complete (as long as our eventually-synchronous network assumption is not violated).
Once the proposal reaches consensus among elders, the elder sends back the approval to the joining node.
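The Super-Majority over Super-Majority condition described above can be sketched in a few lines of Rust. This is only an illustration, not the sn_membership API: the names ElderId, is_super_majority and is_decided are hypothetical, and the sketch assumes that a super-majority means strictly more than two thirds of the elders (five out of seven).

```rust
use std::collections::{BTreeMap, BTreeSet};

// Hypothetical identifier type for an elder (not from sn_membership).
type ElderId = u8;

/// Assumption: a super-majority is strictly more than two thirds of `total`.
fn is_super_majority(count: usize, total: usize) -> bool {
    3 * count > 2 * total
}

/// `seen` maps each elder to the set of elders whose acceptance of the
/// proposal it has observed. The proposal is decided once a super-majority
/// of elders have each observed a super-majority of acceptances.
fn is_decided(
    elders: &BTreeSet<ElderId>,
    seen: &BTreeMap<ElderId, BTreeSet<ElderId>>,
) -> bool {
    let total = elders.len();
    let witnesses = elders
        .iter()
        .filter(|e| {
            seen.get(*e).map_or(false, |accepted| {
                // Only acceptances from current elders count.
                is_super_majority(accepted.intersection(elders).count(), total)
            })
        })
        .count();
    is_super_majority(witnesses, total)
}
```

With seven elders, is_decided returns true once five of them have each seen acceptances from at least five elders. Four witnesses, or five witnesses that have each seen only four acceptances, are not enough.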
Adult promotion and elder handover
If an elder notices that the current elders are not the seven oldest nodes, then it sparks a vote on promoting the oldest adult(s) and demoting the youngest elders to make way.
The Elder Handover algorithm controlling this process, which is now ready to be integrated into the sn_membership crate, goes as follows.
An elder receives a supermajority of completed DKG shares to check the current elders are the oldest seven members.
The elder proposes a new set of elders.
The elders follow a sn_membership style consensus to decide on a single NewElders message. This step is required when we have a complicated chain of events that ends up with multiple groups of nodes racing to complete DKG and become the next elders.
A list of current section members is passed to the new elder, the section authority provider (SAP) is updated and a new block added to the section chain.
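The trigger for a handover, namely that the current elders are no longer the seven oldest nodes, might look roughly like this. It is a sketch under stated assumptions rather than the real Elder Handover code: Member, oldest_members and propose_new_elders are hypothetical names, and breaking age ties by name is an assumption made here so that every elder computes the same candidate set.

```rust
use std::collections::BTreeSet;

// Sections have seven elders (from the text above).
const ELDER_COUNT: usize = 7;

// Hypothetical, simplified view of a section member.
#[derive(Clone, Debug)]
struct Member {
    name: String,
    age: u8,
}

/// Names of the `ELDER_COUNT` oldest members.
/// Assumption: age ties are broken by name to keep the result deterministic.
fn oldest_members(members: &[Member]) -> BTreeSet<String> {
    let mut sorted: Vec<&Member> = members.iter().collect();
    sorted.sort_by(|a, b| b.age.cmp(&a.age).then_with(|| a.name.cmp(&b.name)));
    sorted
        .into_iter()
        .take(ELDER_COUNT)
        .map(|m| m.name.clone())
        .collect()
}

/// Returns the elder set to propose, or `None` when the current elders
/// already are the oldest members and no vote is needed.
fn propose_new_elders(
    members: &[Member],
    current_elders: &BTreeSet<String>,
) -> Option<BTreeSet<String>> {
    let candidates = oldest_members(members);
    if &candidates == current_elders {
        None
    } else {
        Some(candidates)
    }
}
```

A proposal returned here would then go through the consensus step described above, so that competing proposals still resolve to a single NewElders message.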
The role of consensus
So why is consensus necessary for membership when other parts of the network rely on AE to stay updated?
Here’s an example. Let’s say a section is nearly full. The section size limit is 50 nodes (just for example) and there are currently 49 members. A new node sends a JoinRequest to an elder. Under a system without consensus the elder checks and sees there is capacity, exchanges AE messages, and provided the new node passes the Resource Proof test, it’s in.
But at this stage the section is ready to split, which when multiple nodes are trying to join can lead to conflicting priorities:
Let’s say, in the extreme case, all seven elders receive JoinRequests simultaneously from seven different nodes. All seven elders see that we have room for one more node, and since each of the seven nodes has passed its Resource Proof test, each elder will allow its node to join the section.
But, upon anti-entropy gossip between elders, they find that their fellow elders will not accept each other’s new nodes as it would push the section capacity over the limit. The elders find themselves in a split-brain situation where each elder has a different view of section membership.
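The split-brain scenario above is easy to reproduce in a toy model. The sketch below is purely illustrative: ElderView, admit_locally, distinct_views and combined_size are hypothetical names, and the limit of 50 is the example figure from the text, not a real network parameter.

```rust
use std::collections::BTreeSet;

// Example limit from the text above, not a real network parameter.
const SECTION_LIMIT: usize = 50;

/// One elder's local view of section membership (hypothetical type).
#[derive(Clone, Debug, PartialEq, Eq)]
struct ElderView {
    members: BTreeSet<String>,
}

impl ElderView {
    /// Admit a node using only the local view, with no coordination
    /// between elders. Returns whether the node was admitted.
    fn admit_locally(&mut self, node: &str) -> bool {
        if self.members.len() < SECTION_LIMIT {
            self.members.insert(node.to_string());
            true
        } else {
            false
        }
    }
}

/// Number of different membership views held across the elders.
fn distinct_views(views: &[ElderView]) -> usize {
    views
        .iter()
        .map(|v| v.members.clone())
        .collect::<BTreeSet<_>>()
        .len()
}

/// Section size if every elder's local admission were honoured.
fn combined_size(views: &[ElderView]) -> usize {
    views
        .iter()
        .flat_map(|v| v.members.iter())
        .collect::<BTreeSet<_>>()
        .len()
}
```

Starting from seven identical views of 49 members and letting each elder admit a different joiner, every local check passes, yet the elders end up with seven different views and a combined membership of 56, well over the limit.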
To prevent this issue from happening, the elders come to consensus on which nodes will be allowed in. With sn_consensus, each of the seven elders can propose up to one change. This means up to seven changes (join/leave) can be decided in a single round.
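To illustrate why an agreed batch of changes avoids the problem, here is a minimal sketch of a single round. It does not show the consensus protocol itself, only what happens once every elder holds the same decided batch. Change, propose and apply_decided are hypothetical names, and the rule for staying under the limit (leaves first, then joins in sorted order while room remains) is an assumption made for the example, not something the crate is known to do.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Hypothetical change type: a node joining or leaving the section.
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum Change {
    Join(String),
    Leave(String),
}

/// Record a proposal for this round. Each elder may propose at most one
/// change, so a second proposal from the same elder is rejected.
fn propose(round: &mut BTreeMap<u8, Change>, elder: u8, change: Change) -> bool {
    if round.contains_key(&elder) {
        return false;
    }
    round.insert(elder, change);
    true
}

/// Apply the decided batch to a membership view.
/// Assumption: leaves are applied first, then joins in sorted order while
/// the section is below `limit`, so every elder reaches the same result.
fn apply_decided(members: &mut BTreeSet<String>, round: &BTreeMap<u8, Change>, limit: usize) {
    let decided: BTreeSet<&Change> = round.values().collect();
    for change in &decided {
        if let Change::Leave(name) = change {
            members.remove(name);
        }
    }
    for change in &decided {
        if let Change::Join(name) = change {
            if members.len() < limit {
                members.insert(name.clone());
            }
        }
    }
}
```

Because every elder applies the same batch with the same deterministic rule, two elders starting from the same 49 members end up with identical views of exactly 50 members, rather than seven conflicting ones.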
In the case above, sn_membership will mean some extra work compared to the elders acting on their own, but this overhead is only visible when we have many competing choices. sn_membership gracefully scales down to Byzantine Reliable Broadcast (BRB) when elders are in general agreement about the actions to take, so we only pay the consensus overhead when elders start having disagreements. sn_membership is a peaceful method to resolve disagreements.
Useful Links
Feel free to reply below with links to translations of this dev update and moderators will add them here:
Russian ;
German ;
Spanish ;
French;
Bulgarian
As an open source project, we’re always looking for feedback, comments and community contributions - so don’t be shy, join in and let’s create the Safe Network together!