Recent updates to how churn events are processed have led to the realisation that the custom process we had in place for the genesis section of a network is no longer required. Less complexity means fewer code paths, which brings multiple benefits. @davidrusu explains in a bit more depth below.
General Progress
This week, we’ve gotten further with bidirectional stream use in nodes, so the flow for an ACK message now goes a complete round trip: from client to elder to adult to elder to client. That is to say, ACKs will now only come in after the data has been written (whereas main has been ACKing on receipt of the message at the elder; adults were not involved). This neatly evades a whole class of errors during tests and gives us more confidence in what we’re seeing during data storage at the client.
We’ve also been working hard on refactoring away more code complexity. A PR from @anselme tidied up some more DKG work, @roland has cleaned up more test code, and @bzee is hard at work updating to the latest quinn crate and the changes around using streams therein.
Making the Genesis Section less special
There are a few things that make the very first Safe Network section special; for example, it’s the only section that doesn’t have a parent section (obviously). But when we build complex systems, special is not something we want: it’s one more case to think about.
Prior to this week, the way that nodes joined the genesis section had a quirk where node ages were artificially inflated. Nodes joining early on started with a high age, and each node joining later was given a progressively lower age.
For example:
- Node A attempts to join with the default node age of 4.
- The network responds with a Retry(age=97).
- Node A starts the join process again with age 97.
- The network accepts it.
- Node B attempts to join with the default node age of 4.
- The network responds with a Retry(age=96) (the next node age is stepped lower).
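The handshake above can be sketched as a small state machine. This is a hypothetical model: GenesisSection, handle_join and join are illustrative names, and the ages 4 and 97 are taken from the example; the real join messages carry much more than an age.

```python
DEFAULT_AGE = 4
FIRST_GENESIS_AGE = 97  # value taken from the example above


class GenesisSection:
    """Hypothetical model of the old age-stepping join handshake."""

    def __init__(self):
        self.members = {}  # node name -> accepted age
        self.next_age = FIRST_GENESIS_AGE

    def handle_join(self, name, age):
        if age != self.next_age:
            # Wrong age: ask the node to start again with the expected one.
            return ("Retry", self.next_age)
        self.members[name] = age
        self.next_age -= 1  # the next joiner is stepped one lower
        return ("Accepted", age)


def join(section, name):
    """Node side: start at the default age and follow any Retry."""
    age, requests = DEFAULT_AGE, 0
    while True:
        requests += 1
        kind, value = section.handle_join(name, age)
        if kind == "Accepted":
            return value, requests
        age = value  # Retry carries the age to use next time


section = GenesisSection()
print(join(section, "A"))  # (97, 2)
print(join(section, "B"))  # (96, 2)
```

Every joiner needs two requests: one to learn the age it should have, and one to actually join.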
In a stampeding herd situation, you could have many nodes attempting to join at once, forcing lots of age synchronization:
- Nodes A, B, C, D concurrently attempt to join with the default node age of 4.
- The network responds with a Retry(age=97) to all of them.
- Nodes A, B, C, D start the join process again with age 97.
- Say the network accepts Node A.
- Nodes B, C, D will still be attempting to join with age 97, so they will need to run through the age synchronization logic again.
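A rough simulation of that herd, assuming (hypothetically) that every pending node sends one request per round, that the section handles them in a fixed order, and that it accepts at most the node whose age matches its currently expected age. The function herd_join and its round structure are illustrative, not the network's real scheduling.

```python
def herd_join(names, first_age=97, default_age=4):
    """Hypothetical simulation of concurrent joins under age stepping.

    Each round, every pending node sends one join request; the section
    handles them in order, accepts a node only if its age matches the
    expected age, and answers everyone else with Retry(expected_age)."""
    expected_age = first_age
    ages = {name: default_age for name in names}
    accepted, pending = {}, list(names)
    rounds = requests = 0
    while pending:
        rounds += 1
        still_pending = []
        for name in pending:
            requests += 1
            if ages[name] == expected_age:
                accepted[name] = expected_age
                expected_age -= 1
            else:
                ages[name] = expected_age  # Retry(age=expected_age)
                still_pending.append(name)
        pending = still_pending
    return accepted, rounds, requests


print(herd_join(["A", "B", "C", "D"]))
# ({'A': 97, 'B': 96, 'C': 95, 'D': 94}, 5, 14)
```

Under these assumptions four nodes need five rounds and 14 requests between them, where four requests would do if each node could simply join at its own age.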
The reason we were doing this was to avoid excessive relocations early on in the network. If you recall, nodes are chosen randomly to be relocated to other sections when a churn event happens, and the younger a node is, the more likely it is to be chosen for relocation. To avoid having 80% of a section relocated at once, we introduced this age stepping behaviour to reduce the likelihood that a relocation would occur.
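To make the relocation concern concrete, here is a sketch under a loudly stated assumption: a node of age a is relocated when the churn event's hash has at least a trailing zero bits, so it is chosen with probability 2^-a. The rule and the function names are assumptions for illustration, not a quote of the network's code. Because every node of the same age is tested against the same hash, a section full of default-age nodes is relocated together, while stepped ages such as 97, 96, ... are effectively never picked.

```python
def should_relocate(age: int, churn_hash: int) -> bool:
    """Assumed rule: relocate when the churn hash has at least `age`
    trailing zero bits, i.e. is divisible by 2**age (probability 2**-age)."""
    return churn_hash % (1 << age) == 0


def relocated_fraction(ages, churn_hash):
    """Fraction of a section's nodes picked for relocation by one churn event."""
    chosen = [age for age in ages if should_relocate(age, churn_hash)]
    return len(chosen) / len(ages)


churn_hash = 0b110000  # 48: exactly four trailing zero bits
young_section = [4] * 10                       # everyone at the default age
stepped_section = [97 - i for i in range(10)]  # ages 97, 96, ..., 88
print(relocated_fraction(young_section, churn_hash))    # 1.0
print(relocated_fraction(stepped_section, churn_hash))  # 0.0
```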
At some point, we changed how churn events are processed, limiting the number of nodes that can be relocated at once so that sections can maintain a healthy number of adults.
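A minimal sketch of capping relocations per churn event. The max_relocations parameter, the trailing-zero-bits selection rule and the youngest-first ordering (ties broken by name) are all hypothetical choices made for illustration; the update doesn't say how the real limit is applied.

```python
def relocations_for_churn(members, churn_hash, max_relocations):
    """members maps node name -> age. Returns the nodes to relocate for one
    churn event, never more than `max_relocations` of them.

    Assumed selection rule: a node of `age` is a candidate when churn_hash
    is divisible by 2**age. Youngest-first ordering is illustrative only."""
    candidates = sorted(
        (age, name)
        for name, age in members.items()
        if churn_hash % (1 << age) == 0
    )
    return [name for _, name in candidates[:max_relocations]]


members = {f"node{i}": 4 for i in range(10)}  # ten nodes at the default age
print(relocations_for_churn(members, 48, max_relocations=1))  # ['node0']
```

Even when every node in the section qualifies, the cap keeps all but one of them in place, which is what removes the need to inflate ages in the genesis section.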
Now that the reasons behind the age stepping no longer hold, we are able to remove the age synchronization protocol when nodes join the genesis section. This makes the first section behave much more like subsequent sections, with no special code paths dedicated to it! It should also make node joins a bit more reliable and faster, since we’ve removed one network round trip to synchronize the join age.
Useful Links
Feel free to reply below with links to translations of this dev update and moderators will add them here:
Russian; German; Spanish; French; Bulgarian
As an open source project, we’re always looking for feedback, comments and community contributions - so don’t be shy, join in and let’s create the Safe Network together!