Happy New Year one and all! It’s great to be back, and we’re determined to make this one really count. The team have all managed to squeeze in a nice bit of downtime and are raring to go. Over the break we’ve also been fixing a few things and pondering some possible improvements, including the optimum size of nodes and sections, and changes to node age and relocation. For the first of these, we’ve been running internal testnets with smaller nodes and larger sections. These have been going pretty well and have revealed one or two other issues to do with comms and handover, which we’ve worked through. The second is a design optimisation which will treat younger nodes differently from older ones. More detail on that one in the next couple of weeks.
General progress
A break in routine can be a great time to think about what can be done better and tie up loose ends. Here’s a summary of what the team has been up to since the last update.
We’re splitting the catch-all NodeMsg::Propose messaging into four distinct variants for clarity:
- RequestHandover: when nodes finish distributed key generation (DKG) and send a handover request to the current elders (node->elder)
- SectionHandoverPromotion: when elders tell those nodes that they have been promoted to elder (elder->node)
- SectionSplitPromotion: when elders tell those nodes that they have been promoted to elder on either side of a split (elder->node)
- ProposeSectionState: when elders decide to kick nodes from, or accept new nodes into, a section (elder->elder)
This distinction makes explicit who is signing and who is receiving/aggregating the signatures.
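To make the split concrete, here is a minimal Rust sketch of the four variants and the sender/receiver direction each one implies. The variant names come from the list above. The Role type, the direction() helper and the omission of every payload are simplifications for illustration only. They do not reflect the actual sn_node definitions.

```rust
/// Sender/receiver roles (a simplification for this sketch, not an sn_node type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Role {
    Node,  // a non-elder node, e.g. one that has just completed DKG
    Elder, // a member of the current elder set
}

/// The four variants replacing the catch-all `NodeMsg::Propose`.
/// Payloads (keys, signature shares, proposals) are deliberately omitted here.
#[derive(Debug, Clone, Copy)]
enum NodeMsg {
    RequestHandover,
    SectionHandoverPromotion,
    SectionSplitPromotion,
    ProposeSectionState,
}

impl NodeMsg {
    /// Returns (sender, receiver) for each variant, mirroring the arrows in the list.
    fn direction(self) -> (Role, Role) {
        match self {
            NodeMsg::RequestHandover => (Role::Node, Role::Elder),
            NodeMsg::SectionHandoverPromotion | NodeMsg::SectionSplitPromotion => {
                (Role::Elder, Role::Node)
            }
            NodeMsg::ProposeSectionState => (Role::Elder, Role::Elder),
        }
    }
}

fn main() {
    let all = [
        NodeMsg::RequestHandover,
        NodeMsg::SectionHandoverPromotion,
        NodeMsg::SectionSplitPromotion,
        NodeMsg::ProposeSectionState,
    ];
    for msg in all {
        let (sender, receiver) = msg.direction();
        println!("{msg:?}: {sender:?} -> {receiver:?}");
    }
}
```

Because the match is exhaustive, adding a fifth variant later forces whoever adds it to state its direction explicitly.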
Chopping unnecessary messages
We fixed an expensive issue where the SectionTree was being re-verified for every anti-entropy (AE) message, even when it had already been verified.
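The fix boils down to remembering which section trees have already passed verification. The sketch below assumes a tree can be identified by its serialised bytes. It uses a placeholder expensive_verify check. The AeHandler type and its methods are hypothetical and are not the sn_node API.

```rust
use std::collections::HashSet;

/// Hypothetical AE handler that caches trees it has already verified.
struct AeHandler {
    verified: HashSet<Vec<u8>>, // exact serialised bytes of trees that passed verification
    verifications_run: usize,   // how many times the expensive check actually ran
}

impl AeHandler {
    fn new() -> Self {
        Self { verified: HashSet::new(), verifications_run: 0 }
    }

    /// Placeholder for the costly signature-chain verification (assumed validity rule).
    fn expensive_verify(&mut self, tree: &[u8]) -> bool {
        self.verifications_run += 1;
        !tree.is_empty()
    }

    /// Verify the SectionTree carried by an AE message, skipping trees seen before.
    fn handle_ae_tree(&mut self, tree: &[u8]) -> bool {
        if self.verified.contains(tree) {
            return true; // already verified: no repeated work
        }
        let ok = self.expensive_verify(tree);
        if ok {
            self.verified.insert(tree.to_vec());
        }
        ok
    }
}

fn main() {
    let mut handler = AeHandler::new();
    let tree = b"section-tree-v1".to_vec();
    for _ in 0..100 {
        assert!(handler.handle_ae_tree(&tree));
    }
    println!("100 AE messages, {} verification(s)", handler.verifications_run);
}
```

Keying the cache on the full bytes, rather than a short non-cryptographic hash, means a colliding tree can never slip through unverified.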
Optimising AE
We’ve experimented with slowing down AE probing around splits to reduce the number of messages flying about. We’ve also refactored the probing that maintains global network section knowledge so that it targets one random elder in three random sections every five minutes; previously the default was all elders in three sections every 30 seconds. This has greatly reduced CPU and memory use around splits, and the longer interval should be sufficient for our needs: after all, splits do not happen anything like every 30 seconds.
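For a rough sense of scale, the per-node probe count per hour can be worked out directly. Two inputs are assumptions made for the arithmetic, not figures taken from the code. The first is 7 elders per section. The second is reading "one random elder in three random sections" as one elder in each of the three sections.

```rust
/// Back-of-envelope count of AE probe messages a single node sends per hour.
fn probes_per_hour(interval_secs: u64, sections: u64, elders_per_section: u64) -> u64 {
    let rounds = 3600 / interval_secs;
    rounds * sections * elders_per_section
}

fn main() {
    const ASSUMED_ELDERS_PER_SECTION: u64 = 7; // assumption, not from the update
    // Old default: all elders in three sections every 30 seconds.
    let before = probes_per_hour(30, 3, ASSUMED_ELDERS_PER_SECTION);
    // New behaviour (as read here): one elder in each of three sections every 5 minutes.
    let after = probes_per_hour(300, 3, 1);
    // prints "before: 2520 probes/hour, after: 36 probes/hour (70x fewer)"
    println!("before: {before} probes/hour, after: {after} probes/hour ({}x fewer)", before / after);
}
```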
High memory use on waiting to join
We’ve had a go at fixing a bug that has been causing high memory use in nodes waiting to join, as seen in recent testnets, and we’ll be ready to put the fix through its paces on a community testnet shortly. We’ve also stopped caching connection sessions for nodes that have not yet joined.
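One possible reading of the session change is a cache that refuses to retain sessions for peers that have not yet joined. Repeated connection attempts then cannot pile up in memory. This is a hypothetical sketch. PeerId, Session and SessionCache are placeholders, not types from sn_node or its networking layer.

```rust
use std::collections::HashMap;

// Hypothetical placeholder types for illustration only.
type PeerId = u32;

#[derive(Debug, Clone, PartialEq)]
struct Session {
    peer: PeerId,
}

#[derive(Default)]
struct SessionCache {
    sessions: HashMap<PeerId, Session>,
}

impl SessionCache {
    /// Joined peers get a cached, reusable session; unjoined peers get a fresh
    /// session that is never stored, so waiting nodes add nothing to the cache.
    fn session_for(&mut self, peer: PeerId, peer_has_joined: bool) -> Session {
        if peer_has_joined {
            self.sessions
                .entry(peer)
                .or_insert_with(|| Session { peer })
                .clone()
        } else {
            Session { peer }
        }
    }

    fn cached(&self) -> usize {
        self.sessions.len()
    }
}

fn main() {
    let mut cache = SessionCache::default();
    for _ in 0..1_000 {
        cache.session_for(42, false); // a node still waiting to join keeps retrying
    }
    cache.session_for(7, true);
    cache.session_for(7, true); // reused, not duplicated
    println!("cached sessions: {}", cache.cached());
}
```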
Comms refactor
An ugly lock in the send-stream code in sn_node has been refactored away. In addition, we’re testing “happy path” comms, whereby clients send a message to just one elder rather than to all of them.
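The “happy path” idea can be sketched as trying one elder and stopping there if the send succeeds. Two parts of this are assumptions made for the example. One is falling back to the remaining elders when the first send fails. The other is the send_happy_path helper itself.

```rust
/// Hypothetical happy-path send: contact one elder first and only fall back to
/// the rest if that fails (the fallback policy is an assumption).
/// Returns the elders that were contacted, in order.
fn send_happy_path<F>(elders: &[&str], mut send: F) -> Vec<String>
where
    F: FnMut(&str) -> bool,
{
    let mut contacted = Vec::new();
    let Some((&first, rest)) = elders.split_first() else {
        return contacted; // no known elders
    };
    contacted.push(first.to_string());
    if send(first) {
        return contacted; // happy path: a single message
    }
    // Assumed fallback: message every remaining elder.
    for &elder in rest {
        contacted.push(elder.to_string());
        let _ = send(elder);
    }
    contacted
}

fn main() {
    let elders = ["elder-a", "elder-b", "elder-c"];
    let happy = send_happy_path(&elders, |_| true);
    let fallback = send_happy_path(&elders, |e| e != "elder-a");
    println!("happy path: {happy:?}");
    println!("fallback:   {fallback:?}");
}
```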
We also removed node-tracking code and its related locks, which were potential points of failure. Failed sends and storage-level changes were generating large numbers of messages, and these were blocking nodes.
Changes to storage parameters
We’ve added a margin to storage capacity: nodes are expected to provide a minimum amount of storage, but they can store more than that. This should help alleviate “could not store” errors before a split. We’ve also set up elders to store data (previously they didn’t) and to use their local minimum storage capacity as an indicator of when to split, as discussed on the forum.
With this we now have the following flow:
1. Nodes receive data.
2. Every time we pass a certain level of storage used (for the first time), we allow new nodes to join.
   2.1. When a node asks to join, we see that joins are allowed, and elders start a vote to add the node.
3. When a new node joins, joins are disabled and we check whether we’ve reached min_capacity.
   3.1. We’ve not reached min_capacity: continue as normal.
   3.2. We’ve reached min_capacity: clean up excess storage.
   3.3. If we are still at or above min_capacity, trigger the fail-safe allow_joins_until_split.
   3.4. When a node asks to join, we see that joins are allowed, and elders start a vote to add the node.
4. The joining node relieves storage load.
Useful Links
Feel free to reply below with links to translations of this dev update and moderators will add them here:
Russian; German; Spanish; French; Bulgarian
As an open source project, we’re always looking for feedback, comments and community contributions - so don’t be shy, join in and let’s create the Safe Network together!