Many thanks to @stout77 for another cover image
Update May 12, 2022
One of the simplest but also most fundamental and important features in Safe Network design is Node Age. Essentially, Node Age replaces systems like Proof Of Work in rewarding good behaviour, punishing bad, and making life very difficult for a Sybil attacker. It provides an important measure of the quality and ongoing trustworthiness of every node, and is our featured topic this time around.
General progress
On the back of our groundbreaking work with DBCs, in which @davidrusu and others have taken the 'digital cash' concept in a whole new direction, making it Byzantine fault-tolerant and thus fit for a decentralised network, we are happy to announce that David Rusu will be heading up a new Safe Labs division. This will be our R&D umbrella for state-of-the-art cryptography, networking and more. Research will be primarily Safe-oriented rather than blue sky, but we want to pull in expertise from wherever it may exist in a more formal and structured way.
@Anselme finalised a PR for checking the SAP on handover, and has started looking into Byzantine behaviours in handover (the process of redistributing data on a churn event).
David Rusu gave a presentation on conflict-free replicated data types (CRDTs) at a Toronto CompSci meetup, mentioning what he's been doing at MaidSafe (naturally!). Lots of interest in the topic and plenty of contacts to be made. He's going back for another one on CRDT trees.
@Bochaco has completed a PR to check permission at the client side when performing operations on registers (mutable data) and is also working on the spent book client API.
And @Chriso has been looking at testnet failures caused by the temporary removal of features like max-capacity.
Testing internally, Metricbeat has shown us some nodes creeping up to some verrry high mem usage over a day or so. Diving in, we realised that there appeared to be quite an edge-casey deadlock occurring there (centred around clean-up of connections). We've a few fix options here, so we are testing to see which makes the most sense.
Meanwhile @Qi_ma gave a talk to the team on Node Age.
Node Age
Every node on the network has an address determined by its ID, which is actually a key that's generated when it joins the network. This node ID is essentially a very large random number. Its first few bits (e.g. 0101101…) determine which section the node will be in and therefore what data it will look after, while the last eight bits (e.g. 00000101) signify its Node Age - in this case 5.
When a node is first accepted into the network it is given a Node Age of 5, so its ID ends …00000101 (the joining node must keep generating ED25519 keys until it gets one with the correct ending and the correct prefix, generally a sub-second process).
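The ID layout and the join-time key search described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: random 32-byte strings stand in for real ED25519 public keys, and the helper names `node_age`, `matches_prefix` and `generate_node_id` are hypothetical, not taken from the Safe Network codebase.

```python
import secrets
from typing import Tuple

ID_LEN = 32  # bytes; an ED25519 public key is 32 bytes long


def node_age(node_id: bytes) -> int:
    """Read the Node Age from the last eight bits of the ID."""
    return node_id[-1]


def matches_prefix(node_id: bytes, prefix: str) -> bool:
    """Check whether the ID's leading bits equal a section prefix such as '0101'."""
    bits = "".join(f"{byte:08b}" for byte in node_id)
    return bits.startswith(prefix)


def generate_node_id(prefix: str, age: int) -> Tuple[bytes, int]:
    """Keep drawing candidate IDs until one has the right prefix and age byte.

    Assumption: random bytes replace real ED25519 key generation.
    Returns the accepted ID and the number of attempts it took.
    """
    attempts = 0
    while True:
        attempts += 1
        candidate = secrets.token_bytes(ID_LEN)
        if node_age(candidate) == age and matches_prefix(candidate, prefix):
            return candidate, attempts


new_id, tries = generate_node_id(prefix="0101", age=5)
print(node_age(new_id), tries)
```

With a 4-bit prefix and an 8-bit age, roughly one candidate in 2^12 = 4096 qualifies on average, which is why the search stays cheap while section prefixes are short.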
The longer the node remains an active participant on the network, the larger its Node Age will grow, up to a highly unlikely maximum of 255. But there are a couple of catches: (1) its Node Age will only grow if it proves itself reliable at storing data chunks and giving them up when requested over a certain time period. (2) Each time its Node Age is incremented, it must move to another section.
But Safe Network has no concept of time, so how can we track how long the node has been behaving? The answer is we use churn events (section membership changes) as a proxy for time.
Churn ID - The Decider
Each section will contain 7 elders (decision-making nodes) and 60+ adults (storage nodes). Each time a node goes offline or joins the section, which happens frequently with adults, elders vote on what has happened. Each churn event has a 256-bit ID, which is the combined BLS signature of 5 out of the 7 elders. This churn ID is also effectively a random number and cannot be predicted beforehand.
If the new node proves itself to be dysfunctional within the first few churn events it will be ejected and will need to ask to join again. No point in wasting resources on a dead weight.
On the other hand, if our new node performs its duties properly for a few churn events we want to reward it and increase its age by 1, but we don't want to have to track it and record when it joined etc. So we use the churn ID as a sort of lottery ticket.
The churn ID (a random number, remember) serves two purposes so far as nodes are concerned. First, it provides a way for nodes to get their Node Age increased. Second, since we don't want nodes to build their reputation in just one section because of the risk of malicious behaviour, the churn ID also decides which random section the newly promoted node will join.
Chance of promotion
If the churn ID
is modulo divisible by 2 exp Node Age
(churn ID % 2^age == 0) we will get promoted. So for our new node age 5, if the churn ID
is divisible by 32 - which will happen on average once every 32 churns - it gets its Node Age
bumped up to 6 and moved to a new section. It will then likely have to wait another 64 churns in its new section before it gets promoted again - promotion becomes exponentially more difficult the longer it remains. This means that Elders, the oldest 7 nodes in the section, have been around a long time and proved themselves in many different sections before achieving their voting status.
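To make the lottery odds concrete, here is a small simulation of the promotion test. It is a sketch, assuming each churn ID behaves like a uniformly random 256-bit number; the function names `is_promoted` and `churns_until_promotion` are hypothetical.

```python
import secrets


def is_promoted(churn_id: int, age: int) -> bool:
    """Promotion test from the text: churn ID % 2^age == 0."""
    return churn_id % (2 ** age) == 0


def churns_until_promotion(age: int) -> int:
    """Count simulated churn events until a node of this age wins the lottery.

    Assumption: every churn ID is an independent, uniformly random 256-bit value.
    """
    churns = 0
    while True:
        churns += 1
        if is_promoted(secrets.randbits(256), age):
            return churns


for age in (5, 6, 7):
    trials = [churns_until_promotion(age) for _ in range(1000)]
    # Columns: age, expected wait (2^age), simulated average wait
    print(age, 2 ** age, sum(trials) / len(trials))
```

The simulated averages should come out close to 32, 64 and 128 churns, doubling with each extra year of age.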
How does it work? On every churn event, the elders apply this divisibility test to the churn ID for each age, starting with the oldest (255) and working down to the youngest (5). When an age that passes the test matches a set of nodes in our section, we relocate up to elder_count/2 nodes which have that Node Age. There will usually be only one in that age bracket, but in the case of an excess we select the nodes with a node ID closest to the churn ID.
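The selection step just described might look something like the sketch below. Two things here are assumptions rather than anything stated in the update: IDs are modelled as plain integers whose low eight bits hold the age, and "closest" is taken to mean smallest XOR distance. The names `age_of` and `relocation_candidates` are hypothetical.

```python
from typing import List

ELDER_COUNT = 7
MIN_AGE, MAX_AGE = 5, 255


def age_of(node_id: int) -> int:
    """The Node Age lives in the last eight bits of the ID."""
    return node_id & 0xFF


def relocation_candidates(churn_id: int, node_ids: List[int]) -> List[int]:
    """Pick the nodes to relocate for one churn event.

    Walk the ages from oldest to youngest. At the first age that both passes
    the divisibility test and has nodes in the section, return up to
    ELDER_COUNT // 2 of those nodes, closest to the churn ID first.
    Assumption: "closest" means smallest XOR distance.
    """
    for age in range(MAX_AGE, MIN_AGE - 1, -1):
        if churn_id % (2 ** age) != 0:
            continue
        matching = [n for n in node_ids if age_of(n) == age]
        if matching:
            matching.sort(key=lambda n: n ^ churn_id)
            return matching[: ELDER_COUNT // 2]
    return []


# Two nodes of age 5, one of age 6, one of age 7.
section = [(0xAB << 8) | 5, (0xCD << 8) | 5, (0x11 << 8) | 6, (0x22 << 8) | 7]
churn_id = 96  # divisible by 32 but not by 64, so only age 5 qualifies
print([hex(n) for n in relocation_candidates(churn_id, section)])
```

Because a churn ID divisible by 2^age is also divisible by every smaller power of two, scanning from the top means the oldest eligible age bracket is always the one chosen.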
Nodes can also be demoted for dysfunctional behaviour (bad performance in comparison with their peers). In this case, Node Age is halved before they are relocated.
Benefits of Node Age
This scheme has three main benefits. The first is Sybil resistance. In order to control a section, an attacker will need to control at least three elders. The process of becoming an elder is long and hard, and it's impossible to know which section you'll end up in. When the network is large, a 7 elders to 60+ adults ratio will make such Sybil attacks extremely difficult. In addition, new nodes are only allowed to join a section when more storage is needed, so attackers cannot flood the network with new joiners.
The second is avoiding undue work. If a node fails, it will likely do so early, so we kick it out before it can progress any further.
The third is general randomisation. Forcing the nodes to hop from section to section to gain trust also has the benefit of distributing capability evenly.
Relocation flow
@Qi_ma has been working on the implementation of Node Age including the messaging flows between the section elders, the candidate for promotion, and the elders in the target section. He gave a talk to the team this week. Here is one of his slides.
Elders in the source section
- Agree on a churn event (membership change) and sign it (Churn ID)
- Check if there are any candidates for relocation
- Pick the oldest candidate(s)
- Calculate their destination sections from their node ID combined with the Churn ID
- Increase their age by 1
- Cast a vote for each one to be relocated
- When enough vote shares have been gathered, inform each candidate node
Candidate node
- Receives message from elders
- Acknowledges relocation process starting
- Generates a new ID with correct initial bits (section) and trailing bits (its new age)
- Bootstraps to the new section [it has authority to do so from its original section]
Elders in destination section
- Check the source section's knowledge is up to date (the SAP)
- Update them if not and tell them to resend
- Check relocation signatures and details are in order
- Vote on the candidate joining
- If all goes well, candidate joins new section
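As an illustration of the "when enough vote shares have been gathered" step in the source section, here is a toy vote tracker. It is a sketch only: plain strings stand in for elders and candidates, no real BLS signature shares are aggregated, and reusing the 5-of-7 threshold from the churn ID for relocation votes is an assumption. The class name `RelocationVotes` is hypothetical.

```python
from typing import Dict, Set

ELDER_COUNT = 7
THRESHOLD = 5  # assumption: same 5-of-7 majority that signs the churn ID


class RelocationVotes:
    """Tracks which elders have voted to relocate each candidate."""

    def __init__(self) -> None:
        self._votes: Dict[str, Set[str]] = {}

    def cast(self, elder: str, candidate: str) -> bool:
        """Record one elder's vote.

        Returns True only at the moment the threshold is first reached, so the
        caller informs the candidate exactly once.
        """
        voters = self._votes.setdefault(candidate, set())
        before = len(voters)
        voters.add(elder)  # a repeated vote from the same elder changes nothing
        return before < THRESHOLD <= len(voters)


votes = RelocationVotes()
for elder in ["e1", "e2", "e3", "e4", "e5", "e6"]:
    if votes.cast(elder, "node-A"):
        print("inform node-A after vote from", elder)
```

In this run the candidate is informed once, when the fifth distinct elder votes; the sixth vote arrives after the decision and triggers nothing further.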
Useful Links
Feel free to reply below with links to translations of this dev update and moderators will add them here:
Russian; German; Spanish; French; Bulgarian
As an open source project, we're always looking for feedback, comments and community contributions - so don't be shy, join in and let's create the Safe Network together!