We’re on the cusp of being able to roll out the new integrated CLI and API for community testing, but there are a few edge cases still causing trouble - and history tells us that what can go wrong will go wrong. So we’re working on ironing those out before we get the CLI integrated properly.
In the last update we mentioned that DKG has been causing (unrelated) issues, with nodes sometimes failing to be promoted to elders and splits occasionally not taking place as they should. Some of the blame lies with messages arriving out of order and not being handled properly during DKG runs. @lionel.faber explains how we’re fixing that with Anti-Entropy.
General progress
Most of the CLI commands are now working well, but a few are not consistent as yet. We also need to add the CLI to the new automated release process.
@Anselme has been digging further into the NRS, including implementing dry run support for operations batching. He has also been helping @danda and David Rusu in their work on proof-of-payment and private transactions. If you’ve been following these updates you’ll know they have been investigating how best to implement DBCs so that transactions are quick and non-traceable, while also being auditable and supporting multiple DBC outputs. The guys have been pursuing a couple of paths, and so far the most promising approach seems to be a version of Ring Confidential Transactions (RingCT) as used by Monero, which can be used to provide a proof-of-payment. David Rusu gave a presentation to the team on this, which we will reproduce here at some point soon.
Resolving DKG issues using Anti-Entropy
Distributed key generation (DKG) is the way we manage the agreement process between nodes. A key is required for an action to take place. That key is split into key shares with each voting node having a single unique share. Only once a certain number of shares (say 5 out of 7) have been received and aggregated can the signing key be generated.
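To make the "5 out of 7" idea concrete, here is a minimal sketch of a threshold scheme using Shamir secret sharing. This is a stand-in chosen for illustration, not the scheme the network uses: a real DKG has no dealer and no single party ever holds the whole key. The names `split_secret`, `recover_secret` and the field modulus `PRIME` are all hypothetical.

```python
import random

PRIME = 2**127 - 1  # a Mersenne prime used as the field modulus (illustrative choice)


def split_secret(secret, threshold, num_shares, rng):
    """Hide `secret` as the constant term of a random polynomial of degree threshold-1."""
    coeffs = [secret] + [rng.randrange(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, num_shares + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation of the polynomial at x
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def recover_secret(shares):
    """Lagrange interpolation at x = 0 recovers the constant term."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+)
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


rng = random.Random(0)
secret = 123456789
shares = split_secret(secret, threshold=5, num_shares=7, rng=rng)
recovered = recover_secret(shares[:5])  # any 5 of the 7 shares are enough
```

Any five shares give back the same value, while four or fewer do not, which is the property the voting process relies on.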
Over the past couple of weeks we have been tackling a few issues that occur during DKG in the process of the oldest adult nodes being promoted to elders. When a section decides an adult needs to be promoted, a DKG round is run and the elder candidates generate a section key and a key share.
The DKG run is a stepwise process with six distinct phases: Initialisation, Contribution, Complaining, Justification, Commitment and Finalisation. Keys need to be generated, exchanged, aggregated and agreed upon, with messages passing to and fro at each phase. While total order of messages is not required for DKG in general, a node can only process messages relevant to its current DKG phase.
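As a rough sketch of that rule, the six phases can be modelled as an ordered enum and each incoming message compared against the node's current phase. The `classify` helper and its three labels are hypothetical names used only for illustration; in particular, how messages from an earlier phase are treated is an assumption, not something stated here.

```python
from enum import IntEnum


class Phase(IntEnum):
    """The six DKG phases, in the order they are run."""
    INITIALISATION = 1
    CONTRIBUTION = 2
    COMPLAINING = 3
    JUSTIFICATION = 4
    COMMITMENT = 5
    FINALISATION = 6


def classify(current, message_phase):
    """Decide what a node in phase `current` can do with a message."""
    if message_phase == current:
        return "process"  # relevant to the current phase
    if message_phase > current:
        return "ahead"    # the node has not reached this phase yet
    return "behind"       # assumption: messages from earlier phases get their own label


status = classify(Phase.INITIALISATION, Phase.CONTRIBUTION)
```

The "ahead" case is the one that causes trouble, since the node cannot do anything useful with the message until it has caught up.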
However, the asynchronous nature of the network means that messages can arrive in any order, and in a distributed setting it’s natural for some messages to come much later or even not at all due to network disruptions, for example.
These natural events should not affect the DKG ability of nodes, and much like other network operations, the solution to unordered DKG messages is (if you have not guessed it already) Anti-Entropy. AE actively updates actors with the information they need and prevents actions occurring until the participants are ready.
There are two distinct situations where AE for DKG messages is required.
DKG messages arrive out of phase
When a node receives a DKG message that’s part of a phase that it has not yet reached, it needs to hold on to that message and keep trying to apply it until eventually (and hopefully) it has progressed to a point where the message can be applied.
Instead of leaving this up to chance, we can request that the sender of the DKG message sends us the list of messages that it has already processed. We can verify these messages using the sender’s signature, apply them locally to get us up to the same phase, and then apply the message that we’ve been holding on to. This allows nodes that have fallen behind in the DKG process to catch up with the rest of the network.
To show why this is more efficient, consider the following.
Let’s assume the order of messages required is
1.1, 1.2, 1.3, 2.1, 2.2, 2.3...
where each message is
<phase>.<message_no>
For example, Initialisation.message1, Initialisation.message2, Contribution.message1 etc.
Say we have two nodes A and B. Node A is in phase 2 and node B is in phase 1.
There are two options:
Trial and error
# Step 1
B(phase 1) receives 2.1 -> Not ready -> store 2.1
# Step 2
B(phase 1) receives 1.2 -> applies 1.2
B(phase 1) -> applies 2.1 from storage -> Not ready
# Step 3
B(phase 1) receives 1.3 -> applies 1.3
B(phase 2) -> applies 2.1 from storage -> OK -> remove 2.1 from storage
Here, the longer 1.2 and 1.3 take to arrive, the more trial and error occurs.
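The trial-and-error steps above can be simulated with a small toy model. This is only a sketch under assumptions: messages are `(phase, message_no)` tuples, each phase has exactly three messages (taken from the 1.1 to 1.3 example), and the `Node` class with its `failed_attempts` counter is hypothetical rather than anything from the real codebase.

```python
MESSAGES_PER_PHASE = 3  # assumption taken from the 1.1, 1.2, 1.3 example


class Node:
    def __init__(self):
        self.applied = []         # messages applied so far, in order
        self.stored = []          # messages that arrived too early
        self.failed_attempts = 0  # how often we tried a message and were not ready

    def next_expected(self):
        n = len(self.applied)
        return (n // MESSAGES_PER_PHASE + 1, n % MESSAGES_PER_PHASE + 1)

    def try_apply(self, msg):
        if msg == self.next_expected():
            self.applied.append(msg)
            return True
        self.failed_attempts += 1
        return False

    def receive(self, msg):
        if not self.try_apply(msg):
            self.stored.append(msg)  # not ready: hold on to it
            return
        # Our state changed, so retry everything we were holding on to.
        progress = True
        while progress:
            progress = False
            for held in list(self.stored):
                if self.try_apply(held):
                    self.stored.remove(held)
                    progress = True


b = Node()
b.receive((1, 1))  # B is in phase 1
b.receive((2, 1))  # Step 1: not ready, stored
b.receive((1, 2))  # Step 2: applied, retry of 2.1 fails
b.receive((1, 3))  # Step 3: applied, retry of 2.1 succeeds
```

In this run `b.failed_attempts` ends at 2, one for each time 2.1 was tried too early, and it would keep growing with every unrelated state change if 1.2 and 1.3 were delayed further.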
AE
# Step 1
B(phase 1) receives 2.1 -> Not ready -> Asks A for all messages
B(phase 1) receives 1.1, 1.2, 1.3, 2.1 -> Applies them all -> OK
B(phase 2)
So AE is much more efficient and will reduce the number of unexpected message flows.
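For comparison, here is the same toy scenario with an AE-style catch-up. Again this is a sketch under assumptions: the `Node` class, `history` method and `requests_sent` counter are hypothetical, each phase is assumed to have three messages, and the signature check on the returned history is left out to keep the example short.

```python
MESSAGES_PER_PHASE = 3  # assumption taken from the 1.1, 1.2, 1.3 example


class Node:
    def __init__(self, applied=None):
        self.applied = list(applied or [])  # ordered log of applied messages
        self.requests_sent = 0              # number of AE requests made

    def history(self):
        """Everything this node has processed so far, in order."""
        return list(self.applied)

    def apply(self, msg):
        n = len(self.applied)
        expected = (n // MESSAGES_PER_PHASE + 1, n % MESSAGES_PER_PHASE + 1)
        if msg != expected:
            return False
        self.applied.append(msg)
        return True

    def receive(self, msg, sender):
        if self.apply(msg):
            return
        # Not ready: ask the sender for all the messages it has processed.
        # (A real node would verify the sender's signature on these first.)
        self.requests_sent += 1
        for past in sender.history():
            if past not in self.applied:
                self.apply(past)
        if msg not in self.applied:
            self.apply(msg)


a = Node(applied=[(1, 1), (1, 2), (1, 3), (2, 1)])  # A is in phase 2
b = Node(applied=[(1, 1)])                          # B is still in phase 1
b.receive((2, 1), sender=a)
```

B catches up with a single request instead of repeatedly retrying a stored message, no matter how late the original 1.2 and 1.3 would have arrived.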
DKG messages arriving for a session that has not yet started
Before a DKG session starts, the participating nodes have to sign the list of participants along with the section chain to ensure that they are all participating in the same DKG session. The new key is added to the section chain so all participating nodes should agree on its length.
In the case where a node has not received sufficient signatures to start the DKG session, but the DKG messages have started coming in (for example, if connection issues arose during that initial phase), the node can request the aggregated signature of the DkgStart message. This can be verified against the section key, after which the DKG session can be initialised and the message(s) applied as above.
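A rough sketch of that flow is below. Everything in it is an assumption made for illustration: the `Node` class, the `request_start_proof` callback and the session identifier are hypothetical, and an HMAC over a shared toy key stands in for the aggregated signature and section key, which in practice would be a proper threshold signature scheme.

```python
import hashlib
import hmac

SECTION_KEY = b"toy-section-key"  # stand-in only, not a real section key


def toy_sign(session_id):
    """Stand-in for the aggregated signature over the DkgStart message."""
    return hmac.new(SECTION_KEY, session_id, hashlib.sha256).digest()


def toy_verify(session_id, proof):
    return hmac.compare_digest(toy_sign(session_id), proof)


class Node:
    def __init__(self):
        self.sessions = {}  # session_id -> messages applied in that session
        self.pending = {}   # session_id -> messages held until the session starts

    def receive(self, session_id, msg, request_start_proof):
        if session_id in self.sessions:
            self.sessions[session_id].append(msg)
            return True
        # Session not started here yet: hold the message and ask for proof.
        self.pending.setdefault(session_id, []).append(msg)
        proof = request_start_proof(session_id)
        if proof is None or not toy_verify(session_id, proof):
            return False  # cannot start the session yet
        # Proof checks out: initialise the session and apply what we held.
        self.sessions[session_id] = self.pending.pop(session_id)
        return True


node = Node()
sid = b"session-1"
first = node.receive(sid, "msg-1", lambda s: b"bogus")  # bad proof, stays pending
second = node.receive(sid, "msg-2", toy_sign)           # valid proof, session starts
```

Messages received before the proof arrives are not lost; they are applied as soon as the session is initialised.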
Useful Links
Feel free to reply below with links to translations of this dev update and moderators will add them here:
Russian; German; Spanish; French; Bulgarian
As an open source project, we’re always looking for feedback, comments and community contributions - so don’t be shy, join in and let’s create the Safe Network together!