As we mentioned a few weeks back, the old Sequence data type for mutable data was replaced by Register. Registers can do everything Sequence could in terms of storing previous versions with the added bonus that it can also handle concurrency. This week David Rusu lifts the lid on the mysterious world of Merkle Registers.
General progress
David Rusu has pretty much completed the basic functionality for the client writing the Spentbook. Once it’s fully in place it will make the Mint’s task read-only: the Mint will only need to check the Spentbook for DBC transaction commitments to verify that all inputs are committed to the same transaction and validate the transaction balances.
@Chris.connelly continues to wrestle with qp2p. Many of the jobs it does relating to incoming messages are probably obsolete now we no longer use the connection pool to keep connections open, and bringing connections and incoming message handling inside the safe_network repo should improve stability. It’s a bit of work though!
Chris also identified a long-standing bug in the qp2p tests, which is now fixed.
@lionel.faber is focused on CI/CD and launch tools and updated the automation tools to build on a cloud VM to launch a full-fledged testnet. Anyone can use this tool, provided they have the required credentials for AWS and Digital Ocean. The tool is configurable and can be triggered as required. Internally we are using manual triggers and a special launch-testnet
comment in PR reviews.
Work on the CLI and API continues to get them compatible with the latest safe_network version … it’s getting there, but as @Chriso mentioned on Tuesday there are still some gremlins gnawing cables, including some issues around file retrieval with the CLI.
We’ve been setting up some basic testnet log asserts, which should help ensure we’re not looping messages anymore (and indeed has already caught some minor message amplification). This is still not merged as we’re continuing to hack away at stability corner cases.
This week we’ve rejigged client retries to be more sane, which has helped; we’ve squashed some false errors occasionally being produced, which was due to us relying on tokio errors (when the actual task itself had completed fine). And @yogesh has started in on improving client bootstrapping to the network, which currently goes for any contactable node, but doesn’t attempt to establish the state of the network, which later leads to more AE-messages than are necessary.
Merkle Register
This week we’d like to do a deep dive on the Merkle Register, one of the two primitive datatypes in the Safe Network (the other being immutable chunks).
The Merkle Register is a novel CRDT we’ve come up with to model mutable state in the Safe Network, it supports concurrent writes and history traversal. They are used extensively in NRS to maintain name mappings, used to track versions of a file(s) and any app that needs mutable state.
The Register CRDT Family
Registers are used to write and read a value, think CPU registers. Existing Register CRDTs are the Multi-Value Register (MVReg) and the Last-Write-Wins Register (LWWReg). Our Merkle Register (MerkleReg) is yet another CRDT in this family.
All registers follow a similar API.
# You can read the current value(s) stored in the register
read(reg) -> val(s)
# You can write a value to a register using a causal context
write(reg, val, ctx) -> reg
# Merge two registers
merge(reg1, reg2) -> merged-reg
The MerkleReg is built on top of a Merkle DAG. The Merkle DAG is a generalisation on the Merkle Tree that relaxes nodes to have multiple parents and place data at each node.
With Merkle Trees, we arrange data at the leafs of a binary tree and hash them in pairs to get the next level of nodes, we repeat this by hashing pairs of these hashes until we reach the root. It’s an incredibly flexible data-structure, used in many decentralised systems. Merkle DAG’s remove the binary tree restriction. We can place data at each node now and each node may have multiple parents.
The way we build the MerkleReg is to use the roots of the Merkle DAG as the current register values. For example, consider the following series of operations.
Start off with reading the empty DAG, we get back the empty set.
read(∅) = {}
Write a
to the register, we use the empty set as the context.
∅
write(∅, a, {}) = |
a
Concurrently, another client writes to the same register.
∅
write(∅, b, {}) = |
b
Once the two clients push their state up, we can merge the two concurrent registers.
∅ ∅ ∅
merge(|, |) = / \
a b a b
Reading the merged register gives back both concurrent values.
∅
read( / \ ) = {a, b}
a b
We can now resolve the concurrent values by using both a
and b
as the write context.
∅
∅ / \
write( / \ , c, {a, b}) = a b
a b \ /
c
Reading the final register only returns c
.
∅
/ \
read( a b ) = {c}
\ /
c
Now a nice property of this MerkleReg compared to some of the other Register CRDTs is that we have the entire branching history of the register maintained in the Merkle DAG. If we are tracking a document, we can traverse its history to see how it evolved.
MultiSets and MultiMaps
We can build MultiSets and MultiMaps on top of Merkle Registers quite simply by taking advantage of how write contexts work.
Entries written to the register with an empty context will appear when we read the values back, this simulates a Set CRDT.
Start with the empty register.
read(∅) = {}
After writing y
with empty context.
∅
read( | ) = {y}
y
Then writing x
with empty context.
∅
read( / \ ) = {x, y}
x y
Writing z
with empty context.
∅
read( / | \ ) = {x, y, z}
x y z
Now we can simulate deletes by assigning a special “trash” node:
∅
read( / | \ ) = {x, y, 🗑}
x y 🗑
Merge y with to delete it.
∅
read( / | \ ) = {x, 🗑}
x y 🗑
\ /
🗑
That gives us a Set CRDT with inserts and removes. To get a Map CRDT we make the set members a 2-tuple key:value:
∅
read( / | \ ) = {x:1, y:2, 🗑}
x:1 y:2 🗑
Updating x
’s value is done by using the current x
entries as children:
Here we update x
to store 2
instead of 1
∅
read( / | \ ) = {x:2, y:2, 🗑}
x:1 y:2 🗑
|
x:2
Since this is a concurrently edited structure, we may end up with multiple values written for a key, (hence the multi in Multi-Map):
Another client concurrently writes x:3
∅
/ | \
read( x:1 x:3 🗑 ) = {x:2, x:3, 🗑}
|
x:2
These multiple entries can be resolved by writing back an entry for x
with both x:2
and x:3
as children.
∅
/ | \
read( x:1 x:3 🗑 ) = {x:3, 🗑}
| |
x:2 |
\ /
x:3
This gives us a MultiMap CRDT with history and merging of concurrent values!
Implications of the MerkleReg
The MerkleReg replaces the old Sequence datatype, we found that trying to model linear sequences in a partially ordered network was not going to have clean semantics. Instead we embrace forks, the result is that applications need to deal with concurrent values. For example, a browser built on the Safe Network must deal with the possibility a website has multiple concurrent index.html pages. NRS will have to deal with multiple entries for the same name.
This is not as big of a problem as you may think, most large scale systems are already working with this model. For example DNS allows for multiple IPs associated with a domain, and things (generally) still work.
Useful Links
Feel free to reply below with links to translations of this dev update and moderators will add them here:
Russian ;
German ;
Spanish ;
French;
Bulgarian
As an open source project, we’re always looking for feedback, comments and community contributions - so don’t be shy, join in and let’s create the Safe Network together!