Data Chains documentation summary

Data Chains Summary

There was an immense amount of information contained in the latest Dev Update about Data Chains. The linked documentation contains a lot of implementation detail, and I felt a higher level overview would be useful.

These are the original sources

Goals

  • “improve the robustness and security of the SAFE network”

  • to achieve “an eventually consistency network” using group consensus

  • have a consistent network but still (necessarily) “handle out of order messages” for each individual node on the network.

  • “make sure that nodes agree 1) which changes to the section’s membership happened and 2) in which order they happened.”

  • resolve any differences between “The truth we see” and “The truth the group agrees”

  • To solve the problem that “the existing nodes in the network don’t always agree on which events [affecting the group] have happened yet or should happen”

New Benefits

Data chains bring some new benefits:

  • “ensure that all correctly-functioning nodes eventually agree on the nodes in their section”

  • “an external observer can use the chain to verify that a section is authentic”

    • When a client disconnects from the network and reconnects at a later time, the network it connected to is verifiably the same as the one it had been connected to before.
    • “A chain with missing events will be detectable in cases where multiple copies are requested from different current group members.”
  • “provide cryptographic proof of what data is stored on the network”

  • provide cryptographic proof of “who the legitimate members of the network are”

  • “no node needs to know about the entire network or chain” but the entire network can still retain integrity.

  • “On network restart we can accept a chain as containing valid information from a node”

How

  • Record changes to the structure of nodes in the network in a sequential order (ie a chain of events), so the history of the network structure can be known accurately. “The blocks are linked to each other by every block containing a hash of its parent block, thus forming a chain that can be cryptographically verified by any node.”

  • Inspired by the PBFT / Tangaroa consensus algorithms.

  • The priorities (in order of most to least important) are to protect the network, protect your group, protect yourself. “These basic tenants will mean a node will self terminate to protect the group, or destroy the group to protect the network.”

Chain Storage

  • “this first part does not yet concern data and focuses on the changes of membership and the network’s structure only”. Possible events that may be included in data chains are:

    • Genesis, the initial network state [no parent]
    • Add, recording the addition of a node
    • Remove, recording the removal of a node
    • BeginMerge, recording the final state of a section before merge
    • CompleteMerge, recording the (re)creation of a new section [two parents]
    • BeginSplit, recording the final state of a section before split [allows two child blocks]
    • CompleteSplit, recording the (re)creation of a new section
    • BeginForceMerge see Handling Broken Sections below
    • CompleteForceMerge [two parents]
  • “Routing nodes must store [a] subset of the chain in their cache” - this increases the resources required to run a routing node (compared with not using data chains), however “crude analysis suggests the storage size is not prohibitive, though transfer size on startup likely would be”

  • The amount of recent chain history to be stored by each node “needs to be large enough to make the probability of nodes falling behind by more than that number of blocks negligible, without creating disproportional overhead. For now, we estimate that 20 should work well.”

  • “Where prioritisation of multiple valid blocks is required [at the same time], blocks are ordered … from most urgent to least urgent with the goal of stabilising the network as soon as possible”. The order of events from most to least urgent is BeginMerge, Remove, Add, BeginSplit.

  • “blocks are proposed by a particular member called the primary, and are only committed to the chain if agreed on by a quorum of members”. This means the section nodes all have a consistent view of the order of events, regardless of the order each individual event arrives at each individual node. “Until the group (or specifically) quorum of group members agree with their vote then the event is not agreed.” However it is also said that “any node can make any proposal [for a new block] at any time and send it to all group members” so there seems to be some mild conflict here between proposed design and implementation.

  • Nearby sections are also monitored by the data chain - “Blocks from neighbouring sections are added to our chain if valid in the same way as those from our own section, and if valid trigger updates to the corresponding sections in our routing table.”

    • “It is not essential to ensure that section updates reach every neighbouring node… the processes are designed to tolerate small discrepancies in the version of neighbouring sections.”
    • “Other updates from neighbouring sections (to add, remove or split) could trigger a split or merge in our own section, and may cause us to send connection requests or disconnect from nodes, but otherwise do not directly affect our section.”

Error Handling

  • “Several constants are needed for broken section handling”
  • “Determining optimal values for these parameters will be done experimentally.”
  • If good health is not indicated by a neighbouring section within the specific heartbeat period, that neighbour is considered to be broken.
  • “we establish which nodes of [the broken section] are still reachable” and attempt to merge with those reachable nodes to form a new section.

Attacks

  • “In order to commit a fraudulent BeginForceMerge, the section p1 must have enough malicious nodes to satisfy quorum, thus these nodes can commit a CompleteForceMerge with any list of “still reachable” nodes they want. Since the compromised section can commit any Add block in the same way, an attacker with a single compromised section could add enough malicious nodes to satisfy the workload and use force merges to take over neighbouring sections, dropping enough other nodes to maintain a compromised quorum.”

  • “Security of the network depends on each section behaving correctly, which is ensured by the network distributing nodes throughout the network automatically, such that an attacker cannot choose where in the network any nodes he runs will join, and a brute force attack would require running a significant proportion of the nodes in the network to have any real chance of success… The broken section recovery mechanism could be used as a weapon allowing a section controlled by an attacker to take over more of the network, even up to the point of taking over the entire network if the attacker had enough resources to handle network load; this is a known limitation, but necessarily any mechanism to recover from a state where no consensus is possible must reduce security. This does not necessarily mean the system is less secure than a network using a less strong form of consensus, and is only an exploit if an attacker can control one section in the first place.”

  • “the extra message handling required to establish consensus on each new block may add some delay. This is not seen as a major concern because… the work involved in handling client requests is expected to be far greater than the work involved in managing churn”

  • “As the vote is signed then nodes who are acting maliciously have little effect, however their actions are recorded by default and these actions are digitally signed. Therefor bad or invalid behaviour is detectable and cryptographically provable. This detection and any punishment is not discussed in this paper.”

  • “A critical question is what to do with misbehaving nodes. These are nodes that do not vote, or frequently propose (vote) for events that we do not agree with. To maintain integrity bad nodes will require to be punished and possibly removed from a group. If the punishment (such as Kill the node or immediately relocate with age/2) is too harsh it would de-stabilise the group quickly, too lenient and it will lead to too high an error rate and affect our decision making. To penalise or at least handle a mis-behaving node, we need to measure it against an agreed behaviour. As we know nodes will recieve messages and events out of order we do need to handle the case where a node is trying to keep up, but maybe slightly slow. This slow node may be valuable, but if too slow, then it’s a danger.”

Difference Between Option A and B

I found it hard to differentiate the features of A vs B, let alone the pros or cons. Anyone who can shed more light on this would be appreciated.

  • “Option B proposes a mechanism that computes the validity of blocks from a set of votes that is independent of the votes’ order.”

  • In Option B “more than one state could be valid at the same time”. Events are proposed and gradually voted on, until they go from being ‘proposed’ to ‘confirmed’ (although the terminology used in the document is to go from being ‘current’ to ‘accumulated’).

  • Presumably the above points differ from Option A in that for Option A only one block may be proposed and voted at a time, making it a more sequential process. Some additional clarification on this would be welcome.

Personal Observations

  • I don’t understand how the entire chain of blocks is stored. The use of MIN_CHAIN_DEPTH seems to indicate that most nodes will store a subset of the chain that does not extend back to the genesis block. Some clarification here may be helpful to me.

  • Initially the use of the word “block” seemed misleading to me. “A block is an element in the chain, describing one change.” - why not call it a change or an event? That’s what it is after all. Later in the documents a block is defined as “an agreement of several nodes. This agreement is digitally signed by each node.” This makes more sense to be labelled as a block (ie a block of signatures). I feel the use of the word ‘block’ too easily invites comparison with blockchains, which is a great disservice to the technology of data chains. I would suggest the word ‘change’ or ‘event’ or ‘delta’ may be more appropriate. A minor point for sure.

  • I’m not convinced the work involved in handling client requests will be “far greater” than the work involved in managing churn. Churn is an obvious attack vector and I suspect churn efficiency will be a major factor in network security. This is just a gut feeling of course, and I concede the benefit gained by such an attack would be so small it would probably be infeasible.

  • The language used in Option B is much more technical and detailed than the other two documents. It was a very difficult read for me. I think Option B is a more natural design for the voting mechanism (using a ‘frontier’ of possible chain extensions rather than a strictly serial / sequential vote system via a Primary node). But I think this is also much more complex to design and develop. So my preference is to implement the simplest thing that works (Option A) with a mind to be able to improve it later (turn Option A into Option B via a series of future optimisations). Maybe this indicates I’ve misunderstood the difference between the two options!


The documentation represents an enormous amount of work and is extremely impressive. A very heartfelt congratulations to the people who created it. I feel this design and documentation work is as important (and difficult) as implementing the product features and code and the amount of progress it represents should not be underestimated.

I’m new to these ideas so any corrections are more than welcome!

If you have your own summary I would be very interested to read it. Data Chains present complex ideas that benefit from being explained at a few different levels of abstraction.

38 Likes

Thanks for the summary @mav. I found myself asking a lot of the same questions. I appreciate the effort you’ve put in here to condense these mighty docs.

7 Likes

Initially we may keep everything back to the genesis block, but eventually it should be possible for nodes to keep say, just the last month’s worth of history. People wanting to spin up a node then just have to get a block hash from within the last month from a trusted source (a friend, the forum, MaidSafe devs, etc).

I don’t particularly mind the comparison with a blockchain, the data chain described in the Option A proposal is blockchain-inspired, and the fundamental problem being solved is essentially identical: distributed consensus on mutable data. Initially, we’re aiming to get consensus on the prefix => section members mapping, but eventually blocks might be extended to include the hashes of data changes (much like hashes of transactions in a Bitcoin-style blockchain).

Option B is closer to the original data chains RFC, that’s true. We’re currently in the process of trying to answer some critical questions about it, including:

  • Is it possible for a node to be part of multiple divergent sections for long periods of time? Can we upper bound the time that sections can remain divergent?
  • Can malicious nodes force divergence through difficult-to-detect actions? i.e. actions that we can’t create detection rules for.
  • What should we do about data when the state of the section is diverged and we’re not sure who the online nodes are?

We’re currently working on some simulation code that should help us check the soundness of the rules and the limits on divergence. It’s almost ready to start yielding some results! GitHub - michaelsproul/ewok: Simulator for a distributed fuzzy ordering algorithm

Once that’s done we’ll be in a better position to make a choice, although personally I am currently erring on the side of Option A.

15 Likes

I’m with @mav on terminology. People will tend to think this is a blockchain anyway, or to try and compare it because that’s what they know or have heard of etc. For us it doesn’t matter, but from a perception point of view I think it is important that as many people as possible realise SAFEnetwork is not yet another thing built around blockchain tech because if they do, they won’t realise it has overcome the many drawbacks of blockchains. SAFE will get dismissed to easily as just one more rather than something needing to look into.

So we might want to consider other terms than blocks. I’m fine with @mav’s suggestions. Others might try to pick up on the group concensus aspect: groupchoice, votestamp, signedpoll, teller record or some such? Really not thought about it but maybe someone can refine? :slight_smile:

6 Likes

The SafeNet SUPER-LEDGER!! lol :slight_smile: … or Sledger for short :stuck_out_tongue:

3 Likes

Thanks for the topic, and I agree with you on this part. I find it a bit confusing as well, let’s look at what it does:

So this event removes a node from a group and quorum of the other nodes in the group need to agree. So actually it looks like:

  • Event: Remove a node
  • Signed by: Quorum of the group_nodes

So if you store this data in a chain why not call it an “Event” with “Event_Signs” added to it? And why not call it the “Event_Chain” and completely leave out the word block or blocks? Datachains part 1 isn’t about storage in some merkle_tree like system, it’s about the logic and security of nodes forming groups.

Later on in development the word Datachain/Storagechain could be used when it’s about storage.

Just my 2 cents.

6 Likes

Along the same lines as OP, here are some thoughts I had to better get my head around the concept:

There are really two fundamental data structures defined in the Datachain proposals.

  • The structure used to track and validate section membership. I’ll call this the section-chain.
  • The structure that tracks and validates information about the data stored in the network. I’ll call this the info-chain.

These structures perform very different functions and have very different requirements. The current discussions (option A and B) about datachains are discussing primarily the section-chain.

#section-chain
This is the root of trust for the entire network. It defines the nodes that make up a section at any given moment in time. A more accurate name for the entire structure would be “section-graph” [1]. It is a complex network of inter-related blocks, which define section membership events in the system. However, if you take any given leaf block in the graph [2], you can create a “section-chain” by following its link to its parent block all the way back to the genesis block.

You can validate a given block is valid using only its section-chain. You don’t need the entire graph. As a result, any node that needs to validate the membership of a section [3] only needs the relatively small section-chain for that section. No node ever needs the entire section-graph, only the section-chains of the nodes it communicates with. [4]
In other words, this structure can scale in ways that a classic block-chain cannot.

There is a lot of subtlety here, but the basic idea is that scaling anything of this type (datachains, blockchains, etc) is always a tradeoff between security and performance. Bitcoin uses 100% of the security available for every piece of data in the blockchain. As a result, scaling above a certain limit is effectively impossible. This is why you hear talk about “off chain” transactions.
With datachains, the network dynamically distributes security evenly across all sections. As the network grows, the total available security and total available performance grow with it. The sections split to maintain a constant level of security and performance in each section. [5]

#info-chain
This structure protects the integrity of some amount of data. It hasn’t been completely specified, but it is probably an actual chain (a one to one relationship from parent to child). It defines the state of the data at any given point in time.
An info-chain references a section-chain throughout the chain. A section-chain has no knowledge of info-chains and will never reference one.
The info-chain uses the section chain as validation. Its proof of validity is directly tied to the validity of the section-chain it references. When an info-chain references a section-chain, it is using it as proof that the nodes signing the info-chain block have the authority to validate and sign that block.

So, you can think of the info-chain as riding on top of the section-chain. The section-chain is a lower level piece of infrastructure that the info-chain is depending upon.

In order to validate a block in the data chain, you only need the latest subset of that chain, back to where it references a validated section-chain block. Only nodes involved in managing a piece of data need the info-chain for that data. This will probably be a subset of nodes within a section.
So, therefore, info-chains can scale to a greater degree than even section-chains. Only a small set of nodes need to track and manage any given info-chain because it is using the more widely distributed section-chain for its root of trust.

The subtile genius of the system is that you can scale the security of the data on an info-chain by info-chain basis. An info-chain protecting safecoin may be more secure, by requiring more signatures, than an info-chain protecting the integrity of some random mutable data. So we get to choose the security/performance tradeoff for every piece of data managed in the network.

[1] The technical term is a DAG. You can also think of it as a Tree, if it helps you grok what is going on, but this isn’t strictly accurate.

[2] A leaf is any block that is not the parent of any other block. The latest, most current block in a chain.

[3] This is probably only nodes directly connected to a section (as dictated by the routing layer).

[4] This is complicated a bit. There are optimizations that may be desired, for instance to allow us throwing away very old blocks. These optimizations may need to see a larger part of the section-graph, or even all of it.

[5] This is really only true of very large mature networks where performance and security both grow proportional to network size.

19 Likes

So this (info chain) is the Datachain devs agreed upon? and is what enables the Public/ Private ledger feature?

1 Like

Looking forward to an Index/Summary of what we now have with Data Chains. It’s probably the missing ingredient to gaining some much deserved development buzz from the wider Dev space.

1 Like

Ah yes, at the end of Davids article answers my questions about further explanantions:

We think there will be many posts depicting various aspects of this design pattern over the next while as we get the ability to explain what it offers in many areas. (i.e. when we understand it well enough to explain it easily)

6 Likes