I heard you guys are hashgraph like

I heard Safe Network used a consensus algorithm somewhat similar to Hashgraph’s from a comment on this reddit post

If that’s true, I’m interested in how you guys solved the problem of making it public. Is it still a permissioned network? Thanks for your technical expertise!

12 Likes

Welcome to the forum!

Guessing this references parsec for total order consensus. Maidsafe moved away from that, still use gossip in some capacity but data and network changes are guaranteed eventually consistent by utilizing CRDT’s in novel ways. You are right that this is a totally permissionless network.

I’m no expert but what I can grasp is extremely interesting stuff that I don’t see much of elsewhere, especially not the total network design.

Maybe I could tap @dirvine for a solid answer when he has a free mo. I would also check out the Weekly Update and the Safe Network Primer which are excellent resources that you can gain a lot from.

20 Likes

Yes, you are correct @Nigel Parsec was very much like hashgraph. It ordered opaque events (unlike hashgraph which orders transactions and can see them). Parsec had a go at permissionless, i.e. churn handling and it was thought “oh that’s easy” no need to prove that part of the algorithm, in fact, it was a small appendix to the paper, however, this was seriously wrong IMO.

The issue with total order consensus and the “one size fits all” approach to consensus, is the Safe network has a lot of moving parts and different data types in a permissionless network. So Parsec etc. had no care of churn handling and required all network events were serialised and ordered one by one, hashgraph would too and I suspect that is why they stick with permission.

In terms of consensus, there is much confusion. I prefer to call it agreement as we are trying to get some nodes to agree and if we trust those nodes when they are in agreement then we are good. I also prefer to call quorum a majority. So we have a group size and a majority agree and that is fine. A quorum just means the min number of required voters and that is wrong, it’s no use to us. WE care about whatever majority agreements were, not only how many voters there were.

So further down we go, a single consensus algorithm in a data-driven network will not work. Also in a permissionless network controlling/ordering churn will not work.

With that we have a few agreement algorithms in play.

  1. AT2 for money transfers
  2. BRB (AT2 derivative) for a single node or client-initiated events (i.e. update data/store data)
  3. We use a mechanism of anti-entropy with a supermajority to handle churn.

“3.” Means we look for agreement a churn happened, we can have many at once. When there is more than 1 we have nodes still sign the “conflicting” agreement and then resolve the conflict in real time. If it’s 2 nodes leaving at once we can apply both safely. If it’s a node join and node leave we can potentially handle that. If it’s a node join/node leave/ same node join we can resolve that by refusing that operation and so on. This will be formalised soon, we have a few write-ups, but have focussed on stress testing this one and it works really well for us.

Then we have “lost majority” recovery. This is being implemented now, but not required for our testnet. That effectively means the network can lose consensus/majority and recover from it.

Much of this is based on a concept we called data chains and has been advanced to a SectionChain and NetworkAuthority (so we keep the agreement signatures on the data and can look them up in history to prove the action was network agreed). That notion was first introduced in 2015 but the team at that time could not deal with handling concurrent events and real world changes and instead focused on a mechanism to control the world and that was parsec.

Our current toolbag does control small elements mostly money (that weird human concept of data handling). So where we need total order we do apply it via AT2/BRB but limit this concretely as trying to total order events outside our control is, well in my opinion, not clever :wink:

Hope this helps.

27 Likes

I’ve been looking for articles on anti-entropy, I’ll read this when I have time.

Below is first part of the introduction in [1].
“Anti-entropy, or gossip, is an attractive way of replicating state that does not have strong consistency requirements [2]. With few limitations, updates spread in expected time that grows logarithmic in the number of participating hosts, even in the face of host failures and message loss. The behavior of update propagation is easily modeled with well-known epidemic analysis techniques. As a result, many distributed applications use gossip to contain various inconsistencies.”

[1]. “Efficient Reconciliation and Flow Control
for Anti-Entropy Protocols” (https://www.cs.cornell.edu/home/rvr/papers/flowgossip.pdf)
[2]. “Epidemic Algorithms for Replicated Database
Maintenance " (http://bitsavers.trailing-edge.com/pdf/xerox/parc/techReports/CSL-89-1_Epidemic_Algorithms_for_Replicated_Database_Maintenance.pdf

5 Likes

I didn’t know AE was just gossip! Seems a draft name change because gossip is a really good summary of how it works, AE doesn’t carry that for me.

2 Likes

Gossip uses an Ae pattern for distributing data where it can update peers on what they don’t yet have. We use an Ae pattern more like CRDT ops, but it’s very similar.

Basis is

A -> B == Here is data ranging from X->Y
B -> A == here is more data ranging from Y->Z

Then

C -> A == I think the tip of the data is W
A-> C == Ah you are behind, here is X->Z

For us it’s more like (A and B are sections - recodnised by section key)

Some node in A → some node in B == Here is a message and I think you are B
iff B has churned
Here is your message back, you need to try again, I am not in B anymore I am in C and here are the new members of C and a signature of section B to validate key C and the Elders

So it’s a similar thing where nodes talk to each other to update themselves and each other on the present state of any data that is continuously changing. This is the anti-entropy part, if left alone entropy takes over and everything diverges. So anti-entropy is a kinda loose version of what we would call sync, while realising in a concurrent and changing network (or in actual life) you are never in sync or current, just close to it, but never there :slight_smile: .

15 Likes

Thanks David, is good to maintain at least the gist of all this.

6 Likes