Yes, you are correct, @Nigel: Parsec was very much like hashgraph. It ordered opaque events (unlike hashgraph, which orders transactions and can see their contents). Parsec attempted to be permissionless, i.e. to handle churn, and the thinking was “oh, that’s easy”, with no need to prove that part of the algorithm. In fact, it was a small appendix to the paper. However, this was seriously wrong IMO.
The issue with total order consensus, and with the “one size fits all” approach to consensus, is that the Safe Network has a lot of moving parts and many different data types, all in a permissionless network. Parsec etc. did not properly address churn handling and required all network events to be serialised and ordered one by one. Hashgraph would require the same, and I suspect that is why they stick with a permissioned network.
There is much confusion around the term consensus. I prefer to call it agreement: we are trying to get some nodes to agree, and if we trust those nodes when they are in agreement, then we are good. I also prefer to say majority rather than quorum. So we have a group size, a majority agrees, and that is fine. A quorum just means the minimum number of voters required, which is the wrong measure and of no use to us. We care about what the majority agreed on, not only how many voters there were.
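To make the quorum/majority distinction concrete, here is a minimal Python sketch. It assumes votes are a mapping from node ID to the value that node signed, and that "majority" means strictly more than half of the group size; `quorum_reached` and `majority_agreement` are hypothetical helper names for illustration, not Safe Network APIs.

```python
from collections import Counter


def quorum_reached(votes: dict, min_voters: int) -> bool:
    # A quorum check only counts voters; it ignores what they voted for.
    return len(votes) >= min_voters


def majority_agreement(votes: dict, group_size: int):
    # Return the value that a strict majority of the whole group agreed on,
    # or None if no single value has that much support.
    if not votes:
        return None
    value, count = Counter(votes.values()).most_common(1)[0]
    return value if 2 * count > group_size else None


# Five of seven nodes voted, but they are split three ways.
votes = {"a": "X", "b": "X", "c": "Y", "d": "Z", "e": "Y"}
print(quorum_reached(votes, min_voters=5))      # True
print(majority_agreement(votes, group_size=7))  # None
```

The quorum check passes even though no single value has majority support, which is exactly why counting voters alone tells us little.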
Going further down, a single consensus algorithm in a data-driven network will not work. Also, in a permissionless network, controlling/ordering churn will not work.
With that, we have a few agreement algorithms in play:
1. AT2 for money transfers
2. BRB (an AT2 derivative) for single-node or client-initiated events (e.g. updating or storing data)
3. Anti-entropy with a supermajority to handle churn
Point 3 means we look for agreement that a churn event happened, and we can have many of these at once. When there is more than one, nodes still sign the “conflicting” agreements and then resolve the conflict in real time. If two nodes leave at once, we can apply both safely. If it’s a node join and a node leave, we can potentially handle that too. If it’s a node join, then a leave, then the same node joining again, we can resolve that by refusing that operation, and so on. This will be formalised soon; we have a few write-ups, but have focussed on stress testing this approach, and it works really well for us.
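The resolution rules above can be sketched in Python. This models only the final resolution step, not the anti-entropy gossip itself. It assumes "supermajority" means strictly more than two thirds of the elders, and `Churn`, `is_supermajority` and `resolve_concurrent` are hypothetical names. The refuse-the-join policy for a node that both joins and leaves in one batch is one illustrative reading of the rule described, not the network's formal specification.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Churn:
    kind: str  # "join" or "leave"
    node: str


def is_supermajority(signers: int, elders: int) -> bool:
    # Assumption: supermajority means strictly more than two thirds of the elders.
    return 3 * signers > 2 * elders


def resolve_concurrent(
    members: Set[str], events: List[Tuple[Churn, int]], elders: int
) -> Tuple[Set[str], List[Churn]]:
    """Apply a batch of concurrently agreed churn events.

    `events` pairs each churn event with the number of elders that signed it.
    Returns the new member set and the list of refused operations.
    """
    # Only events with supermajority signatures count as agreed.
    agreed = [ev for ev, signers in events if is_supermajority(signers, elders)]

    # Group the agreed events by the node they affect.
    kinds_by_node: Dict[str, Set[str]] = {}
    for ev in agreed:
        kinds_by_node.setdefault(ev.node, set()).add(ev.kind)

    new_members = set(members)
    refused: List[Churn] = []
    for node, kinds in kinds_by_node.items():
        if kinds == {"leave"}:
            new_members.discard(node)  # e.g. two nodes leaving at once: both apply
        elif kinds == {"join"}:
            new_members.add(node)
        else:
            # The same node both joining and leaving in one batch: apply the
            # leave and refuse the join (a hypothetical policy).
            new_members.discard(node)
            refused.append(Churn("join", node))
    return new_members, refused
```

The design choice here is that events affecting different nodes commute, so they can be applied in any order; only events touching the same node need a conflict rule.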
Then we have “lost majority” recovery. This is being implemented now, but is not required for our testnet. It means the network can lose consensus/majority and recover from it.
Much of this is based on a concept we called data chains, which has since advanced to SectionChain and NetworkAuthority (we keep the agreement signatures on the data and can look them up in history to prove the action was network-agreed). That notion was first introduced in 2015, but the team at that time could not handle concurrent events and real-world changes, and instead focused on a mechanism to control the world: that was Parsec.
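As a rough illustration of keeping agreement signatures with the data and checking them against history, here is a toy Python sketch: a chain of section keys where each new key must be signed by the previous one, and data counts as network-agreed only if it is signed by a key in that chain. `ToySectionChain`, `SignedData` and `network_agreed` are hypothetical simplifications, not the real SectionChain or NetworkAuthority types. A real implementation would use threshold public-key signatures; this sketch substitutes stdlib HMAC purely so it runs, and HMAC is symmetric, so it is not a secure stand-in.

```python
import hashlib
import hmac
from dataclasses import dataclass


def toy_sign(key: bytes, msg: bytes) -> bytes:
    # Placeholder for a section (threshold) signature. HMAC is symmetric, so
    # this is NOT a public-key scheme; it only keeps the example runnable.
    return hmac.new(key, msg, hashlib.sha256).digest()


def toy_verify(key: bytes, msg: bytes, sig: bytes) -> bool:
    return hmac.compare_digest(toy_sign(key, msg), sig)


class ToySectionChain:
    """Ordered history of section keys, each signed by its predecessor."""

    def __init__(self, genesis_key: bytes):
        self.keys = [genesis_key]

    def append(self, new_key: bytes, sig_by_prev: bytes) -> None:
        # A key change is only accepted if the current section key signed it.
        if not toy_verify(self.keys[-1], new_key, sig_by_prev):
            raise ValueError("new key is not signed by the current section key")
        self.keys.append(new_key)

    def contains(self, key: bytes) -> bool:
        return key in self.keys


@dataclass(frozen=True)
class SignedData:
    payload: bytes
    signer_key: bytes  # section key that agreed on this payload
    signature: bytes


def network_agreed(chain: ToySectionChain, item: SignedData) -> bool:
    # The signature must verify AND the signing key must appear in the
    # chain's history, linking the action back to the genesis key.
    return chain.contains(item.signer_key) and toy_verify(
        item.signer_key, item.payload, item.signature
    )


chain = ToySectionChain(b"genesis-key")
chain.append(b"section-key-1", toy_sign(b"genesis-key", b"section-key-1"))
item = SignedData(b"store X", b"section-key-1", toy_sign(b"section-key-1", b"store X"))
print(network_agreed(chain, item))  # True
```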
Our current toolbag does control small elements, mostly money (that weird human concept of data handling). So where we need total order, we do apply it via AT2/BRB, but we strictly limit its scope, as trying to totally order events outside our control is, well, in my opinion, not clever.
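AT2 works because a transfer only needs to be ordered relative to earlier transfers from the same sender, not relative to everything else in the network. The Python sketch below illustrates that idea under simplifying assumptions: each sender numbers its transfers 1, 2, 3, ..., a replica applies a transfer only when it carries the sender's next sequence number and the balance covers it, and transfers that cannot be applied yet are retried. Real AT2 delivers transfers through a Byzantine reliable broadcast and tracks explicit dependencies instead of retrying; `Transfer`, `Replica` and `deliver_all` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Transfer:
    sender: str
    seq: int  # per-sender sequence number, starting at 1
    recipient: str
    amount: int


@dataclass
class Replica:
    balances: Dict[str, int]
    seqs: Dict[str, int] = field(default_factory=dict)  # last applied seq per sender

    def deliver(self, t: Transfer) -> bool:
        # Only the sender's own transfers need ordering: reject anything that
        # is not the sender's next sequence number.
        if t.seq != self.seqs.get(t.sender, 0) + 1:
            return False
        if self.balances.get(t.sender, 0) < t.amount:
            return False  # may succeed later, once incoming funds arrive
        self.balances[t.sender] -= t.amount
        self.balances[t.recipient] = self.balances.get(t.recipient, 0) + t.amount
        self.seqs[t.sender] = t.seq
        return True

    def deliver_all(self, transfers: List[Transfer]) -> List[Transfer]:
        # Retry pending transfers until no more progress; return leftovers.
        pending = list(transfers)
        progress = True
        while pending and progress:
            progress = False
            for t in list(pending):
                if self.deliver(t):
                    pending.remove(t)
                    progress = True
        return pending


initial = {"alice": 10, "bob": 0}
txs = [
    Transfer("alice", 1, "bob", 4),
    Transfer("bob", 1, "carol", 3),
    Transfer("alice", 2, "carol", 5),
]
r1, r2 = Replica(dict(initial)), Replica(dict(initial))
r1.deliver_all(txs)
r2.deliver_all(list(reversed(txs)))  # different arrival order
print(r1.balances == r2.balances)  # True
print(r1.balances)  # {'alice': 1, 'bob': 1, 'carol': 8}
```

Two replicas that see the transfers in different orders still converge, without any global ordering of all events. Reusing a sequence number (a double-spend attempt) is refused, because that number has already been consumed.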
Hope this helps.