Consensus 'Ordering'

We have been told in one of the updates that MaidSafe were working on reaching consensus and in particular were working on ‘consensus ordering’.

I was wondering about this and particularly about the following:

  1. What do you mean by consensus ordering?
  2. In a MaidSafe close group, are messages totally ordered? If so, how is this ordering currently determined?
  3. Can the MaidSafe network reach consensus across groups?
  4. Where do unreliable failure detectors / finite state machine replication fit into the picture?
11 Likes

Say we have four nodes in a section: A, B, C and D, and say that E is trying to join. The four nodes need to vote for E to join in order for that to actually happen. Let’s say that A sees that it voted first, and then it received votes from B, C and D. B first received a vote from D, then it voted itself, then it saw votes from C and A; etc. Everybody can see a different order of the votes - what we want is to have a way for all of them to agree on one shared order, which we call “consensus ordering”.
It’s not that important in this simple example, actually - but there are situations, usually having to do with splits and merges, in which such a shared order would be very helpful.
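The vote example above can be sketched in code. This is a toy illustration only (a trivially simple rule, not MaidSafe's actual algorithm) of four nodes with different local arrival orders still deriving one shared order:

```python
# Toy illustration (not MaidSafe's algorithm): four nodes see the same set
# of votes for E joining, but each in a different local arrival order.
a_sees = ["A", "B", "C", "D"]   # A voted first, then received B, C, D
b_sees = ["D", "B", "C", "A"]   # B heard D first, then voted, then saw C, A
c_sees = ["C", "A", "D", "B"]
d_sees = ["B", "D", "A", "C"]

def canonical_order(local_view):
    # A deterministic rule (here: sort by voter id) that every node can
    # apply independently, yielding the same shared order.
    return sorted(local_view)

shared = {tuple(canonical_order(v)) for v in (a_sees, b_sees, c_sees, d_sees)}
assert len(shared) == 1          # all four nodes agree on one order
print(shared.pop())              # ('A', 'B', 'C', 'D')
```

Note that sorting only works here because every node already holds the same complete set of votes; a real consensus-ordering algorithm has to produce a shared order even with delayed messages and faulty nodes, which is what makes the problem hard.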

Not yet. Our current algorithms are attempting to ensure only that merges and splits are ordered with respect to other events (nodes joining/leaving), but don’t really give us any proper guarantees even about that. We are working on a much more robust solution.

It has to, about some things - especially about merges. But this is a step above the consensus inside a group/section, and having such a consensus first will be a huge help towards achieving inter-section consensus.

Algorithms for finite state machine replication are usually exactly the algorithms that solve the consensus problem. Most of them don’t have the properties we need, unfortunately (they are either leader-based, or don’t cater to situations with malicious nodes). We specifically require a leaderless, Byzantine fault tolerant (which is another way of saying “resilient against malicious actors”) algorithm - which narrows the scope a lot, and the algorithms that remain all have some other drawbacks. We do have a few promising ideas, though, and that’s where our efforts have been focused recently.

28 Likes

Hey Bart, thanks for the informative reply.

Aha, so consensus ordering is the total ordering of events within a group. It sounds like what you are describing here is verifiable secret sharing - if that’s the case, it’s interesting to hear that splits and merges in Kademlia pose implementation challenges.

Interesting, what do you have in mind for a ‘much more robust solution’?

I’m curious to know why you need a leaderless solution. Would it not be feasible to go with a hybrid approach (e.g. virtual synchrony within groups, with a higher level of consensus joining them - at a guess) and research a leaderless solution in the meantime? What are the effects of using a leaderless solution in this case?

5 Likes

Leader-based solutions like PBFT etc. have a serious issue: you can just DoS the leaders one at a time, bringing the whole network down, or if a leader is malicious it can over-vote (send many fast messages). There are several issues. Asynchronous BFT (APBFT) solutions do exist, like HoneyBadger, Hashgraph etc. These are OK but not perfect in terms of membership changes: HoneyBadger, for instance, is only efficient with huge batches of transactions, and Hashgraph does not handle dynamic membership at all. We have several solutions now that are async BFT and also handle membership changes. We have 2 teams with 3 possible solutions right now. There are a few side issues as well.

The more robust solution is taking dynamic membership and extending it to handle merge and split, as that is essential for sharding. In addition to that, it must handle secure message transfer, for which we have several solutions in testing/detailed design right now.

The trick is to be aware of sharding, split, merge, secure message transfer and dynamic membership, whilst maintaining the security of nodes and their trust levels (node age). CRUST helps a lot, but it has to handle secured message transfer (point to point), whereas routing handles this across groups (not directly connected).

Then behind that we have data chains, which are basically the ordered sequence of Blocks (events). To ensure their integrity, ordering consensus is proving to be vital to us. You can think of ordering consensus as the ordering of group consensus. Nodes can see different events at different times (they will), and the group consensus happens in real time, while the ordering part ensures that the group consensus is agreed. This means nodes need to “see” how all other nodes are seeing the events of the supermajority (> 2/3) of other nodes in the section. So we actually get consensus on recent history whilst ordering the current events. A few rounds back is what gets agreed in order as the network moves forward (as it always does).
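As a rough sketch (hypothetical names, not the actual data chains implementation), a data chain can be pictured as an ordered list of event Blocks, each agreed once strictly more than 2/3 of the section's n nodes have voted for it:

```python
from dataclasses import dataclass, field

@dataclass
class Block:
    event: str                       # e.g. "E joined", "split", "merge"
    voters: set = field(default_factory=set)

    def has_supermajority(self, n: int) -> bool:
        # agreed once strictly more than 2/3 of the section's n nodes voted
        return 3 * len(self.voters) > 2 * n

# A section of n = 4 nodes: the first event is agreed, the second not yet.
chain = [Block("E joined", {"A", "B", "C"}), Block("split", {"A", "B"})]
print([(b.event, b.has_supermajority(4)) for b in chain])
# [('E joined', True), ('split', False)]
```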

I hope this helps a little.

13 Likes

Their SDK certainly is, as are their launch plans. They do state they have a solution for permissionless networks, but as of yet it does not seem to be public.

For some things it is great, but for managing the membership of a section it is not really suitable. Membership changes cannot really be batched like that (in the thousands), as each change may alter the value of n (the population of valid voters), and the network or section may have to make changes on certain singular events, such as a merge or split.
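A small sketch of why batching membership changes is awkward (a hypothetical quorum helper, not project code): the smallest vote count strictly above 2/3 of n moves as nodes join or leave, so each single change can shift the threshold for in-flight votes:

```python
def quorum(n: int) -> int:
    # Smallest number of votes strictly greater than 2n/3.
    return (2 * n) // 3 + 1

for n in (9, 10, 11, 12):
    print(n, quorum(n))
# 9 7
# 10 7
# 11 8
# 12 9
```

A single node joining (n = 11 to 12) raises the quorum from 8 to 9, which is one reason changes have to be handled one event at a time rather than in huge batches.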

These are the 3 solutions we have that all seem to work with varying degrees of completeness and efficiency.

As long as there are more than 2/3 valid votes that can be inferred in some way at a point in time (event time), then you have consensus, regardless of the remaining < 1/3. If you mean when will something not get to consensus (not enough voters agree on the event/transaction), then there are several options - we were discussing some solutions today, such as purging after n events or a similar arbitrary time, when the graph can purge safely. That purge point really means those non-consensus events never happened. You can also have coin-toss events that can be network-agreed to help progress; parts of HoneyBadger do this, as does Hashgraph. In the solution today we had exactly such an event. (See this paper in relation to binary consensus (coin-toss-type rounds as well): https://hal.inria.fr/hal-00944019/document )
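The 2/3-versus-<1/3 split above corresponds to the standard Byzantine fault tolerance bound (a general result, not a MaidSafe-specific figure): with n nodes, agreement is possible with at most f faulty nodes where n >= 3f + 1.

```python
def max_faulty(n: int) -> int:
    # Largest f satisfying n >= 3f + 1.
    return (n - 1) // 3

assert max_faulty(4) == 1    # a 4-node section tolerates 1 malicious node
assert max_faulty(7) == 2
assert max_faulty(10) == 3
```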

12 Likes

@dirvine already answered most of the questions, but a few points seem to be left, so I’ll try answering them as well :wink:

A solution that has actual mathematical guarantees: that, under certain assumptions, properties like liveness (no stalling) and safety (all honest nodes agree on the same values) will hold. The Alpha 2 code doesn’t have that - we patched all the bugs we found and it seems to work in practice, but we don’t have an actual proof. We are aiming to have that in Alpha 3.

You are close, but it’s not exactly that. We are trying to modify/extend existing solutions to work for our use case. As you noted, this is a hard problem, which is one of the reasons solving it is taking so long. We are fully aware of how hard it is, and this is also why some of the approaches we are considering don’t aim for full APBFT, but rather something weaker - a network that would be eventually consistent, so that nodes might disagree temporarily, but should arrive at the same final state eventually. This appears to be an easier problem, but it also has its disadvantages regarding merges and splits (we do want to agree completely during such changes, or a whole lot of new issues arise).

Hope this clears things up a bit.

18 Likes