Consensus, async, Ae : What does it all mean and where does it matter?

We hear “consensus” mentioned a lot, especially in cryptocurrencies. While Safe includes a cryptocurrency, we don’t use a consensus algorithm for that (we only need agreement). What that means is we have the client “get the agreement” and pass the agreement back to the network. It’s very clever, simple and correct.

So what is consensus?

I find it confusing as many people really mean “agreement on order” of events, while others may mean purely agreement on thee validity of events.

So immediately, there is confusion and valid confusion. I prefer to use the full phrase

  • Agreement on order of valid events
  • Agreement on the validity of events

To further confuse, the events we are trying to get agreement on really matter. There is an unobvious point to be made here.

When trying to get agreement on an order

  1. A consensus algorithm can only agree on an order on things that are valid in any order.

If we need a system to order events then that implies events are not ordered themselves (i.e. crdt data, counters etc. are all implicitly taking care of order). So an agreement protocol is required when there may be concurrent changes to a state and those changes require order.

For the algorithm to decide on the order then each of the concurrent elements requires that they are validly next in order (if they were not then we should not consider them in the first place). Given that 2 elements X & Y are being ordered then X or Y MUST be valid as the next event.

  1. If ordering 1 item invalidates another, then agreeing on the first should invalidate the second from being considered as a valid input to consensus again.

If some nodes have voted on X, but Y was selected as the next valid value then we have X with already signed votes. This is not good as these signed votes are invalid (they were not chosen). However being able to use these votes in the next round of choosing X or Z for instance then we already have votes for Z, BUT the nodes that voted for X previously may now wish to vote for Z, but they cannot as it would be a double vote and that should mean death for the node.

i.e.
Imagine these were bitcoin transactions. X has xfer from e->f for £1000 and Y has a xfer e->j for the same £1000. So when we select Y we must make sure X is invalid (not signed) and vice versa.

Or you have a new node joining, X or Y but we only want 1 of them. Again selecting X means Y should be invalid as we wanted 1. However in this case we may want Y ti join later, but we don’t want him already having votes he can join.

So it’s really when 1 event invalidates the other, even for a time period (that we cannot determine)

Therefore, we are forced to address this situation, with generation numbers or some other form of invalidating old votes.

When trying to get agreement on event validity

This is the realm of purely deciding on an event being valid. This is generally when we are collecting signatures from nodes without care of the order of what else is being signed.

In the Safe network, we say an event is valid when it is network valid.

e.g. storing data, and spending DBCs use this type of agreement/consensus on the validity of the action being taken.

This also has a very important issue to be certain of, and that is

When an event is deemed valid it must remain valid for the life of the network (we will come back to this)

Consensus algorithm fails

Now we can see consensus algorithms are required to order things that do not have their own implicit order, but where we require that order. To run this algorithm we must check:

  • The event is valid

Wow, that sounds simple! All we need is a black box algorithm and fire events in it, and we will get them all ordered for us.

WRONG!

If the events are part of a system with its own order (say a counter), then we CANNOT run these through consensus algorithms as this will give us an issue. Here is an example

We have concurrent events stating the next value. So let’s see what can happen

Consensus algorithm orders → 4 2 3 7 5 6
Counter orders → 2 3 4 5 6 7

We can see the “source of truth” that is consensus just lied to us. The whole thing is a bust! (remember PARSEC, this was its massive fail).

So what we must check is:

  • The event is valid
  • The event is not part of an ordered system already

Consensus algorithm basics

We hear a load of 2f+1, 3f+1, 2/3 or 1/3 things flying around. These confuse as they are never really described well and simply. It is very simple.

First, we need to decide how many BAD nodes we can have. The math tells us that number is up to 1/3 of the nodes. WHY?

If we consider 7 nodes, the max number of bad nodes (which we call f for faulty) is 2.

The reason is as follows

If we consider a vote is valid then it requires 5 (2f+1) nodes to vote. If we analyse that it means
2 can be bad, but at least 3 are good (always have a minority of bad nodes).

i.e. the majority of nodes voting MUST be good nodes. The attacker has an influence less than the majority. This prevents attackers from altering decisions. This is where 2f+1 comes from. It forces all decisions to contain a MAJORITY of good nodes.

Consider though, the rules of consensus above. All events are valid, all we want consensus algorithms to do is agree on order. So why does bad nodes altering the decision matter? (spoiler, it makes no difference)

The answer is not that they can alter order (weirdly, that is not a problem). It’s that they can create forks, by signing more than one vote. i.e. they vote for X and Y and both are then “agreed” as valid. That is an issue as we have a fork. So consensus agrees on the order in a manner that prevents forks (that’s all). For this reason, we cannot have a majority vote that is BAD nodes as they can double vote (mind you, we could detect that and kill those nodes). i.e. we can resolve the fork in many cases, but while the fork exists, there can be trouble (doublespend is the classic example).

Can we add CFT to BFT for increased security and liveness?

It looks like we can, in circumstances where we select >3f+1 nodes but <=4f+1. Let me explain:

I was under the impression (Byzantine Fault Tolerance) BFT included (Crash Fault Tolerance) CFT as BFT is harder to combat.

So we have a tolerance to <1/3 BFT nodes as we claim (as does all algorithms in the space). However, we can maybe do better for safety and liveness.

So normally, the population N is set as 3f+1 (we actually set our threshold crypto to the same value of required sigs which is 2f+1)

This means tolerating 2 byzantine failures (f==2) we have a group size of 3f+1 (7) and a supermajority of 2f+1 (5).

However, this is not the whole story. We can set the group size to 4f+1 and still have f_bft==2 and f is still <1/3 so we are still using the same assumptions on safety.

So the difference from us is 9 nodes instead of 7. What do the extra 2 nodes mean and what do they by?

Meaning

These are 2 additional failure nodes, but they are not byzantine. They are CFT fails. The difference being a CFT failure is a crash, so a non-participating node. It cannot do byzantine stuff like double vote etc. It is purely a node that does nothing (it crashed as far as this round of voting goes).

i.e. We have 2 fail types f_bft and f_cft. If we set N==3f+1 there is no room for f_cft nodes. If we set N==4f+1 we have room for f f_cft nodes IN ADDITION to the f f_bft nodes.

Buys us

It buys us additional safety. Our supermajority is now a straight majority (only at N==4f+1). So we still have 2f+1(5) nodes required to agree, but we can have up to 4 failures instead of 2.

The trick/sneaky part is these failures can only contain 2 BFT failures. So it can be 4 CFT or 3CFT+1BFT or 2CFT+2BFT, but only when we use 4f+1 as the population.

So that extra f is where we can cope with CFT nodes existing.

It means we buy ourselves safety and liveness, but there is a cost. If all nodes do participate, then there is a chance of many more messages, but they should limit as any node seeing 2f+1 votes, knowing there is no need to keep voting. They can distribute the 2f+1 votes to all members as a finality clause. So this terminates the round and possibly faster? we can not really tell without measuring, but the increased membership does not strictly mean we gorm from !7 messages to !9

i.e. It looks like we have stumbled upon a non-documented (as far as we can tell) fact, that CFT can be added on top of BFT to allow greater safety and CFT does not require byzantine defences.

Bottom line is that I think this is a winner all round for us.

Note: a crashed node, or what we call a f_cft node is an honest node. It just failed to participate, it did not do damage. A f_bft node is a dishonest node and will try and do damage.

Back to events (the critical part) MEMBERSHIP

So we have events, stuff happening and we need to decide on them being valid. Let’s dig into validity and what that means and when?

We can say an event is valid if:

  1. Our internal state shows the event is valid (we could be wrong)
  2. The event is signed by a supermajority of nodes (elders in our parlance)
  3. Some external oracle leads us to believe an event is valid (i.e. node lost) (again we could be wrong)

2 is easy as we can say that event is network valid i.e. it has network authority.

However, 1 and 3 are Very tricky. (Can we even put them through a consensus algorithm without knowing they are valid?)

Neither 1 or 3 are network valid and in Safe, that means we don’t trust it. So what do we instinctively do? We fire it into a consensus algorithm BUT WE CANNOT DO THAT. We are not trying to get an order of the event at this time, we are just trying to get “agreement on the event’s validity”. (We cannot feed a consensus algorithm invalid events.)

So herein lies my issue with membership and consensus as a solution. It’s not!

I am not saying that consensus will not order events, it will, but IT MUST ONLY ORDER VALID EVENTS. So when is an event valid? In our network we trust no single node, so 1 & 3 above are merely opinions, but not valid events.

To go from an opinion to a valid event, we must get network authority. This means getting an agreement the event is valid, regardless of its order!

For us, that means we gather a section signature on the event.

Does order then matter for membership?

Well, it can. We can always find ways that unordered events are better ordered, but should we?

If we have a bunch of valid events like nodes lost, then it does not matter the order. It has happened, the nodes are not there!

If we have decisions to be made on membership then it’s more difficult. Some decisions such as

  1. Deciding we need a new node
  2. Deciding we can split the section
  3. Deciding where to store/get data from

These cause us problems and I would love us to consider each of these decisions that we take on the “state” of the membership. For instance, do slightly different views on membership really matter that much? If we have Ae to constantly sync nodes and clients then we know that changes will be passed around each node in a very short time. Consensus algorithms take time anyway, it’s a lot to think about, but it’s not a simple thing, but it’s not dead hard either.

This is a huge post so far and I am gonna stop here for now, but I will continue this to the bitter end. I am super keen we get this right. So posting now, but will continue soon. I am keen on thoughts, ideas, debates etc. to get this right.

35 Likes

Super interesting, even though I cannot say I understood it all. Thanks for putting this up!

8 Likes

I do not understand almost anything, but it does not matter, my admiration for the hard work

7 Likes

So I want to try to think a little bit:

Case “Node lost”. (“I” refers to an elder here)

  1. When do I as an elder propose an adult is lost? Let’s say when I don’t get a reply after 10 retries.
  2. After 10 unsuccessful retries I’ll tell others I see the node as “lost”. (How? via AE?)
  3. Others try to connect to the node I propose as lost.
  4. Five of them can connect to it. One other can’t. (For whatever reason.) (Can it happen in reality, that nodes are lost to some elders, but not others? Let’s just assume for a while it can.)
  5. There is majority agreement the node is online, so no changes in membership.
  6. Am I, and the other non-connecting node now satisfied with how others see the node, if we are not able to connect to it ourseves? If so, for how long does our satisfaction last? Or are others going to kick us out of SAP for not agreeing with them, even though we honestly can’t reach the node? Or are we just repeating our view?
  7. What if 2 honest elders just can’t connect to a node? Can one dishonest elder then tip the scale to kick out the partially unresponsive node, by falsely claiming it can’t connect to it, even though in reality it connects just fine?
  8. Can an honest disconnect be told apart from dishonest? Is there a danger of an (dishonest) adult being selective in which elder it replies, in order to make the elder seem dishonest? It seems we can’t kick an elder out for simply disagreeing about the connectivity of an adult. But still, elder could be dishonest about this?

There is no problem, if the assumption: “adults can (honestly) lose connection to some of the elders while connecting to others” is not valid.

Ok, I was just trying my hands on the problem, nothing more. Anyone else? :fist_right: :fist_left:

7 Likes

Super explanation David. I usually struggle to understand your deep dive posts but this one makes sense and I can’t wait for the next installment. Very clever to leave us with a cliffhanger!

14 Likes

Good read. Thank you for explaining this and the BFT/CFT

I always wonder what will happen when we get too many nodes that are bad. Will the section be able to recover?

I would think from a control systems point of view that the network/section can tolerate it to a degree. It seems initially to me that once it gets too large it will have a similar effect as entering positive feedback region in a control system and the network ends up sending the messages, but the reality is changing too fast to keep up.

8 Likes

This is key Rob, yer right. Regardless of algorithm, there is always a delay. We can ignore the delay (mathematically correct algorithms tend to do this) or we find ways to work with it. The direction I favour is the latter and hopefully, we are getting there.

Here the delay is interesting as the state we are all agreeing on is the infrastructure itself and it’s the infrastructure that is changing. I love it all.

12 Likes

I am guessing that the talk on adding CFT nodes (elders) and how once a decision has its required majority the extra nodes then just relay the messages will help in reducing transaction time for that decision. With a lower delay in the system to have all nodes know about each decision then the network can tolerate it even more. (Reduced dampening time wise)

I would think that when there are too many node joining decisions to be made and the nodes cannot keep up then rejection of the new node join request is a viable solution. Not a permanent rejection, just a “Not Now” state. One would assume a correctly operating network will only be overloaded when too many decisions at that time are needed to be made, thus offloading (to a later time) node joining is one way to help.

5 Likes

This is where we have dysfunctional checks. So it’s lost when it has not done at least 50% of the work it’s neighbors have. This wy we don’t use timers and such. So the whole network can pause and nothing changes.

You would propose it’s lost by signing a message NODE_LOST(node_name)

They can try that but also check their dysfunctional score for it.

YEs, you need 5 nodes to agree it’s dead, then it is as it has Network_Validity and that is the ultimate truth. So you still see it and think it’s OK but you get a network valid message saying it’s dead and it is dead to you too.

In this case it may be you that is gonna get kicked off :slight_smile:

This is where we have the 2f+1 where the majority of votes is always good nodes. If there were 3 good nodes only that thought he was offline then the 2 byzantine nodes can make that real by signing the vote.

No we cannot tell. So a node behaves and does ti’s job or not. If not then others see the dysfunctional score going up and he is out.

The other thing an elder can do is sign bad data or a bad message. That is irrefutable proof he was bad and nodes just need to send that to all other elders and he is out. Remember with a network of 100 sections of 60 nodes each, to be an elder with a 5min churn rate would take centuries.

4 Likes

I am working on part II where I hope to show each joining node joins in relation to an event. So we can have many concurrent such events and not worry.

6 Likes

Haha, I “knew” I would be presenting pseudo-problems. :wink:

I’ll stop wasting your time with my reasonings, as I don’t have enough knowledge base for my reasoning to be valid (even though it might be intrinsically correct). Or at least I’ll try hold myself back… I may be back later though, if I cannot resist…

Thanks!

4 Likes

No worries at all. Those questions are what we call the business logic part. So consensus/agreement etc. (this post) is the process of saying OK business logic is correct, how do we agree?

So business logic is important. When you are asked to agree on something, you need to check your business logic of that thing, if you concur then you add your vote. So this paper is about those business logic tests passing and how we get consensus/agreement among nodes in a decentralised network that is churning.

6 Likes

Maybe an important consideration is: How much of a delay is acceptable in different places of the algorithm? How much can the rubber band be stretched here, as opposed to there? For example, rewards. In today’s world everyone wants payment quickly but is it really necessary for the acceptance of the network?

2 Likes

The maths is a bit over my head, but this is really enlightening as to the different factors necessary to tick the validity boxes, not just consensus. The ability to eliminate forks is huge.

8 Likes

In a byzantine setting, which safe operates in, f_CFT is effectively f_BFT since you have no way to differentiate between the two without introducing further complications involving consensus. Furthermore, f_CFT causes f_BFT problems so it’s not practical to try to differentiate between the two beyond an academic exercise.

Has the current consensus-based membership failed? If it hasn’t, why not let it be tested first? Before the consensus-based membership was added, the alternative without consensus was tested and failed. So why not give a fair chance to the consensus based method too first? (the parsec-based test doesn’t count because the network had many other issues that were addressed by an overhauled design; would be great to see that new design to its end). If the new consensus-based membership has failed, then it should be clearly stated otherwise the above feels like a philosophical debate for debate sake. Also, if the consensus-based membership has failed, what’s the concrete alternative being proposed beside essentially “it can’t be consensus”?

Within the network, there will likely be settings where consensus will be required and other settings where consensus will not be needed. So coming in with an absolutist perspective from either end is probably not the right path.

5 Likes

I don’t believe that the network needs to know whether a node is faulty (f_CFT) or malevolent (f_BFT). The point of the maths is to demonstrate the ability of the network to tolerate a larger number of node ‘faults’, and decide whether it is worth experimenting with a larger group size (9 versus 7).

It makes sense to me to use the larger group size if the additional load (from 9! messages compared to 7!) can be mitigated (eg by early termination) sufficiently to make it worth the extra robustness with the larger group.

6 Likes

I see what you’re trying to get at, but it’s beside the point I made because:

  1. The set proportion of allowable “crashed”/“byzantine” nodes will be a constant regardless of the group size (7, 9, n)

  2. Since we are in a byzantine setting, it is better to go with the more secure byzantine assumptions rather than with simple majority (I can see where for certain operations, a simple majority might be adopted, but definitely not for coin transfer operations and probably not for membership)

Anyway, group size or fault limit is easily dealt with. It’s simple math with a ton of precedent after all. What’s more concerning is the membership mechanism. So I’ll just quote my previous post:

4 Likes

I think you’ve missed David’s point and my attempt to respond using it. Fingers crossed! :smile:

Your statement 1 is correct but doesn’t invalidate the analysis or its purpose.

Point 2 is also correct, except I think that coin transfers do not require consensus because they are created and then signed by the client. The network only needs to check that they are valid - it can’t fake transfers.

I’m not 100% on this but think that’s roughly it.

4 Likes

This is a key point and vital. Being blinded by an algorithm is really bad. I think many engineers reading the op will be surprised and that is good.

Well, parsec did, any new consensus mechanism will need to adhere to the points in the OP. The critical point is we cannot put invalid items into consensus to make one valid. They all have to be valid.

4 Likes

I can see the fascination in having data operation (and + CRDT?) as basic network operation underlying the validity statement…

If I’m able to sign, with my signature, a message,
e.g. “I want in!”
‘write’ it to the network
( through an Adult? Thanks to CRDT? )

shouldn’t that be enough for me, node
to be member, with age, and all the bells and whistles…?

I am sure I shall be tested,
and willing to proof my wholesomeness…or else…

PS sorry, random thoughts :roll_eyes: :grin:

1 Like