We hear “consensus” mentioned a lot, especially in cryptocurrencies. While Safe includes a cryptocurrency, we don’t use a consensus algorithm for that (we only need agreement). What that means is we have the client “get the agreement” and pass the agreement back to the network. It’s very clever, simple and correct.
So what is consensus?
I find it confusing as many people really mean “agreement on order” of events, while others may mean purely agreement on thee validity of events.
So immediately, there is confusion and valid confusion. I prefer to use the full phrase
- Agreement on order of valid events
- Agreement on the validity of events
To further confuse, the events we are trying to get agreement on really matter. There is an unobvious point to be made here.
When trying to get agreement on an order
- A consensus algorithm can only agree on an order on things that are valid in any order.
If we need a system to order events then that implies events are not ordered themselves (i.e. crdt data, counters etc. are all implicitly taking care of order). So an agreement protocol is required when there may be concurrent changes to a state and those changes require order.
For the algorithm to decide on the order then each of the concurrent elements requires that they are validly next in order (if they were not then we should not consider them in the first place). Given that 2 elements X & Y are being ordered then X or Y MUST be valid as the next event.
- If ordering 1 item invalidates another, then agreeing on the first should invalidate the second from being considered as a valid input to consensus again.
If some nodes have voted on X, but Y was selected as the next valid value then we have X with already signed votes. This is not good as these signed votes are invalid (they were not chosen). However being able to use these votes in the next round of choosing X or Z for instance then we already have votes for Z, BUT the nodes that voted for X previously may now wish to vote for Z, but they cannot as it would be a double vote and that should mean death for the node.
i.e.
Imagine these were bitcoin transactions. X has xfer from e->f for £1000 and Y has a xfer e->j for the same £1000. So when we select Y we must make sure X is invalid (not signed) and vice versa.
Or you have a new node joining, X or Y but we only want 1 of them. Again selecting X means Y should be invalid as we wanted 1. However in this case we may want Y ti join later, but we don’t want him already having votes he can join.
So it’s really when 1 event invalidates the other, even for a time period (that we cannot determine)
Therefore, we are forced to address this situation, with generation numbers or some other form of invalidating old votes.
When trying to get agreement on event validity
This is the realm of purely deciding on an event being valid. This is generally when we are collecting signatures from nodes without care of the order of what else is being signed.
In the Safe network, we say an event is valid when it is network valid.
e.g. storing data, and spending DBCs use this type of agreement/consensus on the validity of the action being taken.
This also has a very important issue to be certain of, and that is
When an event is deemed valid it must remain valid for the life of the network (we will come back to this)
Consensus algorithm fails
Now we can see consensus algorithms are required to order things that do not have their own implicit order, but where we require that order. To run this algorithm we must check:
- The event is valid
Wow, that sounds simple! All we need is a black box algorithm and fire events in it, and we will get them all ordered for us.
WRONG!
If the events are part of a system with its own order (say a counter), then we CANNOT run these through consensus algorithms as this will give us an issue. Here is an example
We have concurrent events stating the next value. So let’s see what can happen
Consensus algorithm orders → 4 2 3 7 5 6
Counter orders → 2 3 4 5 6 7
We can see the “source of truth” that is consensus just lied to us. The whole thing is a bust! (remember PARSEC, this was its massive fail).
So what we must check is:
- The event is valid
- The event is not part of an ordered system already
Consensus algorithm basics
We hear a load of 2f+1, 3f+1, 2/3
or 1/3
things flying around. These confuse as they are never really described well and simply. It is very simple.
First, we need to decide how many BAD nodes we can have. The math tells us that number is up to 1/3 of the nodes. WHY?
If we consider 7 nodes, the max number of bad nodes (which we call f
for faulty) is 2.
The reason is as follows
If we consider a vote is valid then it requires 5 (2f+1) nodes to vote. If we analyse that it means
2 can be bad, but at least 3 are good (always have a minority of bad nodes).
i.e. the majority of nodes voting MUST be good nodes. The attacker has an influence less than the majority. This prevents attackers from altering decisions. This is where 2f+1
comes from. It forces all decisions to contain a MAJORITY of good nodes.
Consider though, the rules of consensus above. All events are valid, all we want consensus algorithms to do is agree on order. So why does bad nodes altering the decision matter? (spoiler, it makes no difference)
The answer is not that they can alter order (weirdly, that is not a problem). It’s that they can create forks, by signing more than one vote. i.e. they vote for X and Y and both are then “agreed” as valid. That is an issue as we have a fork. So consensus agrees on the order in a manner that prevents forks (that’s all). For this reason, we cannot have a majority vote that is BAD nodes as they can double vote (mind you, we could detect that and kill those nodes). i.e. we can resolve the fork in many cases, but while the fork exists, there can be trouble (doublespend is the classic example).
Can we add CFT to BFT for increased security and liveness?
It looks like we can, in circumstances where we select >3f+1 nodes but <=4f+1. Let me explain:
I was under the impression (Byzantine Fault Tolerance) BFT included (Crash Fault Tolerance) CFT as BFT is harder to combat.
So we have a tolerance to <1/3 BFT nodes as we claim (as does all algorithms in the space). However, we can maybe do better for safety and liveness.
So normally, the population N is set as 3f+1
(we actually set our threshold crypto to the same value of required sigs which is 2f+1)
This means tolerating 2 byzantine failures (f==2) we have a group size of 3f+1 (7) and a supermajority of 2f+1 (5).
However, this is not the whole story. We can set the group size to 4f+1 and still have f_bft==2 and f is still <1/3 so we are still using the same assumptions on safety.
So the difference from us is 9 nodes instead of 7. What do the extra 2 nodes mean and what do they by?
Meaning
These are 2 additional failure nodes, but they are not byzantine. They are CFT fails. The difference being a CFT failure is a crash, so a non-participating node. It cannot do byzantine stuff like double vote etc. It is purely a node that does nothing (it crashed as far as this round of voting goes).
i.e. We have 2 fail types f_bft
and f_cft
. If we set N==3f+1 there is no room for f_cft
nodes. If we set N==4f+1 we have room for f f_cft
nodes IN ADDITION to the f f_bft
nodes.
Buys us
It buys us additional safety. Our supermajority is now a straight majority (only at N==4f+1). So we still have 2f+1(5) nodes required to agree, but we can have up to 4 failures instead of 2.
The trick/sneaky part is these failures can only contain 2 BFT failures. So it can be 4 CFT or 3CFT+1BFT or 2CFT+2BFT, but only when we use 4f+1 as the population.
So that extra f is where we can cope with CFT nodes existing.
It means we buy ourselves safety and liveness, but there is a cost. If all nodes do participate, then there is a chance of many more messages, but they should limit as any node seeing 2f+1 votes, knowing there is no need to keep voting. They can distribute the 2f+1 votes to all members as a finality clause. So this terminates the round and possibly faster? we can not really tell without measuring, but the increased membership does not strictly mean we gorm from !7 messages to !9
i.e. It looks like we have stumbled upon a non-documented (as far as we can tell) fact, that CFT can be added on top of BFT to allow greater safety and CFT does not require byzantine defences.
Bottom line is that I think this is a winner all round for us.
Note: a crashed node, or what we call a
f_cft
node is an honest node. It just failed to participate, it did not do damage. Af_bft
node is a dishonest node and will try and do damage.
Back to events (the critical part) MEMBERSHIP
So we have events, stuff happening and we need to decide on them being valid. Let’s dig into validity and what that means and when?
We can say an event is valid if:
- Our internal state shows the event is valid (we could be wrong)
- The event is signed by a supermajority of nodes (elders in our parlance)
- Some external oracle leads us to believe an event is valid (i.e. node lost) (again we could be wrong)
2 is easy as we can say that event is network valid i.e. it has network authority.
However, 1 and 3 are Very tricky. (Can we even put them through a consensus algorithm without knowing they are valid?)
Neither 1 or 3 are network valid and in Safe, that means we don’t trust it. So what do we instinctively do? We fire it into a consensus algorithm BUT WE CANNOT DO THAT. We are not trying to get an order of the event at this time, we are just trying to get “agreement on the event’s validity”. (We cannot feed a consensus algorithm invalid events.)
So herein lies my issue with membership and consensus as a solution. It’s not!
I am not saying that consensus will not order events, it will, but IT MUST ONLY ORDER VALID EVENTS. So when is an event valid? In our network we trust no single node, so 1 & 3 above are merely opinions, but not valid events.
To go from an opinion to a valid event, we must get network authority. This means getting an agreement the event is valid, regardless of its order!
For us, that means we gather a section signature on the event.
Does order then matter for membership?
Well, it can. We can always find ways that unordered events are better ordered, but should we?
If we have a bunch of valid events like nodes lost, then it does not matter the order. It has happened, the nodes are not there!
If we have decisions to be made on membership then it’s more difficult. Some decisions such as
- Deciding we need a new node
- Deciding we can split the section
- Deciding where to store/get data from
These cause us problems and I would love us to consider each of these decisions that we take on the “state” of the membership. For instance, do slightly different views on membership really matter that much? If we have Ae to constantly sync nodes and clients then we know that changes will be passed around each node in a very short time. Consensus algorithms take time anyway, it’s a lot to think about, but it’s not a simple thing, but it’s not dead hard either.
This is a huge post so far and I am gonna stop here for now, but I will continue this to the bitter end. I am super keen we get this right. So posting now, but will continue soon. I am keen on thoughts, ideas, debates etc. to get this right.