We know you’re itching to try the network out again, and having ground through some gnarly issues with the code we’re close to being able to offer formal testnets once more. With the entire team now focused on that goal, @joshuef explains what we’re up to and what to expect. So don’t worry, that itch will soon be scratched!
General progress
The team is working on ways to get testnets out to the community more regularly. It may seem that we've recently been bogged down in the theoretical realm of consensus algorithms. In fact, that is far from the only area we're working on, and all of this work is tested internally, though not always in a full testnet environment or in a way that's easy to share. However, @chriso has been toiling away at improving the release process so that we can roll out testnets more easily, and the rest of the team is now focused on making sure their work is testnet-ready, in the spirit of agile development.
Mostafa has now completed his implementation of simplified ABBA (Asynchronous Binary Byzantine Agreement), the coin-flip consensus protocol we spoke about last week in the context of how elders come to agreement on membership matters.
Through the process of implementing ABBA, we've realised that the coin-flip protocol is not necessary when you have a preference for one result. For example, ABBA is used to decide whether an elder has proposed a membership change. If anyone sees a proposal from that elder, they vote YES; otherwise they vote NO. If there's ever a split vote, that means someone voted YES. Crucially, every YES vote comes with a justification: cryptographic proof that the elder in question did in fact propose something.
So if the question we're asking is "Did this elder propose a membership change?", then a split vote already answers it: yes! At least one node holds verifiable proof that the elder proposed a change, so everyone can safely resolve the split vote to YES.
In the original ABBA protocol there was no preference between YES and NO, hence the coin flip. Since we have a bias towards YES, we no longer need a coin flip to resolve these splits.
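To make the rule concrete, here's a toy sketch of the decision logic. To be clear, the real implementation is in Rust; this bash snippet just mirrors the rule, and the vote format and "justification" are stand-ins for signed, verified proposals:

```bash
#!/usr/bin/env bash
# Toy model of the YES-biased split-vote rule -- not the real (Rust)
# implementation. Votes are "YES:<justification>" or "NO"; in reality a
# justification is a signed proposal that each node verifies.
resolve_split_vote() {
  for vote in "$@"; do
    case "$vote" in
      # Any YES carries proof that the elder really did propose
      # something, so every node can safely adopt YES -- no coin flip.
      YES:*) echo "YES"; return ;;
    esac
  done
  echo "NO"
}

resolve_split_vote "NO" "YES:signed-proposal" "NO"   # prints: YES
```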
Mostafa and @davidrusu are now putting the biased ABBA protocol through its paces. The next step will be to integrate VCBC (Verifiable Consistent Broadcast) with ABBA to arrive at the full MVBA (Multi-Value Byzantine Agreement) consensus protocol.
Meanwhile, @joshuef and @oetyng are looking into network-knowledge issues that can occur when a data query hits the elders after a section split. These seem to be due to a lack of knowledge sharing between the two new sections at the handover stage.
Testnets testnets testnets
After a few months getting deep into various network topics (membership, node state-locks, communication layers and responses), we’re looking forward to getting the code in the community’s hands once more.
We know that there have been sporadic comnets (and previously very frequent ones), so some community members may well be familiar with our testing tools already. But here we'd like to go over what we have, so that anyone who wants to can have a go at setting up their own testnets.
The testnet tool
Our testnet tool is a collection of scripts and Terraform for setting up testnets. (Examples of commands are available in the readme file).
It allows us to easily spin up Digital Ocean droplets and run nodes on them. This is the basis of our WAN testing.
You have the `./up` script, which allows for the creation of a testnet of any size. It uses one droplet per node (the size is easily configurable in the `provider.tf` files).
If you want to enable `heaptrack` on the nodes, then we have a `./build` script which spins up a separate droplet to build the `sn_node` code and `safe` bin (the node code with debug mode enabled so `heaptrack` can hook in). You can then use these custom builds in the `./up` script.
Lastly, `./down` removes a testnet once you are done with it.
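Putting those together, a typical session looks something like this (the exact arguments are in the readme; these bare invocations just show the shape of the workflow):

```bash
# Optional: build sn_node and the safe bin in debug mode on a separate
# droplet, so that heaptrack can hook into the running nodes.
./build

# Spin up a testnet: one Digital Ocean droplet per node, with the
# droplet size taken from the provider.tf files.
./up

# ...run tests against the network...

# Tear the whole testnet down once you're done.
./down
```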
Easy peasy?
Okay, so I have a testnet up…
Once a network is running, we have several tools to help us.
A client droplet
The Terraform setup can also create a client droplet (instance). This allows us to easily loop client tests, for example, and see how the nodes hold up (`./loop_client_tests.sh`).
We also have a `test-data` folder which is pulled down to the client from AWS. We're aiming to put this data on the network at the start of any testnet, which gives us a simple way to check data integrity over the lifetime of a testnet.
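For example, from the client droplet (the `test-data` path here is illustrative):

```bash
# Repeatedly run client tests against the testnet and see how the
# nodes hold up over time.
./loop_client_tests.sh

# The test-data folder is pulled down from AWS; uploading it at testnet
# start lets us re-fetch and verify it later to check data integrity.
ls test-data/
```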
Monitoring
We use an ELK stack (Elasticsearch, Logstash, Kibana) to monitor the nodes. We have a (currently private) dashboard where we can see any memory or CPU issues, which helps guide debugging efforts. On it we can see our current blocker: memory rising over time. This appears to be related to the connection management… We have one potential fix that seems to work, but we're looking for something neater.
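The dashboard isn't public yet, but if you're running your own droplets, a crude stand-in for those memory charts is simply to poll the node process over ssh (assuming a standard Linux `ps`):

```bash
# Print the sn_node process's memory (RSS, in KB) and CPU usage every
# 30 seconds; a steadily climbing RSS is the symptom described above.
watch -n 30 'ps -C sn_node -o pid,rss,%cpu,comm'
```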
Logs!
The last (and most cryptic) tool in our arsenal is pulling down client logs: `./scripts/logs` does that for us. We can then parse them with a tool like `ripgrep`, searching for e.g. specific `MsgId`s to track what's been going on in the nodes.
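For example (the `MsgId` value and the `logs/` destination are made up for illustration):

```bash
# Pull the logs down locally.
./scripts/logs

# Trace a single message across the logs with ripgrep, using -F for a
# fixed-string match and -C 3 for three lines of context per hit.
rg -F -C 3 'MsgId(0xabc123)' logs/
```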
And so…
That’s just a small overview of how to use and assess a testnet. We’re hopeful if we can make this easier (we’re trying) and more public (soon!) we’ll be able to get more folk to monitor and check nodes and speed up debugging there once more.
So by all means, have a dive into the testnet tool. PRs are very welcome. There's a lot of bash scripting just now, which may be more up some folks' alleys than others… But at the very least, this hopefully gives you all an overview of how we're testing at the moment, and maybe sparks some other ideas on how to improve such things!
Useful Links
Feel free to reply below with links to translations of this dev update and moderators will add them here:
Russian; German; Spanish; French; Bulgarian
As an open source project, we’re always looking for feedback, comments and community contributions - so don’t be shy, join in and let’s create the Safe Network together!