A good week of bug fixes and tweaks in preparation for our next testnet: NodeDiscoveryNet, which will look at how nodes find each other on the network.
We’ve found a cause of excessive memory usage. We thought that nodes were only sending messages to the eight closest nodes to them, whereas they were actually sending messages to 20 nodes. We’re not completely sure what all the knock-on effects were, but for any call where we were assuming a close group there were a lot of wasted messages - which translates to increased memory use and may well have affected data replication too.
Because we run lots of nodes per machine on the testnets we have little tolerance for the unexpected. We did see some nodes dying unexpectedly, and that increased memory use is a possible reason.
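To make the cost of the larger fan-out concrete, here is a minimal sketch of close-group selection by XOR distance, the metric Kademlia uses. The `node_id` helper, the peer names and the constant names are illustrative assumptions, not the node's actual code. Only the figures 8 and 20 come from the update.

```python
import hashlib

CLOSE_GROUP_SIZE = 8   # the close group size we thought was in use
OBSERVED_FANOUT = 20   # the number of nodes actually being messaged


def node_id(name: str) -> int:
    """Derive a 256-bit ID from a name (hypothetical; real IDs come from keys)."""
    return int.from_bytes(hashlib.sha256(name.encode()).digest(), "big")


def closest_nodes(target: int, peers: list[int], count: int) -> list[int]:
    """Return the `count` peers closest to `target` by XOR distance."""
    return sorted(peers, key=lambda peer: peer ^ target)[:count]


# A made-up network of 100 peers and one made-up target address.
peers = [node_id(f"node-{i}") for i in range(100)]
target = node_id("some-chunk-address")

intended = closest_nodes(target, peers, CLOSE_GROUP_SIZE)
actual = closest_nodes(target, peers, OBSERVED_FANOUT)
wasted = len(actual) - len(intended)

print(f"intended {len(intended)} recipients, sent to {len(actual)}, {wasted} wasted per call")
```

In this toy model every call that assumed a close group sends 12 extra messages, which is the kind of overhead that adds up quickly when many nodes share one machine.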
The other place we saw node death was around replication - this had to do with the default Kademlia/libp2p processes adding crap to the routing table, including nodes that are behind NAT and can’t be reached. This meant that nodes thought they were connected to the network but actually weren’t - a likely cause of the ‘I’ve joined but I have no data’ issue, and also of nodes dying unexpectedly. So, now we are manually adding entries to the routing table instead of relying on Kademlia to do it when a connection is detected. Sometimes you just have to roll up your sleeves and do it yourself. It may be that nodes which find themselves stuck in the netherworld will need to restart to properly get onto the network, but that shouldn’t be necessary, we think.
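The idea of manual insertion can be sketched roughly as below. This is a toy model, not the libp2p API: the `RoutingTable` class, the `on_connection` hook and the `fake_dial_back` probe are all hypothetical names, and using a reachability check as the admission criterion is our assumption about how such filtering might work.

```python
from dataclasses import dataclass, field


@dataclass
class RoutingTable:
    """Toy routing table that only accepts peers we have explicitly vetted."""
    peers: dict[str, str] = field(default_factory=dict)  # peer_id -> address

    def add_if_reachable(self, peer_id: str, address: str, reachable: bool) -> bool:
        """Insert the peer only when a reachability check has succeeded."""
        if not reachable:
            return False
        self.peers[peer_id] = address
        return True


def on_connection(table, peer_id, address, dial_back):
    """Called when a connection is seen; `dial_back` is a hypothetical probe."""
    return table.add_if_reachable(peer_id, address, dial_back(address))


def fake_dial_back(address: str) -> bool:
    """Stand-in probe: pretend private addresses sit behind NAT."""
    return not address.startswith("192.168.")


table = RoutingTable()
on_connection(table, "peer-a", "203.0.113.5:12000", fake_dial_back)
on_connection(table, "peer-b", "192.168.1.20:12000", fake_dial_back)

print(sorted(table.peers))
```

The point of the design is that a detected connection alone no longer earns a peer a place in the table, so an unreachable peer never becomes a target for replication or lookups.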
General progress
@Chriso has spent the week working through UX improvements to the installation and logging process, thanks to feedback from the last testnet.
@aed900 is working on a batch file to extract counts of significant log messages/errors from testnet node directories, similar to what @Shu has been doing to create his graphs. He’s also working with @Chriso on making the testnet tool more useful, with the ability to kick off testnets via the GitHub UI coming soon.
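For anyone wanting to try something similar on their own node logs, here is a rough sketch of the log-counting idea, written in Python rather than as a batch file. The directory layout (one sub-directory per node containing `*.log` files), the file name `safenode.log` and the patterns are assumptions for illustration; the actual list of significant messages is not given here.

```python
import re
import tempfile
from collections import Counter
from pathlib import Path

# Hypothetical patterns; swap in whichever messages matter to you.
PATTERNS = {
    "error": re.compile(r"\bERROR\b"),
    "replication": re.compile(r"replicat", re.IGNORECASE),
}


def count_messages(root: Path) -> dict[str, Counter]:
    """Count pattern hits per node directory, scanning every *.log file beneath it."""
    results = {}
    for node_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        counts = Counter()
        for log_file in node_dir.rglob("*.log"):
            with log_file.open(errors="replace") as handle:
                for line in handle:
                    for label, pattern in PATTERNS.items():
                        if pattern.search(line):
                            counts[label] += 1
        results[node_dir.name] = counts
    return results


# Small self-contained demo using a temporary directory with one fake node.
with tempfile.TemporaryDirectory() as tmp:
    node = Path(tmp) / "node-1"
    node.mkdir()
    (node / "safenode.log").write_text(
        "INFO started\nERROR dial failed\nINFO Replicating chunk\n"
    )
    demo_counts = count_messages(Path(tmp))

print(demo_counts)
```

Per-node counts like these can be dumped to CSV and plotted to compare nodes across a testnet run.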
@Anselme has pretty much finished implementing PUT and GET operations and replication for registers following his previous work on chunks and spends. Big news as it sets the infrastructure stage for DBCs! He’s also been refactoring and rationalising the related crates to help with future work here.
@bzee is stripping out our custom code for managing communications with peers (dialling in the jargon) without requiring an ID, replacing it with the native libp2p functionality that prevents dialling a peer if messaging has already been initiated. He’s also changed our APIs so as to not need a peer ID.
Together with @joshuef, Benno has also been looking at manually inserting entries into the routing table rather than having it happen automatically. That process, we have noticed, sometimes inserts junk (see above) with unfortunate side effects.
@qi_ma is working through various scenarios about who verifies what when a client pays for chunks. As always, we want to load as much onto the client as possible, but not to the extent that it can cheat the system.
And @bochaco and @roland are removing bulletproofs from the DBC code, and integrating that change into the payment processes. Bulletproofs are something we feel we can now do without: removing them significantly simplifies other processes, and their benefit was dubious anyway, as store cost will be known, effectively unblinding large swathes of the transactions on the network.
Useful Links
Feel free to reply below with links to translations of this dev update and moderators will add them here:
Russian; German; Spanish; French; Bulgarian
As an open source project, we’re always looking for feedback, comments and community contributions - so don’t be shy, join in and let’s create the Safe Network together!