We know you’re all champing at the bit to get testnetting again, but there are one or two fixes we need to make first to ensure it will provide valuable learnings for us. One in particular is getting node ageing working reliably. @joshuef explains more about the hurdles and how we are preparing to leap over them as we head for greener pastures.
General progress
There’s loads going on as always, including some deep dives into the fundamentals. As some of you will know, our qp2p networking library is based on Quinn, a Rust implementation of the QUIC transport protocol, which was originally developed at Google and is now an IETF standard.
The good thing about adopting widely used components is that they have lots of eyes on them and are constantly being updated, but a drawback is that they don’t always work quite the way we want them to. In this case, the TLS library (rustls) expects certificates that are tied to DNS names and issued by a certificate authority (CA), and in their standard form both are a no-no for a p2p network. But… David has a cunning plan to use our group consensus as a way of becoming our own CA, which should allow us to secure traffic and sign with our ed25519 keys, at least once an updated version of rustls comes out, which should be soon. Ultimately we want to minimise our use of qp2p, and this will be a step towards doing so.
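As a rough illustration of group consensus standing in for a CA, here is a minimal sketch under loudly stated assumptions: a node's certificate is trusted when strictly more than two thirds of the section's current elders have endorsed it. The NodeCert type, the accept function and the string "endorsements" are all hypothetical; a real design would verify cryptographic signatures (for example a section signature over the node's ed25519 key), which this std-only toy does not do.

```rust
use std::collections::BTreeSet;

/// Hypothetical certificate binding a node's ed25519 public key (raw bytes)
/// to the elders that endorsed it. Endorsements are plain names here, NOT signatures.
struct NodeCert {
    node_public_key: [u8; 32],
    endorsed_by: BTreeSet<String>,
}

/// Strict supermajority: more than two thirds of `total`.
fn is_supermajority(count: usize, total: usize) -> bool {
    3 * count > 2 * total
}

/// Accept a certificate only if a supermajority of the *current* elders endorsed it.
/// Endorsements from non-elders are ignored.
fn accept(cert: &NodeCert, elders: &BTreeSet<String>) -> bool {
    let valid = cert.endorsed_by.intersection(elders).count();
    is_supermajority(valid, elders.len())
}

/// Helper to build a set of names.
fn names(list: &[&str]) -> BTreeSet<String> {
    list.iter().map(|s| s.to_string()).collect()
}

fn main() {
    let elders = names(&["e1", "e2", "e3", "e4", "e5", "e6", "e7"]);

    // 5 of 7 elders endorse: 15 > 14, so this is a supermajority.
    let cert = NodeCert { node_public_key: [7u8; 32], endorsed_by: names(&["e1", "e2", "e3", "e4", "e5"]) };
    println!("key {:02x}.. accepted: {}", cert.node_public_key[0], accept(&cert, &elders));

    // Only 4 real elders endorse (the outsider does not count): not enough.
    let weak = NodeCert { node_public_key: [9u8; 32], endorsed_by: names(&["e1", "e2", "e3", "e4", "outsider"]) };
    println!("key {:02x}.. accepted: {}", weak.node_public_key[0], accept(&weak, &elders));
}
```

The point of the sketch is only the trust decision: the "authority" is the section as a whole rather than a single external CA, so no one elder can vouch for a node on its own.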
@bzee has also been delving into qp2p and is adding a stripped down version to the stableset_net repo to improve the efficiency of messaging. @davidrusu is also working on the stable set, including getting the Stateright tool to offload its work queue to disk to allow us to test more elaborate models.
On DBCs, @oetyng has made good progress with updating the sn_dbc crate, as well as clarifying the language we use to describe blinding (hiding the amount transacted) and unblinding to make it easier to follow. He’s now working on updating the command flow between clients and elders when transacting with DBCs.
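To give a feel for how an amount can be hidden while a transaction is still checkable, here is a minimal sketch of a toy Pedersen-style commitment, C(amount, blinding) = G^amount * H^blinding mod P. This is an assumption-laden illustration, not the sn_dbc API: the parameters are tiny and insecure, production schemes use elliptic-curve groups, and they also need range proofs to rule out negative amounts, which this toy omits.

```rust
// Toy Pedersen-style commitment over integers mod a prime. INSECURE, illustration only.
const P: u128 = 2_305_843_009_213_693_951; // 2^61 - 1, a Mersenne prime
const G: u128 = 3;
const H: u128 = 7;

/// Square-and-multiply modular exponentiation. All values stay below 2^61,
/// so every product fits in a u128.
fn mod_pow(mut base: u128, mut exp: u128, modulus: u128) -> u128 {
    let mut result = 1u128;
    base %= modulus;
    while exp > 0 {
        if exp & 1 == 1 {
            result = result * base % modulus;
        }
        base = base * base % modulus;
        exp >>= 1;
    }
    result
}

/// Commit to `amount` using the secret `blinding` factor, hiding the amount.
fn commit(amount: u128, blinding: u128) -> u128 {
    mod_pow(G, amount, P) * mod_pow(H, blinding, P) % P
}

/// Multiply commitments together: this adds the hidden amounts and blinding factors.
fn product(commitments: &[u128]) -> u128 {
    commitments.iter().fold(1u128, |acc, c| acc * c % P)
}

/// Inputs balance outputs if the commitment products match (given the
/// blinding factors were chosen to sum to the same total on both sides).
fn balances(inputs: &[u128], outputs: &[u128]) -> bool {
    product(inputs) == product(outputs)
}

fn main() {
    // One input of 100 split into outputs of 60 and 40; output blinding
    // factors (10_000 + 2_345) sum to the input's blinding factor (12_345).
    let input = commit(100, 12_345);
    let out_a = commit(60, 10_000);
    let out_b = commit(40, 2_345);
    println!("balanced: {}", balances(&[input], &[out_a, out_b]));

    // Outputs now sum to 101: trying to mint value fails the check.
    let bad = commit(41, 2_345);
    println!("inflated: {}", balances(&[input], &[out_a, bad]));
}
```

Running it prints `balanced: true` then `inflated: false`: a verifier learns that value was conserved without learning the amounts themselves.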
@bochaco is working on exposing gRPC for safe_node and adding a step in our testnet tool to check launched nodes using such a service.
Roland has been beavering away on telemetry, improving our visibility at the node and function level. Traces allow us to see what’s going on with every function, but in their raw form they are hard to read, so Roland is pulling them into OpenSearch where they can be aggregated by node and time to give us a highly granular picture of what’s happening where.
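OpenSearch does this aggregation for us; as a rough in-memory analogue, the sketch below groups trace events into fixed-width time windows per node and sums their counts and durations. The TraceEvent fields and the aggregate function are hypothetical simplifications, not our actual trace format.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified trace event; real traces carry much more context.
struct TraceEvent {
    node: String,
    timestamp_ms: u64,
    duration_us: u64,
}

/// Summary for one (node, time window) pair.
#[derive(Debug, Default, PartialEq)]
struct BucketStats {
    count: u64,
    total_duration_us: u64,
}

/// Group events by node and by fixed-width time window of `bucket_ms`.
/// BTreeMap keeps the output sorted by node, then window start.
fn aggregate(events: &[TraceEvent], bucket_ms: u64) -> BTreeMap<(String, u64), BucketStats> {
    let mut buckets: BTreeMap<(String, u64), BucketStats> = BTreeMap::new();
    for e in events {
        let window_start = e.timestamp_ms - e.timestamp_ms % bucket_ms;
        let stats = buckets.entry((e.node.clone(), window_start)).or_default();
        stats.count += 1;
        stats.total_duration_us += e.duration_us;
    }
    buckets
}

/// Convenience constructor for the example.
fn event(node: &str, timestamp_ms: u64, duration_us: u64) -> TraceEvent {
    TraceEvent { node: node.to_string(), timestamp_ms, duration_us }
}

fn main() {
    let events = vec![
        event("node-a", 100, 50),
        event("node-a", 900, 30),
        event("node-a", 1_200, 20),
        event("node-b", 400, 70),
    ];
    // One-second windows per node.
    for ((node, start), stats) in aggregate(&events, 1_000) {
        println!("{node} @ {start}ms: {} events, {}us total", stats.count, stats.total_duration_us);
    }
}
```

This prints one line per (node, window): node-a has 2 events (80us) in the window starting at 0ms and 1 event (20us) at 1000ms, and node-b has 1 event (70us) at 0ms.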
And with the legal stuff out of the way @JimCollinson is turning his attention to branding. What should our core messages be, and how should we present them? There’s a lot to learn from others who have done it right, so if there are any companies or individuals you find particularly inspiring, let us know in this thread.
Progress towards an ageing network
There are several things afoot that we’d like to get more solid for any new testnet. The main issue we saw in the last testnet was that our relocation code, and therefore node age, were not working as expected.
The basic reason for this was that we just did not have enough churn prior to opening up the network. But it raised the question of how to reliably achieve that churn, and therefore relocation, going forward.
One simple change has been to reduce a node’s starting age, which leads to more relocation sooner… but this also comes with its own costs and tradeoffs, especially until we have tiered data storage in place. We also hit some other issues along the way.
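To illustrate why a lower starting age speeds up relocation, here is a minimal sketch under an assumed rule, not necessarily the exact one in our code: a node of age A is relocated when the hash of a churn event has at least A trailing zero bits, so each churn event relocates it with probability 2^-A and it waits about 2^A churn events on average. The function names are hypothetical, and std’s DefaultHasher stands in for hashing a real churn signature.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Assumed rule: relocate when the churn event's hash has at least `age`
/// trailing zero bits, i.e. with probability 2^-age per churn event.
/// DefaultHasher is a stand-in for hashing a real churn signature.
fn should_relocate(age: u32, churn_event_id: u64) -> bool {
    let mut hasher = DefaultHasher::new();
    churn_event_id.hash(&mut hasher);
    hasher.finish().trailing_zeros() >= age
}

/// Expected churn events before relocation under the rule above (2^age).
fn expected_churn(age: u32) -> u64 {
    1u64 << age
}

/// Churn events a node of this age sees, starting at `first_event`,
/// up to and including the one that relocates it.
fn churn_until_relocation(age: u32, first_event: u64) -> u64 {
    let mut event = first_event;
    while !should_relocate(age, event) {
        event += 1;
    }
    event - first_event + 1
}

/// Average over `trials` simulated nodes, each starting at a different event id.
fn average_churn(age: u32, trials: u64) -> f64 {
    let total: u64 = (0..trials)
        .map(|t| churn_until_relocation(age, t * 1_000_003))
        .sum();
    total as f64 / trials as f64
}

fn main() {
    for age in [4u32, 5, 6] {
        println!(
            "age {age}: expected ~{} churn events, simulated {:.1}",
            expected_churn(age),
            average_churn(age, 2_000)
        );
    }
}
```

Under this assumed rule, dropping the starting age from 6 to 4 cuts the expected wait from about 64 churn events to about 16, which is the kind of speed-up described above.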
We’ve seen that our now-previous “two-step” approach to membership was causing us issues. We had Membership, where voting happens, and also SectionPeers, which was supposed to be kept up to date with membership changes and based upon our SectionAuthorityProvider (SAP), but these two could get out of sync. As such, it’s great that @qi_ma’s PR has been merged, which should reduce such discord.
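One generic way to stop two views of membership from drifting apart is to keep a single source of truth and derive everything else from it on demand. The sketch below shows that pattern with hypothetical types (Decision, Membership, SectionView); it is not a description of what the merged PR actually does, and taking the first names alphabetically as elders is a placeholder, not real elder selection.

```rust
use std::collections::BTreeSet;

/// Hypothetical membership decision agreed by voting.
enum Decision {
    Join(String),
    Leave(String),
}

/// Single source of truth: the ordered log of agreed decisions.
#[derive(Default)]
struct Membership {
    decisions: Vec<Decision>,
}

impl Membership {
    fn apply(&mut self, d: Decision) {
        self.decisions.push(d);
    }

    /// Current members are always derived from the decision log,
    /// so there is no second copy that can fall out of sync.
    fn members(&self) -> BTreeSet<String> {
        let mut set = BTreeSet::new();
        for d in &self.decisions {
            match d {
                Decision::Join(name) => {
                    set.insert(name.clone());
                }
                Decision::Leave(name) => {
                    set.remove(name);
                }
            }
        }
        set
    }
}

/// Hypothetical stand-in for a section authority view (like the SAP):
/// built from Membership on demand, never updated independently.
#[derive(Debug, PartialEq)]
struct SectionView {
    elders: Vec<String>,
}

/// Placeholder elder choice: first `elder_count` members in name order.
fn section_view(m: &Membership, elder_count: usize) -> SectionView {
    SectionView { elders: m.members().into_iter().take(elder_count).collect() }
}

fn main() {
    let mut m = Membership::default();
    m.apply(Decision::Join("alice".into()));
    m.apply(Decision::Join("bob".into()));
    m.apply(Decision::Join("carol".into()));
    m.apply(Decision::Leave("bob".into()));
    // Prints: SectionView { elders: ["alice", "carol"] }
    println!("{:?}", section_view(&m, 2));
}
```

The tradeoff is recomputing the derived view when it is needed, in exchange for never having to keep two stores in step.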
We also ran into an issue where membership voting decisions grew incredibly large, causing far too much traffic and sending verification times through the roof. The decision size has been refactored back to sane levels, and that work inadvertently exposed some process blocking we had going on.
That process blocking has also been nixed, so things are working much more smoothly around membership voting and DKG now too.
As such, we’re back to looking at relocation velocity and making sure we don’t see any loops as nodes move around the network. Once that is sorted, we’ll be in a much better position to handle churn and store data.
This in itself is a great place to be, but we’re not stopping there. As some folks here have noted, work has begun on a separate POC for the stable set! This aims to be the simplest possible node, and it could well be vastly simpler if we can avoid much of the network knowledge maintenance and DKG that comes with our main branch. It’s still early days there, but we’re keen to get the simple implementation in and trash the hell out of it to see how things are looking.
Both paths are quite exciting and will hopefully yield another, more “aged” testnet in the near future!
Useful Links
Feel free to reply below with links to translations of this dev update and moderators will add them here:
Russian; German; Spanish; French; Bulgarian
As an open source project, we’re always looking for feedback, comments and community contributions - so don’t be shy, join in and let’s create the Safe Network together!