Monday saw the launch of a new node version with improved caching, reward forwarding and messaging, and, in the background, a new way of updating the network: we gradually upgrade our own nodes first, then encourage the community to do the same. By and large this automated process went pretty well, although a few nodes had to be upgraded manually. Staggering the launch in this way is a precautionary step, as we don’t want to knock the network over with too much churn.
A casualty of the new update process was Telegraf, the software that grabs metrics from the network and allows us to monitor network performance. @chriso had to disable Telegraf for a period to get the upgrade to work on our nodes, which means that results, numbers and conclusions from the upgrade are only now starting to roll in.
Some of you have posted questions about the size of the network: is it smaller this time round, and why might that be?
The answer is yes; we have seen a significant contraction in the Network’s size, with the number of nodes down around 50% at the moment. On the face of it that might seem a bit shocking, but dig a little deeper and the picture is very positive:
- The upgrade has resulted in us blocking nodes that have been underperforming to the detriment of the Network, so a large proportion of these zombie nodes have been culled. Quality not quantity!
- We’ve seen a significant jump in nodes being accepted through relays, up from 30% to 85%. Great news for those of you behind home routers!
- Upload speeds have significantly increased: doubled, in fact!
- Some of you will have noted an increase in RAM usage. This is because nodes now keep a record-store cache in memory, reducing the burden on disk I/O and consequently reducing CPU usage as well. A worthy trade-off, and closer to that sweet spot of resource usage and performance (see the sketch after this list).
- Both bandwidth utilisation and replication have been reduced by 20% as well. So more big steps in overall network performance!
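To give a feel for the memory-for-I/O trade-off behind the record-store cache bullet above, here is a minimal sketch of a bounded in-memory cache sitting in front of an on-disk record store. The type names and the simple FIFO eviction are assumptions for illustration only, not the node’s actual implementation.

```rust
use std::collections::{HashMap, VecDeque};

// Hypothetical stand-ins for record keys and record data.
type Key = Vec<u8>;
type Record = Vec<u8>;

/// A bounded in-memory cache in front of the on-disk record store.
/// Hits are served from RAM, avoiding a disk read; misses would fall
/// through to disk. Uses simple FIFO eviction for brevity.
struct RecordCache {
    capacity: usize,
    map: HashMap<Key, Record>,
    order: VecDeque<Key>, // oldest keys at the front
}

impl RecordCache {
    fn new(capacity: usize) -> Self {
        Self { capacity, map: HashMap::new(), order: VecDeque::new() }
    }

    fn get(&self, key: &Key) -> Option<&Record> {
        self.map.get(key)
    }

    fn put(&mut self, key: Key, record: Record) {
        if !self.map.contains_key(&key) && self.map.len() >= self.capacity {
            // Evict the oldest cached record to keep RAM usage bounded.
            if let Some(oldest) = self.order.pop_front() {
                self.map.remove(&oldest);
            }
        }
        if self.map.insert(key.clone(), record).is_none() {
            self.order.push_back(key);
        }
    }
}
```

The extra RAM spent on the `map` is what buys the reduction in disk reads (and the associated CPU work) described in the bullet.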
So, all in all, very positive developments!
Reporting Issues
For those who are unaware, to report issues you can use the /help command in the Discord Bot.
Try to give as much detail as possible about your OS, node version, etc. Of course, you can also post issues on the forum or on GitHub. Thanks to @happybeing for doing just that for API issues and for ruling out mobile broadband as a possible cause.
General progress
@chriso has been the man pulling the levers behind the new stable release workflow. Chris has professed himself satisfied with the way things have gone, despite the odd hiccup, and said it was a pretty good learning exercise. It’s quite a slow process, and after a couple of runs there were still some nodes that had not upgraded, so he’s digging into that now. We also needed to take down Telegraf (which grabs the metrics) for the duration, so we can’t report the numbers till that’s switched on again.
Meanwhile, @mazzi worked on tweaking the launchpad UI as a precursor to more changes to the layout. Some good progress going on there.
@anselme made some more preparations for introducing the revamped spend. This will likely be a breaking change, meaning a new network, so we’ll do that once we’re ready. He’s also looking at how to scale the audit DAG to make it a fully fledged DAG Explorer.
On connectivity, @bzee is digging into an issue with an ongoing transport redesign in libp2p. The transport redesign is needed for AutoNATv2, and there seems to be a problem with DCUtR, which we have disabled for the time being. Let’s hope they get it sorted soon.
@qi_ma and @roland have been testing different combinations of network variables, such as staggered intervals between launching and upgrading nodes, across different testnets to try to work out the optimum settings, as well as helping launch the current Beta.
@mick.vandijke is mostly client-side at the mo, implementing some changes to the console output to make it clearer what the file address is after uploading a file, and adding the download time. Mick is also working on a benchmarking tool.
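As a rough illustration of that console-output change (a sketch only; `Client`, `FileAddress` and `download` here are hypothetical stand-ins, not the real client API), printing the address and the elapsed download time could look something like this:

```rust
use std::time::Instant;

// Hypothetical stand-ins for the real client types and calls.
struct Client;
struct FileAddress(String);

impl Client {
    fn download(&self, _addr: &FileAddress) -> Vec<u8> {
        // ...fetch the file's chunks from the network...
        Vec::new()
    }
}

fn download_with_timing(client: &Client, addr: &FileAddress) -> Vec<u8> {
    let started = Instant::now();
    let bytes = client.download(addr);
    // Surface both the file address and the download time in the console output.
    println!(
        "Downloaded {} ({} bytes) in {:.2?}",
        addr.0,
        bytes.len(),
        started.elapsed()
    );
    bytes
}
```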
In Optimisationsville, @joshuef has been engaged in some performance improvements highlighted by his work on Sybil resistance, including a PR that improves record store cache efficiency and deduplicates concurrent queries from the same peer.
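For a sense of what deduplicating concurrent queries from the same peer can involve (a sketch under assumptions, not the actual PR), a node can track queries that are already in flight and skip identical ones until the first completes:

```rust
use std::collections::HashSet;

// Hypothetical identifiers standing in for the peer ID and record key types.
type PeerId = u64;
type RecordKey = Vec<u8>;

/// Tracks queries currently being served so that identical concurrent
/// requests from the same peer are processed once rather than N times.
#[derive(Default)]
struct PendingQueries {
    in_flight: HashSet<(PeerId, RecordKey)>,
}

impl PendingQueries {
    /// Returns true if the query should be processed, or false if an
    /// identical query from the same peer is already in flight.
    fn try_begin(&mut self, peer: PeerId, key: RecordKey) -> bool {
        self.in_flight.insert((peer, key))
    }

    /// Call once the query completes so later requests are served again.
    fn finish(&mut self, peer: PeerId, key: &RecordKey) {
        self.in_flight.remove(&(peer, key.clone()));
    }
}
```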
Finally, @rusty.spork has been fielding your queries and observations, including this week: continued lack of nanos for some, incremental RAM increases over time, anomalies with new versions of macOS and Windows, and folk getting kicked off VPSs for suspected crypto mining (ironic, given the first issue). MaidSafe has also suffered this and got it sorted with a stern letter, so we’ll get a complaint template up in due course.