Following on from a thoroughly enjoyable Discord Stage on Tuesday, we thought we’d tackle some community questions this week.
Let’s start with the big one:
Why are home nodes not earning?
OK, this is a bug. It occurs primarily with Launchpad, because that automatically flags nodes as home nodes with --home-network, but it could happen with nodes launched in other ways too. The basic problem is that nodes are adding too many listening addresses, and inactive addresses aren’t being cleared out properly, which clogs up communications. There is a fix in place now which will make it into the next build, planned for next week. A second issue is that we are seeing more incoming connection issues than expected. We’re digging into that one now. We are also looking to clarify the different connection options (e.g. UPnP, port forwarding) so we can give better instructions there.
Node numbering
This was an issue flagged by @southside who said the existing convention of numbering nodes from safenode1 up does not lend itself to optimal display and sorting. This might seem like a trivial issue, but we’ve built some infrastructure around it, e.g. monitoring, which would break if we changed the scheme. As a longer term issue, we could make this a user-configurable feature (because people might prefer different padding lengths), but this is not a priority at the moment.
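To illustrate the sorting problem: plain string sorting puts safenode10 ahead of safenode2. The sketch below contrasts that with a numeric-aware sort and with zero-padded names. The helper names and the padding width of 3 are assumptions for illustration only, not anything safenode-manager does today.

```python
import re

def natural_key(name: str):
    """Split a name into text and integer chunks so numbers compare numerically."""
    return [int(part) if part.isdigit() else part for part in re.split(r"(\d+)", name)]

def padded(name: str, width: int = 3) -> str:
    """Zero-pad the trailing number, e.g. safenode7 -> safenode007 (width is an assumption)."""
    match = re.fullmatch(r"(.*?)(\d+)", name)
    if match is None:
        return name  # no trailing number, leave the name alone
    prefix, number = match.groups()
    return f"{prefix}{int(number):0{width}d}"

names = [f"safenode{i}" for i in (1, 2, 10, 11, 3)]

lexicographic = sorted(names)                      # plain string ordering
natural = sorted(names, key=natural_key)           # numeric-aware ordering
padded_sorted = sorted(padded(n) for n in names)   # padded names sort correctly as plain strings

print(lexicographic)
print(natural)
print(padded_sorted)
```

Padding fixes the ordering for any tool that sorts names as strings, but it changes the names themselves, which is why existing monitoring built around the current scheme would be affected.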
Health metrics in log files
@happybeing raised an issue asking whether the node health metric will be in the log files. We understand that for vdash it would be ideal to ingest the /metrics endpoint (present now, with more metrics added in 2024.8.1) and the /metadata endpoint (likely to arrive in the 2024.8.2 release) programmatically for the majority of the program’s display columns. We don’t want to increase the frequency of data in the logs to make vdash more real-time, as this would create brittleness and use more disk space. However, we are open to suggestions as to which metrics folk would like to see in the codebase, and why. vdash is a fantastic asset, and we are taking inspiration from it as we build out features in the Launchpad.
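As a rough idea of what ingesting /metrics programmatically could look like, here is a minimal sketch. It assumes the endpoint serves Prometheus-style text lines of the form "name value" with no trailing timestamps. The metric names in the sample and any URL passed to fetch_metrics are hypothetical, not real node output.

```python
from urllib.request import urlopen

def parse_metrics(text: str) -> dict[str, float]:
    """Parse 'name value' lines, skipping blank lines and '#' comment lines."""
    metrics: dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Assumption: the value is the last space-separated field (no timestamps).
        name, _, value = line.rpartition(" ")
        try:
            metrics[name] = float(value)
        except ValueError:
            continue  # ignore lines that do not end in a number
    return metrics

def fetch_metrics(url: str, timeout: float = 5.0) -> dict[str, float]:
    """Fetch and parse a metrics endpoint; the URL is supplied by the caller."""
    with urlopen(url, timeout=timeout) as response:
        return parse_metrics(response.read().decode("utf-8"))

# Made-up sample text with hypothetical metric names.
sample = """\
# HELP example_peers_connected Hypothetical metric name
# TYPE example_peers_connected gauge
example_peers_connected 42
example_records_stored{kind="chunk"} 1024
"""
parsed = parse_metrics(sample)
print(parsed)
```

Polling an endpoint like this on demand avoids writing extra lines to the logs, which is the trade-off described above.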
Misreporting Stopped Nodes
On @loziniak’s issue about nodes being misreported as stopped on safenode-manager, this only applies to a local network. There appears to be some incongruity between local network status and service network status. This should sort itself out once we are back in sync, but it’s not currently a priority.
API Issues
@jadkins raised three issues about the API. These are all covered by current work in progress, so you should see them dealt with shortly.
Triage
Finally, there was a discussion led by @southside on Discord about how the community can help triage issues, by being able to understand the logs better. This is a little more involved, but we are working on a list of strings to watch for when grepping the logs.
As mentioned last week, “consider us as BAD” is the sign that a node has been shunned. In addition, “kBucketTable has” should tell you about the status of the routing table, while “outdated live connections, still have” logs the number of open connections (though it is not logged frequently).
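For anyone who would rather script this than grep by hand, the sketch below counts how often each of the strings above appears in a log file. The three watch strings come from this update. The function names and the sample lines are made up for illustration and are not real safenode output.

```python
from collections import Counter
from typing import Iterable

# Strings to watch for, as listed in this update.
WATCH_STRINGS = (
    "consider us as BAD",
    "kBucketTable has",
    "outdated live connections, still have",
)

def count_matches(lines: Iterable[str]) -> Counter:
    """Count how many lines contain each watch string."""
    counts = Counter({s: 0 for s in WATCH_STRINGS})
    for line in lines:
        for s in WATCH_STRINGS:
            if s in line:
                counts[s] += 1
    return counts

def scan_log(path: str) -> Counter:
    """Scan one log file; the path is whatever your node's log location is."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return count_matches(handle)

# Made-up lines that only exercise the matching logic.
sample_lines = [
    "placeholder text ... consider us as BAD ... placeholder text",
    "placeholder text ... kBucketTable has ... placeholder text",
    "an unrelated line",
    "placeholder text ... consider us as BAD ... placeholder text",
]
counts = count_matches(sample_lines)
for needle, hits in counts.items():
    print(f"{hits:4d}  {needle}")
```

The counts on their own say nothing about cause, but a node that is repeatedly told it is considered bad is an obvious candidate for a closer look.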
And of course there are the existing docs on HandShakeTimeouts and CircuitReqDenied in our troubleshooting guide.
Explaining how we can use these to diagnose issues is something for another, perhaps the next(?), update.
General progress
As well as fielding community questions, @chriso has been working on an extension for testnet-deploy to support different machine sizes per environment type, modifying the tool and the related workflows.
@bzee has been busy (pun intended) on the API, getting a solid PUT method with all the inner stuff working correctly.
@mick.vandijke has been on wallets, specifically merging hot wallets and watch-only wallets, and working on a new wallet file format, similar to Bitcoin’s, that can be backed up and transferred more easily than the wallet file + folders we have now.
And @qi_ma has joined @joshuef, @anselme and @dirvine on the Sybil-resistance work, putting in a PR in the libp2p repo to support range-based search. Range-based searching allows us to interrogate peers that are further away from the target address, to prevent an attacker from monopolising the search space.
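The idea can be pictured with a toy example. This is only an illustration of the description above, not the libp2p API or the code in the PR. Addresses are small integers, distance is XOR as in Kademlia, and the peer values, the range and the function names are all invented.

```python
def xor_distance(a: int, b: int) -> int:
    """Kademlia-style distance between two addresses."""
    return a ^ b

def closest_k(peers: list[int], target: int, k: int) -> list[int]:
    """Classic lookup: keep only the k peers nearest to the target."""
    return sorted(peers, key=lambda p: xor_distance(p, target))[:k]

def within_range(peers: list[int], target: int, max_distance: int) -> list[int]:
    """Range-based lookup: keep every peer within max_distance of the target."""
    in_range = [p for p in peers if xor_distance(p, target) <= max_distance]
    return sorted(in_range, key=lambda p: xor_distance(p, target))

target = 0b1000_0000
honest = [0b1000_1000, 0b1001_0000, 0b1010_0000]  # distances 8, 16, 32
sybils = [0b1000_0001, 0b1000_0010, 0b1000_0011]  # distances 1, 2, 3
peers = honest + sybils

nearest = closest_k(peers, target, k=3)                      # crowded out by the sybils
ranged = within_range(peers, target, max_distance=0b0011_1111)  # also reaches honest peers

print(nearest)
print(ranged)
```

In this toy setup an attacker who places three identities right next to the target owns the whole closest-3 result, while the range-based result still includes the honest peers further out.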
@anselme has been writing documentation for the currency system, as required by our community, partners and for the internal devs. Documentation is one of those jobs that’s always needed, but hard to achieve when things are moving fast, so it’s a sign that we’ve arrived at a workable solution here.
Back from a family holiday, @jimcollinson has been working on node launchpad UX upgrades, node health indicators, and node port forwarding and configurations for home users.
@roland raised a PR to expose a /metadata endpoint via the node’s metrics port. This is used to return static info about the node. He also worked on a system to pull the Beta uploaders’ metrics into a single place, to make it easier to check for errors, and looked at errors caused by high bandwidth.
@mazzi has completed the Launchpad upgrade and is now testing it on different platforms.
And @rusty.spork hosted this week’s Discord Stages, chucking a few tasty curveballs at @bux, who of course handled them with aplomb (summary here). He also worked with some community members to obtain bandwidth metrics and requirements that partners have asked for. And he was responsible for leaking Jim’s node stats as part of showing off @mazzi’s new Launchpad while Jim was on holiday (that’ll teach him).