So I’ve been working on a Python port of anm, which was progressing nicely until I lost the ability to run nodes.
Weave Node Manager aims to monitor node clusters and automatically make adjustments that keep the nodes running within a set of adjustable rules.
I have a request for someone running a node. Somehow, I forgot to back up the output results from a node, which I want to simulate in my lab so that I can get back to development.
The first thing I need is a port dump of the two endpoints that I can’t make work at the moment. First, the metadata results:
This little project makes a small but not tiny fake node that can be used to simulate nodes without wasting network or disk bandwidth (or, in my case, when the network doesn’t work in my labs). All said and done, on macOS Sequoia this little script is still 24MB in size and doesn’t yet support many customized metrics (just uptime), but I may add more over time as wnm gets smarter.
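To give a feel for the idea, here is a minimal sketch of a fake node (the endpoint path, metric name, and port are my assumptions, not the real fakenode implementation): a tiny HTTP server that serves a single uptime metric for a node manager to scrape.

```python
# Sketch of a fake node: an HTTP server exposing one "uptime" metric.
# Endpoint path, metric name, and port are assumptions for illustration.
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

START = time.monotonic()

class FakeNodeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/metrics":
            # Report seconds since the fake node "started".
            body = f"ant_node_uptime {int(time.monotonic() - START)}\n".encode()
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):
        pass  # keep the console quiet

# To run standalone:
#   HTTPServer(("127.0.0.1", 12001), FakeNodeHandler).serve_forever()
```

A real fakenode would of course fake more of the node surface, but even this much gives a manager something to poll.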
I’m a little frustrated, though. I can’t use antctl as a node manager with fakenode because antctl checks that the executable name matches. I’m currently launching fakenodes with a modifiable shell script called antnode (so I can set custom settings, like the node version to report), and antctl start fails because the names don’t match: antctl sees /bin/bash instead of antnode.
Obviously, my use case shouldn’t distract the team, but my only alternative is to build a binary version of this script (in Rust?), which would then require building and managing binaries for each fake node version. I almost went that route first, but I don’t know anything about Rust, so I can’t pair with AI the way I can with Python.
It’s also unlikely that anyone using antctl would bother with fakenode, so solving this edge case doesn’t seem necessary at this time.
Ultimately, even in its current state, this allows me to restart work on wnm, so it has succeeded in helping me move forward!
The latest wnm release (v0.3.1) finally supports concurrent actions for those node operators running on large-scale hardware.
Setting --max_concurrent_start 10 or --max_concurrent_upgrade 5 will allow wnm to spin up or upgrade clusters faster.
Also added was --force_action update_config, which is a no-op action, allowing a node operator to change settings in the database without running the decision engine.
I’m having a spot of bother trying to change the config because I stupidly started it with the default ports (12000, etc.):
(wnmvenv) willie@leonov:~/weave-node-manager$ wnm --force-action update-config
WARNI [root] Cannot change port_start, metrics_port_start, or process_manager on an active machine
(wnmvenv) willie@leonov:~/weave-node-manager$ wnm --force-action stop
WARNI [root] Cannot change port_start, metrics_port_start, or process_manager on an active machine
(wnmvenv) willie@leonov:~/weave-node-manager$ wnm --force-action teardown
WARNI [root] Cannot change port_start, metrics_port_start, or process_manager on an active machine
(wnmvenv) willie@leonov:~/weave-node-manager$ wnm -c ~/.local/share/autonomi/config
WARNI [root] Cannot change port_start, metrics_port_start, or process_manager on an active machine
Oh, interesting. I need to detect when those non-changeable settings are present in the configuration file but still have the same value.
Almost all settings persist once set, so the simplest way to get unstuck, while I fix this later tonight, is to remove or comment out those three settings from the configuration file.
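The planned check is roughly this (a sketch; the config is assumed to be a plain dict): only flag an immutable key when it is present *and* differs from the active value.

```python
# Settings that cannot change while nodes are running (from the warning above).
IMMUTABLE = {"port_start", "metrics_port_start", "process_manager"}

def immutable_conflicts(active, incoming):
    """Return the immutable keys whose incoming value differs from the active one.

    A key that is present but unchanged is fine; only real changes should warn.
    """
    return {k for k in IMMUTABLE
            if k in incoming and incoming[k] != active.get(k)}
```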
I’ll get back to you on the default configuration file location; I mostly configure on the command line. I think it’ll be ~/.local/shared/wnm/config on Linux and ~/Library/Application\ Support/wnm/config on macOS.
v0.3.9 adds support for --action_delay <milliseconds>, which is inserted between bulk operations.
It also adds --this_action_delay and --this_survey_delay options, which behave like their persistent counterparts except that the settings do not persist and apply only to that execution.
Finally, there is a new --report_format env setting for --report machine_config that outputs the machine_config settings as an env file, which can be loaded later with -c/--config <path>. Be sure to use -q if saving to a file, like so:
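Something along these lines should work (the output path is just an example, and the simulated variable names are my assumption, not wnm’s actual output):

```shell
# Save the machine config as an env file, then feed it back in later:
#   wnm -q --report machine_config --report_format env > ~/wnm-machine.env
#   wnm -c ~/wnm-machine.env
#
# Simulate what an env-style report might look like (variable names are
# an assumption) and show that it can be sourced by a shell:
printf 'PORT_START=12000\nMAX_CONCURRENT_START=10\n' > /tmp/wnm-machine.env
. /tmp/wnm-machine.env
echo "port_start is $PORT_START"
```

Without -q, any log lines would end up mixed into the saved file, which is why the quiet flag matters here.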
v0.3.14 (pie time) adds a shortcut, --json, that sets --report_format json.
It also adds --report_format env to --report machine-metrics, allowing the metrics and config values to be used in shell scripts (documented examples in docs/USER-GUIDE-PART-3.md).
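For instance, a script could load the report into its own environment and branch on a metric (a hypothetical pattern: the wnm call is commented out, a stub stands in for its output, and the variable name is my assumption):

```shell
# Hypothetical pattern for consuming the env-format metrics report:
#   eval "$(wnm -q --report machine-metrics --report_format env)"
report() { printf 'NODES_RUNNING=25\n'; }  # stand-in for the wnm call
eval "$(report)"
if [ "$NODES_RUNNING" -lt 30 ]; then
  echo "room to start more nodes"
fi
```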
Wnm started before the token went live, so while there is probably WAY more than anyone would want to read, I have an ongoing, lightly edited stream-of-thought log of the work on this project.
I decided the material is too verbose to post in this forum, so I published it as an extra rather than a single post.
Maybe AI can read that post and discover my secret plans to take over the world!
v0.3.25 adds --antctl_path ~/.local/bin/antctl for environments where PATH may not be set properly, as happens under cron.
v0.3.26 added --antctl_debug, which sets the --debug parameter for antctl commands.
v0.4.0 added a new ‘zen’ mode for the antctl process manager that doesn’t mangle the node paths and instead reads what antctl uses, then updates the database paths afterward.
v0.4.3 added --antctl_version #.#.#, which allows pinning the antnode version antctl will use.
v0.4.4 added --rust_backtrace [int|full], which sets the RUST_BACKTRACE environment variable for antctl commands.
v0.4.7 added --force_action disable_config, which allows setting a false value for truthy config arguments like --antctl_debug and --no-upnp.
v0.5.1 fixes a race condition where a site survey was clearing the UPGRADING/RESTARTING/REMOVING status flags.
v0.5.3 fixes a bug where, during a --force_action, when none of the named services (given with --service_name) matched, wnm would fall back to stopping/starting/upgrading/removing a default node chosen by rule (youngest/oldest/etc.). Now, if service names are specified but none match, a warning is given and no default node is affected.
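The corrected selection logic is roughly this (a sketch of the behavior described above; the field names and the default chooser are my assumptions, not wnm’s actual code):

```python
# Sketch of v0.5.3 target selection: explicit service names never fall
# back to a default node when nothing matches.
import logging
import random

def select_targets(nodes, service_names=None):
    """Pick the nodes a forced action should touch."""
    if service_names:
        matches = [n for n in nodes if n["service_name"] in service_names]
        if not matches:
            logging.warning("no services matched %s; nothing to do", service_names)
        return matches  # empty list means: affect nothing
    # No names given: fall back to a default choice (random stands in for
    # the youngest/oldest/etc. selection rules).
    return [random.choice(nodes)]
```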
v0.5.6 now supports the node network’s auto-upgrade and, by default, disables the upgrade logic in wnm.
A new flag --enable_upgrade provides a non-persistent setting that turns the upgrade logic back on for that cycle.
In addition, --force_action upgrade still allows upgrading the default node or specific nodes. However, take care: force mode does not follow delay timers or concurrency limits.