A few more releases have been published since v0.4.0. The latest (v0.4.4) has just been published and is available for all platforms, fixing a couple of issues and adding some improvements/features:
Fixed: acting (in batch) on a large number of nodes could leave a couple of nodes behind without the action being applied.
New CLI arg to sort nodes by different values, e.g. nodes ls --sort conn-peers / nodes ls --sort conn-peers-desc sort nodes by number of connected peers in ascending/descending order respectively.
New CLI arg to create node-action batches not only from multiple node IDs but also by matching node status, e.g. nodes start --status inactive starts all inactive nodes.
Node logs are available through the GUI in native mode even when the node is Inactive.
Allow disabling home-network on nodes in order to run them in a Podman deployment.
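For reference, the two new flags in action. This assumes the v0.4.4 formicaio CLI is on your PATH; the small wrapper below just prints the commands instead if it isn't, so the snippet is safe to paste anywhere:

```shell
# Demonstrate the new sorting and status-based batch flags from this release.
# The `run` wrapper only guards against the formicaio CLI not being installed.
run() { if command -v formicaio >/dev/null 2>&1; then "$@"; else echo "would run: $*"; fi; }

run formicaio nodes ls --sort conn-peers-desc   # nodes sorted by connected peers, descending
run formicaio nodes start --status inactive     # start all Inactive nodes as one batch
```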
Are you using your own Dockerfile to run the binary?
You need to start the backend with the formicaio start command; you can see this in the Dockerfile I use to build the images (the --addr arg is optional):
What exactly does modifying the setting "How often (in seconds) to fetch metrics and node info from active/running nodes" do?
I set it to 120s, but I'm still starting a batch of nodes with a much shorter delay than 120s, and the number of nodes on the web page refreshes almost live (so in less than 120s).
The backend periodically initiates a round of queries to all active nodes to retrieve their current metrics via the /metrics service, and caches them for frontend/CLI consumption at any time. That setting defines the frequency at which each round of queries is initiated.
The frontend currently polls the backend every ~5 secs, independently of the backend tasks and their configured frequencies. When you add nodes they are created immediately, so the frontend will see them in its next polling cycle, at most ~5 secs later; but the up-to-date metrics of those new nodes won't be fetched until the next round of metrics querying is kicked off by the backend, as per the value of that setting.
Understood.
The frontend has a hard time rendering ~5000 nodes, and I think it will be worse once I have all my nodes started.
I see that you're using Rust; I don't know if you can easily do some sort of virtualized rendering (only compute the render of the displayed elements rather than computing everything), or maybe offer a setting to display only a subset of the nodes on the main page.
Otherwise, I really like your app; it's the best so far for running a large number of nodes.
This is what I'm currently working on: having some pagination mechanism, and/or the option to choose a different type of view, i.e. a list view instead of that list of cards.
I don't think that'll make it worse (it shouldn't), but the frontend will still be affected in just the same way.
Another (short-term) solution I'll be releasing in the next version is to reduce the frontend polling frequency according to the number of nodes.
Currently the Formicaio CLI is probably the best way to manage a large number of nodes like in your case.
I've continued to run on Umbrel, and it's smooth and effortless getting nodes up and running, connected to peers, etc.
As nothing has changed in terms of stored records, etc. for about a week now, even after doing the last two updates, I'm wondering if the network is just not active or if something else is going on. I don't think it has anything to do with Formicaio, but I'm a bit mystified about what's up.
That's effectively the size of it. We have tens of millions of useless datacentre nodes that are at least a version behind and are slowing everything down;
see the ANT Token - Price & Trading topic - #930 by chriso onwards.
Uploads are a total pain right now - possibly getting slightly better. There are tools in the works right now that need some sharp edges filed off, which will make uploading a LOT simpler for most folks. But until we can get the 12-15 million nodes to upgrade that have no real incentive to do so - cos NOT upgrading maintains their earning potential - we are somewhat screwed.
Would that not give further financial advantage to the whales and penalise the home users who still need to use relays? I'm guessing the home users are mostly already upgraded to 0.3.7.
Is there a way to try and restart all the inactive nodes with a delay?
I was able to start about 5500 nodes, then some of them were killed, with the following logs from the formicaio start command:
Killed process for node 674270314b51: ExitStatus(unix_wait_status(25856))
Process with PID 1553573 exited (node id: 674270314b51) with code: Some(101)
and after that all the other nodes of the batch (10K nodes) had the following error:
Failed to spawn new node: Os { code: 11, kind: WouldBlock, message: "Resource temporarily unavailable" }
Failed to create node instance 6363/10000 as part of a batch: error running server function: Failed to create a new node: Resource temporarily unavailable (os error 11)
Are there more detailed logs somewhere?
I restarted some of the inactive nodes manually today and they were able to start (also, I still have 800GB of RAM left, enough disk space on a high-end SSD, and CPU is sitting at 25%, so I wonder which "resource" was unavailable).
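For what it's worth, EAGAIN / "Resource temporarily unavailable" (os error 11) when spawning processes usually points at kernel task limits rather than RAM, disk, or CPU. On Linux you can check the usual suspects like this (a generic sketch, not Formicaio-specific):

```shell
# Per-user limit on processes/threads; spawning fails with EAGAIN once it is hit
ulimit -u

# System-wide caps on the total number of tasks (threads count against these too)
cat /proc/sys/kernel/threads-max
cat /proc/sys/kernel/pid_max

# Total number of tasks (processes + threads) currently running system-wide
ps -eL -o pid= | wc -l
```

With thousands of nodes each spawning many threads, the per-user process limit or threads-max can be exhausted long before RAM or CPU is.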
If you mean logs from the nodes, they are all located at <formicaio-bin-dir>/formicaio_data/node_data/<node-id>/logs/antnode.log (you should be able to see their content also using the GUI, even when the node is inactive).
Check the antnode.log of some of those nodes which failed to start or which exited, perhaps there is some issue with UPnP which would cause the nodes to exit if they cannot set it up (assuming you are trying to use UPnP).
Also try using the interval, as it may help if the issue is that either the system or the app is being overwhelmed when trying to start too many nodes without any delay in between (...perhaps the app should enforce some delay when the number of nodes is too large...).
Sorry, I left some details out: I already checked the node_data dir of some of the inactive nodes, and the dir was empty save for the antnode bin.
It is possible that my delay was a bit aggressive (3s), but I thought the server could handle it (and actually, the first 5400 nodes are all OK; after that none started, so it's a bit weird).
I'm trying your command, no result yet, but maybe it takes some time for it to list all the inactive nodes (I can see some of them have a new status in the frontend: Inactive (batched)).
I'll see how it goes, but it might well be easier to run multiple VMs with a smaller number of nodes (I also noticed that on my server each node spawns 259 threads, whereas on my laptops it spawns 11 to 19 threads depending on the CPU).
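On the thread-count difference: this is just an assumption, but if antnode uses a multi-threaded async runtime such as tokio (whose default worker pool size equals the number of CPU cores), a big server will naturally spawn far more threads per node than a laptop, and with thousands of nodes those threads add up quickly against kernel task limits. You can verify the per-process thread count yourself on Linux:

```shell
# Count the threads of a given process; $$ (this shell) is used here as a
# stand-in for an antnode PID.
pid=$$

# Each thread appears as a directory under /proc/<pid>/task
ls "/proc/$pid/task" | wc -l

# The NLWP ("number of lightweight processes") column from ps reports the same
ps -o nlwp= -p "$pid"
```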
When you send an action which affects many nodes, like what you've done, it automatically creates a batch; that's why you see them with status Inactive (batched). You can also see info about batches on the CLI with $ formicaio batches ls.
Remember you can use --help on any CLI command to get a list of available subcommands and options, e.g. $ formicaio nodes --help, or just:
$ formicaio --help
formicaio 0.4.4
CLI interface for Formicaio application.
USAGE:
    formicaio [OPTIONS] <SUBCOMMAND>
FLAGS:
    -h, --help       Prints help information
    -V, --version    Prints version information
OPTIONS:
    --addr <addr>    Backend IP address and port
SUBCOMMANDS:
    batches     Batches commands
    help        Prints this message or the help of the given subcommand(s)
    nodes       Nodes commands
    settings    Settings commands
    start       Start Formicaio backend application
    stats       Stats commands
Just FYI, Iām able to reproduce this issue.
The scenario where I'm reproducing it seems to be caused by some other old nodes I had running on the same ports as the new, failing nodes, so the ports were already in use.
Could that also be your case? Perhaps you had some nodes running with the same range of ports, maybe on Docker (that's where I had them, and I forgot they were running there), or even amongst the new ones some are trying to use the same set of ports.
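A quick way to confirm a port clash on the host (the port numbers below are placeholders; substitute the range your nodes are configured to use):

```shell
# Report whether anything is already listening on each port in a sample range.
# ss shows listeners from host processes as well as ports published by Docker.
for port in 12000 12001 12002; do   # placeholder ports
  if ss -ltnu 2>/dev/null | grep -q ":$port "; then
    echo "port $port is already in use"
  else
    echo "port $port looks free"
  fi
done
```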