Background:
Wrote a script to start nodes in a local network, since node-manager randomly crashes with an "RPC cannot be contacted" error. Probably a race condition.
Or node-manager just takes far too long (hours and hours) doing a registry refresh when trying to run 500 to 1000 local nodes.
Using my script (or node-manager), the memory usage per node increases as the number of nodes increases.
As in:
up to 100 nodes, each is taking 80-120MB
as more nodes are added, each new node uses more memory
over about 450 nodes, the usage seems to be 800MB +/- 50MB per node (system monitor), while each node reports itself as using 600+MB
No RPC port used
No Metrics port used
Listening port set when starting each node (incrementing port number starting at 50000)
There was no noticeable difference from the rate of starting the nodes; the memory usage shows the same pattern/amounts whether nodes are started quickly or with long periods between each start.
All nodes were started normally as local nodes, with up to 20 peers specified (20 once more than 20 nodes exist). A rough sketch of the launcher is below.
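For reference, something along these lines (a minimal Rust sketch, not the actual script; the safenode binary name, the --peer flag and the address format are placeholders rather than the real CLI, while --node-port and the incrementing 50000+ ports match the setup above):

```rust
// Sketch of a local-network launcher: spawn N nodes with incrementing
// listening ports, giving each one up to 20 already-started nodes as peers.
// "safenode", "--peer" and the address format are placeholders for this
// sketch; --node-port and the 50000+ port range match the setup above.
use std::process::Command;
use std::{thread, time::Duration};

fn main() -> std::io::Result<()> {
    let node_count: u16 = 500;
    let base_port: u16 = 50_000;
    let mut peers: Vec<String> = Vec::new();

    for i in 0..node_count {
        let port = base_port + i;
        let mut cmd = Command::new("safenode"); // placeholder binary name
        cmd.arg("--node-port").arg(port.to_string());
        // No RPC port and no metrics port, matching the setup above.

        // Hand the node up to 20 of the most recently started nodes as peers.
        for addr in peers.iter().rev().take(20) {
            cmd.arg("--peer").arg(addr);
        }

        cmd.spawn()?;
        peers.push(format!("/ip4/127.0.0.1/udp/{port}")); // placeholder format

        // Delay between starts; the memory pattern looked the same whether
        // this gap was short or long.
        thread::sleep(Duration::from_millis(500));
    }
    Ok(())
}
```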
So looking at total memory usage (using system monitor), I see 800MB per node once over 400 nodes, while the node reports it is using less: over 600MB but under 700MB.
The earlier-started nodes do not seem to increase in memory as more nodes are added.
One possible reason I can see is that nodes added later use more memory to start up while finding nodes to connect to, but never release that extra memory. That would also explain why the earlier nodes are using less memory, since they never needed it to start.
This will limit my testing using local networks; I only have 256GB of main memory. The 48-thread CPU has no problem handling the 500 nodes, at less than 20% CPU usage.
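For rough scale: at ~800MB per node, 256GB covers only about 256,000 / 800 ≈ 320 nodes, whereas at the ~100MB the early nodes use, the same RAM would cover roughly 2,500.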
I expected, though, that all the nodes would need the same nominal amount of RAM whether started early on or after 400 nodes are already running.
@joshuef @chriso who is the best person to ask about why this occurs and whether there is a way to avoid the increasing memory usage of newly added nodes?
I see node RAM usage for nodes running in --home-network mode go up and up, but ones running with ports set via --node-port reach a level and don't really increase much, e.g. for these nodes that have been running for 18 days. (VDash had crashed for the one in the 2nd screenshot.)
Not a priority, but maybe I've done something wrong and can be told how to fix (or work around) it.
If it is a feature, then someone in the future will remove said feature and it will not have been tested.
But yes, not a priority; one for down the track. The only priority is if I am doing something wrong, and to find out what.
Yeah, this does seem to be something specific to local networks (i.e. a test network on your device). I say that because there are people running hundreds of nodes and I am sure they don't have 256GB of RAM.
This is not a situation of RAM usage going up over time; it's a situation where the 300th node uses around 600MB and the 400th node uses 800MB (showing 600+ in its logs) from the start, and it remains at that level, only varying up or down by a few MB according to the logs.
On my system VDash was working with the 500 nodes, but during uploading it was showing inactive nodes, which means either logs were not being written or VDash wasn't reading them fast enough for 2 minutes. CPU never went above 40%.
Thank you for the reply.
It does just seem to be a case of not returning memory to the system after the node finishes starting up, maybe after scanning for nodes to take on as peers.
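To illustrate the idea (a standalone Rust sketch, not the node's actual code): make a burst of allocations like a startup/peer-scan phase might, free it all, and watch the resident set size that a system monitor would report. On Linux with the default allocator, the freed heap pages are often kept by the process rather than returned to the OS, which would also fit the gap between what the system monitor shows and what the node reports for itself.

```rust
// Allocate ~800MB in small chunks, drop it, and print RSS (Linux-only,
// read from /proc/self/status) before and after. RSS often stays elevated
// after the drop because the allocator keeps the freed heap pages.
use std::fs;

fn rss_kb() -> Option<u64> {
    // Parse the "VmRSS:   123456 kB" line from /proc/self/status.
    fs::read_to_string("/proc/self/status").ok()?
        .lines()
        .find(|line| line.starts_with("VmRSS:"))?
        .split_whitespace()
        .nth(1)?
        .parse()
        .ok()
}

fn main() {
    println!("baseline RSS: {:?} kB", rss_kb());

    // Simulate a startup burst: many small, short-lived allocations.
    let mut burst: Vec<Vec<u8>> = Vec::new();
    for _ in 0..200_000 {
        burst.push(vec![0u8; 4096]); // ~800MB in total
    }
    println!("after burst RSS: {:?} kB", rss_kb());

    drop(burst); // freed from the program's point of view...
    println!("after drop RSS:  {:?} kB", rss_kb());
    // ...but the system monitor may still count most of it against the process.
}
```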
I have seen this on the main network as well. This, and the other bug where older nodes seem to earn faster and faster, are good reasons for starting nodes early and for using a UPS. (Of course, I have spent quite some time looking into Rust memory allocation, and how it behaves on different architectures and OSes.)