Best Safe Node hardware

Do port-forward nodes use less CPU than home-network nodes? I’ve been seeing CPU spikes recently but it seems to only be happening with nodes that don’t have the port-forward connection type.

1 Like

The spikes will occur most often when a node has to do something, like send or receive a chunk. I doubt relay nodes will make a difference over port forwarding.

Also, we have (or will have) more direct communications with relay nodes, in that the relay node is only used to establish the connection and is not involved afterwards. That process may involve some network comms, but nothing noticeable in CPU usage. Once the connection is established it’s no different to comms with routing-table (RT) nodes, and so unnoticeable.

I’d say it will be extremely difficult to watch CPU usage and see a difference. It’ll be overshadowed by the very random CPU spikes from chunk stores/retrievals.

2 Likes

Always use port-forward and UPnP if you can; home-network should be considered a last resort, as the network needs port-forwarding nodes as relays.

Can and do, however it’s not always an option.

As an aside, can antctl remove be used to remove a range of nodes, such as antctl remove --service-name antnode300-antnode399? Needing to repeat --service-name for each individual node to be removed can be more tedious than simply resetting and starting anew, which isn’t great for network stability.

I asked ChatGPT to make me a script for stopping and removing nodes in an interval, because you need to add an argument for every service.

You are supposed to be able to repeat the --service-name option multiple times in a single command line.
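Until a range option exists, one workaround is a small shell loop that expands the range into repeated --service-name flags. This is just a sketch; the antnode300–antnode399 range mirrors the hypothetical example above, and you should adjust it to your own service names:

```shell
# Expand a node range into one `antctl remove` call with repeated flags.
args=""
for i in $(seq 300 399); do
  args="$args --service-name antnode$i"
done
echo antctl remove $args    # dry run: inspect the command first
# antctl remove $args       # then uncomment to run it for real
```

The echo line lets you sanity-check the generated command before anything is actually removed.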

1 Like

I’m trying to run some nodes.

Computer setup:

  • AMD Turion™ II Neo N54L, 2 cores @ 2.2 GHz
  • 1*4GB RAM
  • 2*4TB HDD
  • Internet: fiber 1Gb/s up and down
  • Ubuntu server with Prometheus/Grafana for monitoring

Nodes:

  • 40 nodes started.
  • After the initializing period (the ideal for my config is a 130-second interval) I get these figures:
    • CPU Busy: ~35/40%
    • Sys load: ~40/50%
    • RAM used: ~70% of 4GB
    • SWAP used: 65%
    • HDD used: 24%
    • Nodes have between ~2 and ~150 peers, with an average around ~30 ==> Is that normal? I used to have more like ~250 peers per node with my previous VPS. Maybe it’s because of my ISP’s router limitation?
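If you want to cross-check peer counts independently of any dashboard, you can read them straight off a node’s Prometheus metrics endpoint. The port below (13001) is purely illustrative; use whatever metrics port your node was started with:

```shell
# Dump any peer/connection-related gauges from one node's metrics endpoint.
# 13001 is a placeholder; substitute the metrics port your node actually uses.
curl -s --max-time 2 http://127.0.0.1:13001/metrics \
  | grep -iE 'peer|connect' \
  || echo "no metrics endpoint answering on that port"
```

Comparing the raw gauges against what the monitoring stack reports can tell you whether a low figure is real or a display issue.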

Another question: since the binaries went from safexxx to antxxx, I can’t get vdash working. It doesn’t show peer statistics with number of attos, records, and so on. Is that normal?

1 Like

Did you ramp up gently, with for example 5 nodes to start, then if things are running fine another 5, etc., to get to 40? That is a good idea.

Did you update vdash to the latest version with cargo install vdash?

1 Like

Yes indeed, that is the way I’m doing it: 10 nodes at a time, with a 130-second interval between nodes.

And for vdash, yes, I have the latest version. I’ll retry installing it from scratch.

1 Like

The number of attos will never be correct. Three nodes are paid for each chunk uploaded, but only one of them will show it in the logs (and /metrics); Launchpad suffers the same fate.

Records only show up when a quote is requested from the node. This can take a day with the current size of the network, but one or more of your 40 should show records within hours.

Make sure you are using the 3.5 version of the node. Antup will get it, and antctl will download it automatically, so no worries there. It’s only if you run a custom script that you will need to use antup (antup node) to get the latest version.

Peers shown in vdash are more like current connections, not routing-table peers, unless that was changed at some stage.

2 Likes

Is anyone able to run a large number of nodes (6K+) on the same machine without containers?

I’m trying and failing to do so on Ubuntu Server 24.04; after ~5400 nodes, I can’t start any new ones (I tried with the anm script from NTracking and with Formicaio).
For information, these are the logs I get from Formicaio for nodes that can’t start:

Killed process for node 674270314b51: ExitStatus(unix_wait_status(25856))
Process with PID 1553573 exited (node id: 674270314b51) with code: Some(101)
Failed to spawn new node: Os { code: 11, kind: WouldBlock, message: "Resource temporarily unavailable" }
Failed to create node instance 6363/10000 as part of a batch: error running server function: Failed to create a new node: Resource temporarily unavailable (os error 11)

I have plenty of free RAM, disk, and CPU (though the CPU usage relative to the number of nodes is quite high compared to my laptops), and my router is more than fine.
Also, I’m far from hitting the max number of processes, and I have some room in the number of threads (I noticed each antnode process uses 259 threads on my server, which is far more than the 19 I have on my laptops; is there any way to reduce that?).

I’ll switch to Proxmox with LXC containers if I can’t get it working, but I’d rather avoid having to maintain multiple VMs.
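For what it’s worth, EAGAIN (os error 11) from spawning a process usually points at a task/thread limit rather than RAM or disk; at 259 threads per node, ~5400 nodes is around 1.4 million kernel tasks. A sketch of the standard Linux knobs worth checking (these are stock sysctls and limits, not anything antnode-specific):

```shell
# Limits that commonly cause "Resource temporarily unavailable" on spawn:
ulimit -u                              # max user processes/threads for this shell
cat /proc/sys/kernel/threads-max       # system-wide thread limit
cat /proc/sys/kernel/pid_max           # max PIDs (each thread consumes one)
cat /proc/sys/vm/max_map_count         # memory mappings; many threads need many
systemctl show --property=DefaultTasksMax 2>/dev/null || true  # systemd task cap
# Count antnode threads currently running system-wide:
ps -eLf | grep [a]ntnode | wc -l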

@d3su ANM and NTracking were never designed with multiple thousands of nodes in mind.

Over 1k nodes, NTracking will start messing up its display: each time a node’s metrics port is queried there is a 1-second sleep, so over 1k nodes it will start to overrun the time frame, which at present is 20 minutes.

ANM was recently upgraded to theoretically handle up to 9999 nodes. I’d be interested to hear how it went if you tried to run more than 1k nodes.

1 Like

Yes, I’ve seen the commit; I tried it after that.
The nodes started OK up to ~5000 nodes, and after that the new ones were not able to start.
I was not able to report anything with NTracking even at the start (Telegraf seemed to be running OK; InfluxDB and Grafana are on another host, and the dashboard is displaying all the stats fine from my other laptops running nodes through antctl).

But otherwise, the biggest problem with the script for a large number of nodes is the minimum delay of 1 minute between nodes (I tried with 0.5; it didn’t work).

That’s why I tried the Python bindings, but they are not working for starting nodes at the moment.
So I switched to Formicaio, which is great for starting a large number of nodes, but I’m also limited to ~5000 nodes.

1 Like

I may upgrade NTracking to handle more nodes, but the timing of polling each node’s metrics is the main issue.

Currently NTracking checks all node stats every 20 minutes, with a 1-second sleep between nodes, which (at ~1k nodes) gives us 16 minutes of sleep time during a node status check and 4 minutes for the rest of the script.
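One hedged way around that serial 1-second sleep would be to poll the metrics ports concurrently; the sketch below fans the queries out with xargs. The port range 13001–13010 is purely illustrative, and the concurrency level would need tuning so the burst doesn’t itself disturb the nodes:

```shell
# Poll many node metrics endpoints in parallel instead of one per second.
# Port range is hypothetical; substitute your nodes' real metrics ports.
seq 13001 13010 |
  xargs -P 8 -I{} sh -c \
    'curl -s --max-time 2 "http://127.0.0.1:{}/metrics" > "/tmp/node_{}.prom" || true'
```

With 8-way concurrency and a 2-second timeout, even a few thousand ports would fit comfortably inside a 20-minute window.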

If you managed to get to 5k nodes with ANM, I’m pretty impressed, but I’m guessing it’s some other bottleneck you are hitting due to all the process switching.

Can I ask the spec of the machine you ran 5k nodes on?

Just a wild guess: have you checked the OS max open file handles?

2 Likes

EPYC 7R43 (equivalent to an EPYC 7763: 64-core Milan), 1 TB RAM, 4 × 16 TB HDD + 1 × 2 TB high-end NVMe SSD (all nodes are currently on the SSD).
I can run 2K nodes smoothly (50% CPU usage) on each of my E3-1535M v5 machines (quad-core Skylake mobile CPU); the EPYC 7R43 is easily 16× more powerful, so it should theoretically handle 30K nodes easily.

Isn’t the limit per process?
My current limit is 1024 per process.

1 Like

There is also a system-wide max; I think it is this:

cat /proc/sys/fs/file-max

Just checked, this is an absurdly high number that I will never reach.
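To round out the per-process vs system-wide distinction, here is a sketch of the standard places Linux enforces open-file limits (the 65536 figure is just a common choice, not a recommendation specific to antnode):

```shell
# Per-process vs system-wide open-file limits on Linux:
ulimit -n                        # soft per-process limit (often 1024)
ulimit -Hn                       # hard per-process limit
cat /proc/sys/fs/file-max        # system-wide ceiling (usually enormous)
cat /proc/sys/fs/nr_open         # max value a per-process limit can be raised to
# Raise the soft limit for the current shell before launching nodes:
# ulimit -n 65536                # needs the hard limit to be >= 65536
# For systemd-managed services, set LimitNOFILE=65536 in the unit file instead.
```

So an "absurdly high" fs.file-max can coexist with nodes failing at 1024 open files each: it’s the per-process soft limit (or the systemd LimitNOFILE) that bites first.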

Is anyone here willing to post the max nodes they are or have been able to run on a single device and the specs thereof? Home based.

Only really looking for silly numbers, not part-timer or rookie numbers :wink:

3 Likes

Would a Radxa Orion O6 be good, considering perf/power consumption/price?