How to keep your nodes healthy and productive

Hannu · February 27, 2025, 7:52am

Edit: 1. This might not be needed after the next update. 2. Restarting your nodes is not recommended for network health, so avoid running this without a reason. Still, prefer this over re-creating your nodes.

It looks like you can keep your nodes productive just by restarting them. AFAIK, a reset is not needed. This is based on a few days of my own tests, so could someone confirm it? I suggest running this if your nodes' rewards start to degrade.
Maybe a few people who know scripting could check and review it? I would be suspicious of any script published on any forum…
But here is the script:

#!/bin/bash

# Run antctl status and collect the service names
echo "Collecting service names with antctl status.."

# Skip the header lines (the first 5) and keep the first column of non-empty rows
services=$(antctl status | awk 'NR>5 && NF {print $1}')

# Count services (prints 0 when the list is empty)
service_count=$(printf '%s\n' "$services" | grep -c .)

# Run stop - start for every service
for service in $services; do
    echo "Restarting service: $service"

    # Stop service (called directly; eval is not needed here)
    echo "antctl stop --service-name $service"
    antctl stop --service-name "$service"

    # Start it again, then wait before moving on to the next node
    echo "antctl start --service-name $service"
    antctl start --service-name "$service"
    echo "Sleeping 60 seconds."
    sleep 60
done

# Print service count
echo "Number of nodes restarted : $service_count"

dirvine · February 27, 2025, 7:55am

Be aware that this is harmful to the network. In time, this behaviour may very well mean that nodes have to initiate permanent bans on nodes doing this. It could also lead to reduced payments, as you miss chunk store requests while restarting your node.

It’s a balance we try to strike between tolerating small, inconsistent nodes and dealing with people purposefully causing churn on the network, which is an issue.

Hannu · February 27, 2025, 7:58am

But the problem is that people are currently resetting their nodes to keep them healthy, which is AFAIK much worse for the network. A plain restart is a lot softer, and downtime is less than a minute per node.

Toivo · February 27, 2025, 8:10am

What is the cause of nodes becoming unproductive? Does it happen on VPSs as well as on nodes run from home?

dirvine · February 27, 2025, 8:11am

You are resetting ALL nodes whether they are working or not though.
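One way to avoid restarting every node is to pass in only the service names you have decided need it. The following is a minimal sketch, not a tested tool: `restart_selected`, `run`, `DRY_RUN` and `PAUSE_SECONDS` are names made up for this example, and the service names in the usage line are placeholders. It reuses only the `antctl stop` / `antctl start --service-name` calls from the script above, and it does not decide which nodes are unhealthy; that part is left to the operator.

```bash
#!/bin/bash

# run CMD [ARGS...]
# Print the command, then execute it unless DRY_RUN=1 is set.
run() {
    echo "$*"
    [ "${DRY_RUN:-0}" = "1" ] || "$@"
}

# restart_selected NAME...
# Stop and start only the named services, one at a time,
# pausing PAUSE_SECONDS (default 60) after each one.
restart_selected() {
    local pause=${PAUSE_SECONDS:-60}
    local service
    for service in "$@"; do
        run antctl stop --service-name "$service"
        run antctl start --service-name "$service"
        run sleep "$pause"
    done
    echo "Number of nodes restarted : $#"
}

# Example (placeholder service names), printing the commands only:
#   DRY_RUN=1 restart_selected antnode1 antnode2
```

With `DRY_RUN=1` nothing is stopped or started, so the command list can be reviewed before running it for real.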

dirvine · February 27, 2025, 8:17am

It should not happen. I suspect if folk run as many nodes as possible on a machine then things like that will happen, but otherwise it should not happen.

We need to somehow alert folks that they are just running too many nodes. The launchpad, I believe, tries to work out how many nodes to run based on disk space at least, but we likely need to do more. It’s very hard to make it simple to run nodes but also avoid abuse at the same time. Not that folk will always purposefully do that for abusive reasons, but I think some people are running as many nodes as they can and then doing things like restarting them all periodically.

It’s not good for them, and hopefully the network does manage to find those nodes and shun them, but it all takes time. So we likely need to do more, but I think people will push as far as they can and then write scripts like this to squeeze some more out of node count and so on. It’s not good for the network, and that means it has to be less good for the operators acting like this, really.

It’s just not clear to people that purposefully switching nodes on and off again repeatedly is a bad thing. When you think about it, it obviously is, but that won’t stop folk doing it.

peca · February 27, 2025, 8:19am

How do you define a “healthy” and a “not healthy” node? Memory leaks used to be a problem, but now memory usage on my nodes looks steady.

Dimitar · February 27, 2025, 8:21am

VPSs get a lot more ant than home nodes, and the nodes run from home gradually reach 0 yield. So my hypothesis is that over time they lose connectivity, while those in the cloud don’t because of the better internet. I had to reset all my nodes to get mining going again… this happened on all 4 of my fiber lines.

Hannu · February 27, 2025, 8:32am

This is the reason I wrote that. Even a moderate number of nodes seems to degrade over time. I’m running ~100 nodes with 8G / 2T, 4core i5, 1G symmetrical fiber.
I used to reset and create new nodes maybe once a week, but if a plain stop and start is enough, that is much nicer.

Best would be if I could just start the nodes and let them run forever untouched.

peca · February 27, 2025, 8:35am

Do they lose peers?

My home nodes have had about the same number of peers since start.

Hannu · February 27, 2025, 8:36am

Yes, switching off and on is not good, but resetting, i.e. removing all nodes and creating new ones, is much worse.
A plain stop and start should not cause all stored data to be duplicated, but removing and creating new nodes will.

dirvine · February 27, 2025, 8:38am

It may very well be related to an issue we are working on with relay nodes. Some nodes are relaying through other nodes and they may not have good direct connectivity. So those nodes that need to relay through others may be suffering. It’s being worked on for sure.

Dimitar · February 27, 2025, 8:40am

Well, that’s exactly how I acted in December and January, and the nodes worked with 3-4 times more traffic than now. I use bridge mode and I don’t have routers to limit me. After 11.02 it worked fine for 5 days, and then the yield started decreasing until it got to 0 on all my machines. After a reset it was fine, and now I’m waiting to see if they degrade again.

I haven’t checked because I was using antctl, and it’s broken above 500 nodes. But @neo analyzed my logs from one machine and said they were connected to 1.3 million other nodes. Now I’m using dockers with 200 nodes each and I’ll be able to see better what’s going on, of course if antctl allows me…

Hannu · February 27, 2025, 8:48am

Was there an update coming today? I’ll change the original post a bit.

peca · February 27, 2025, 8:52am

I wonder if it is HW specific or people are pushing their machines too much. My machines with most nodes have 500 nodes each and they do antctl status fine in about 10 minutes.

Dimitar · February 27, 2025, 8:55am

Well, here’s a 6000 node machine, doesn’t look busy at all to me. I once waited 3 hours for antctl to update the registry and gave up:

Hannu · February 27, 2025, 8:57am

Ouch, this might cause a problem for the script, since it also syncs on every stop. With my ~100 nodes that takes only tens of seconds.
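A rough back-of-the-envelope sketch of why a slow sync matters for the script: if every node costs one 60-second sleep plus one sync on stop, a full pass takes at least nodes × (sleep + sync) seconds. The function name `estimate_pass_seconds` and the sync durations are assumptions for illustration only. Here 30 s stands in for "tens of seconds", and 600 s borrows the 10-minute `antctl status` time reported above for 500 nodes, which may not match the real sync time on stop.

```bash
#!/bin/bash

# estimate_pass_seconds NODES SLEEP_SECONDS SYNC_SECONDS
# Lower bound for one full restart pass: each node costs one sleep
# plus one sync on stop. Start-up time is ignored.
estimate_pass_seconds() {
    local nodes=$1 sleep_s=$2 sync_s=$3
    echo $(( nodes * (sleep_s + sync_s) ))
}

# Illustrative inputs, not measurements:
estimate_pass_seconds 100 60 30     # 9000 seconds, i.e. 2.5 hours
estimate_pass_seconds 500 60 600    # 330000 seconds, roughly 92 hours
```

Under these assumptions the sleep dominates for around 100 nodes, while at several hundred nodes the sync time would dominate and a full pass would stretch into days.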

Hannu · February 27, 2025, 8:58am

Actually, now that you mention it, it was the same for me. I was only running 4 nodes back then, but they earned steadily. I have NAT and used UPnP back then (now port forwarding).

born2build · February 27, 2025, 9:03am

What are the hardware specs? I haven’t got anywhere near that with a relatively beefy machine.

Dimitar · February 27, 2025, 9:06am

DELL 7920, 96x Intel(R) Xeon(R) Platinum 8160 CPU @ 2.10GHz , 768GB RAM, 1TB NVME
