How to keep your nodes healthy and productive

I have a dev friend with cloud setups on significantly weaker machines, with so many nodes, that earn significantly more. That is what I based my assumption on: that it is related to the quality of the internet connection and loss of connectivity over time, because during the last months of the beta they were earning well under a significantly higher load. But anyway, I will test with 1131 nodes and tell you the difference. :dragon:


Check out the Dev Forum

3 Likes

My Hetzner nodes earned about 50% more than my home nodes, every time, during late Beta.

1 Like

I would like to weigh in here with my experience. Personally I do not think the lack of earnings has anything to do with connection.

As most people know, I run both from home and from the cloud. When it comes to data chunk storage, before TGE my home nodes performed close to equal to (though slightly less than) my cloud nodes. After TGE, home nodes have dropped off horribly, earning around 10 to 15% of what my cloud nodes are (have been: I’ll get to that) earning. My home nodes are behind a very strong and capable router, are connected with fiber, and have a very stable and high-bandwidth connection (nowhere near its limit). Both setups load the systems equally and run around the same number of nodes per device.

My earlier assumption was that this had something to do with latency, where nodes try to find nodes closest to them and most of the nodes being run at central places like data centers (Hetzner for example). I kinda expected that because of this, home nodes far away from those data centers are eventually going to lack connections as they do not have as many nodes nearby as the cloud servers.

What’s weird is that recently I slowly started killing my server nodes to restart them (using a different reward address), and earnings have dropped by around 50% since. So the newer nodes are apparently not earning. Same servers, same settings, just started a week or two later.

I’m not sure if this is at all helpful, but I do think it paints a pretty clear picture that server load is not really a thing in general (perhaps for some individuals it still is).

1 Like

there was an issue discovered by Josh when running nodes (same hw, same internet connection, same everything … just a long ethernet cable)

… somehow I cannot post it because I posted the same content by accident in the wrong topic …

EDIT:

ofc by now everything could be different … not sure what changed within the last weeks and if that is still representative …

1 Like

I know about this and that’s why I use 0.5m cables…



1 Like

I have a feeling that the problem might be with the ISP: server halls like Hetzner have no ISP in the way, while from home the data still needs to go through the ISP, and that might have some impact and limitations.
Maybe @Josh has some thoughts on it.

This is just speculation, but this is what I think might happen. Please correct me if this cannot happen:

  • nodes connect to other nodes close to their XOR address
  • their XOR address is close, but they can (and some will) be far away network wise
  • if the connection between 2 nodes spans a long network distance, there will be long latency and sometimes dropped frames
  • if there are enough dropped frames or a long enough delay, those 2 nodes will eventually shun each other
  • however, all other nodes for both will not be shunned

When time passes, there will be more node pairs which will shun each other, but not the complete network.

For nodes with good connectivity (i.e. cloud servers), this will happen less often.
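The pair-wise shunning idea above can be sketched as a toy simulation. Everything here is an assumption for illustration (node counts, drop probabilities, the shun threshold, 5 peers per node); it only shows how per-pair packet loss could shun many individual pairs involving poorly connected nodes without the whole network shunning anyone:

```python
import random

def xor_distance(a: int, b: int) -> int:
    """Kademlia-style XOR distance between two node IDs."""
    return a ^ b

def simulate_shunning(n_nodes=200, n_rounds=50, bad_fraction=0.3,
                      drop_bad=0.2, drop_good=0.01, shun_threshold=5, seed=42):
    """Toy model: every round, each node contacts its 5 XOR-closest peers.
    A failed contact counts against that pair; past the threshold the pair
    shuns each other. Returns (total shunned pairs, shunned pairs that
    involve at least one poorly connected node)."""
    rng = random.Random(seed)
    ids = [rng.getrandbits(32) for _ in range(n_nodes)]
    bad = set(rng.sample(range(n_nodes), int(n_nodes * bad_fraction)))
    # each node's 5 XOR-closest peers (index 0 is the node itself, skip it)
    neighbours = {i: sorted(range(n_nodes),
                            key=lambda j: xor_distance(ids[i], ids[j]))[1:6]
                  for i in range(n_nodes)}
    failures, shunned = {}, set()
    for _ in range(n_rounds):
        for i in range(n_nodes):
            for j in neighbours[i]:
                pair = (min(i, j), max(i, j))
                if pair in shunned:
                    continue
                drop = drop_bad if (i in bad or j in bad) else drop_good
                if rng.random() < drop:
                    failures[pair] = failures.get(pair, 0) + 1
                    if failures[pair] >= shun_threshold:
                        shunned.add(pair)
    bad_pairs = sum(1 for i, j in shunned if i in bad or j in bad)
    return len(shunned), bad_pairs

total, involving_bad = simulate_shunning()
print(f"{total} pairs shunned, {involving_bad} involve a poorly connected node")
```

In this toy run nearly all shunned pairs involve a high-latency node, while well-connected pairs are almost never shunned, which matches the intuition that bad connectivity degrades a node's earnings gradually rather than cutting it off outright.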

2 Likes

I am currently getting 11-15 ANT rewards from 2.4k nodes. 5 physical machines + 2 VPS, all running since start with no restarts.

This could be about the quality of BGP connectivity and peerings. Hetzner has motivation to have good and reliable peers everywhere. Your average home ISP needs a line to Google, Meta and a few others, and for the rest the cheapest option will do. Not many customers will complain that communication to random addresses on the other side of the world is slow, has packet loss, and sometimes doesn’t work at all.

Yes, this is very important and hard to pinpoint. Also HW plays a big role - CPU caches, RAM latency, network card and its driver.

Another thing is OS tuning. Defaults are for general use on average HW; they may be highly ineffective for running thousands of nodes on a multi-CPU server. Sometimes changing a few kernel parameters can make a huge difference.
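As a starting point for that kind of tuning, a small sketch that inspects a few of the Linux defaults most relevant when thousands of processes each hold sockets and open files. The sysctl names are real; the idea that raising them helps this particular workload is the hypothesis above, not a tested recommendation:

```python
from pathlib import Path

# Real Linux sysctls that commonly matter for many-connection workloads.
PARAMS = {
    "fs/file-max": "system-wide open file descriptor limit",
    "net/core/somaxconn": "max queued connections per listening socket",
    "net/ipv4/ip_local_port_range": "ephemeral ports for outbound connections",
    "net/core/netdev_max_backlog": "packets queued when the NIC outpaces the kernel",
}

def read_sysctl(name: str) -> str:
    """Read a sysctl's current value via /proc/sys (Linux only)."""
    return Path("/proc/sys", name).read_text().strip()

for name, why in PARAMS.items():
    try:
        print(f"{name} = {read_sysctl(name)}  # {why}")
    except OSError:
        print(f"{name}: not available on this system")
```

Writing new values goes through `sysctl -w` or `/etc/sysctl.conf` as usual; measuring before and after is the only way to know whether a change helps on your hardware.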

Btw has somebody tried setting nodes with fixed CPU affinity? In theory it should reduce CPU cycles needed for context switching, especially on CPUs with multiple core groups or multi-CPU systems.
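For anyone who wants to try it, a minimal Linux-only sketch of fixed CPU affinity using Python's `os.sched_setaffinity`. In a real deployment you would pin at spawn time (e.g. with `taskset -c` or systemd's `CPUAffinity=`); `node_pids` in the comment is hypothetical:

```python
import os

def pin_to_core(pid: int, core: int) -> None:
    """Restrict `pid` (0 = the calling process) to one CPU core. Linux-only."""
    os.sched_setaffinity(pid, {core})

if hasattr(os, "sched_setaffinity"):
    pin_to_core(0, 0)  # pin this process to core 0 as a demonstration
    print("allowed cores:", os.sched_getaffinity(0))
else:
    print("sched_setaffinity is not available on this OS")

# Spreading N node processes round-robin across cores would look like
# (node_pids is a hypothetical list of node process IDs):
#   for i, pid in enumerate(node_pids):
#       pin_to_core(pid, i % (os.cpu_count() or 1))
```

Whether the saved context-switch and cache-migration overhead is measurable at these node counts is exactly the open question being asked here.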

I don’t believe the length of the cable was the issue in terms of latency. Probably the cable was bad: not bad enough to lose link, but maybe with packet loss, malformed packets that need to be resent, etc.
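One way to check for exactly that kind of silent cable damage is to read the per-interface counters Linux keeps under `/sys/class/net/<iface>/statistics`. A sketch, with the interface name `eth0` as a placeholder and the demo counter values made up:

```python
from pathlib import Path

def error_rate(rx_packets: int, rx_errors: int, rx_dropped: int) -> float:
    """Fraction of inbound packets that were errored or dropped."""
    total = rx_packets + rx_errors + rx_dropped
    return 0.0 if total == 0 else (rx_errors + rx_dropped) / total

def read_counter(iface: str, name: str) -> int:
    """Read one NIC statistics counter from sysfs (Linux only)."""
    return int(Path("/sys/class/net", iface, "statistics", name).read_text())

def check_link(iface: str) -> float:
    """Error/drop rate for one interface, e.g. check_link("eth0")."""
    rate = error_rate(read_counter(iface, "rx_packets"),
                      read_counter(iface, "rx_errors"),
                      read_counter(iface, "rx_dropped"))
    print(f"{iface}: {rate:.4%} of inbound packets errored or dropped")
    return rate

# With made-up counters: 990 good packets, 5 errors, 5 drops -> 1% loss.
print(f"sample error rate: {error_rate(990, 5, 5):.2%}")
```

A persistently non-zero rate on an otherwise idle link points at the physical layer (cable, connector, NIC) rather than at the nodes themselves.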

4 Likes

Just to note that the same setup was working fine under a much heavier load for the last 2 months, plus it was working fine for the first few days and then gradually reduced the token yield to zero; it hasn’t stopped abruptly.



Have the nodes, that earn less, actually been shunned? Can you see that from the logs?

Also, when talking about “earning 50% less” (for example), I think that the only way to make that statement comparable to any other number is:

50% less during the exact same UTC hours than the other nodes it is compared to.

Also, I think we would need to see some good statistical analysis about what kind of variance is expected within certain sample sizes.
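A minimal sketch of that comparison: restrict both earning series to their common UTC hours before computing the percentage difference. The hourly figures below are made-up sample data:

```python
def pct_less(a: dict, b: dict) -> float:
    """Percent by which series `a` earned less than `b`, counting only
    the UTC hours present in both series."""
    common = a.keys() & b.keys()
    if not common:
        raise ValueError("no overlapping hours to compare")
    total_a = sum(a[h] for h in common)
    total_b = sum(b[h] for h in common)
    return 100.0 * (total_b - total_a) / total_b

# Made-up hourly ANT earnings, keyed by UTC hour.
home  = {"2025-03-01T10": 0.10, "2025-03-01T11": 0.12, "2025-03-01T12": 0.08}
cloud = {"2025-03-01T10": 0.50, "2025-03-01T11": 0.55, "2025-03-01T12": 0.45}

print(f"home earned {pct_less(home, cloud):.0f}% less than cloud "
      f"over the same hours")
```

Repeating this over many matched hours, rather than eyeballing daily totals, is also what would make the variance question above answerable.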

But the most important question: have the nodes been shunned or not?
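One way to start answering that from the logs is to count shun-related lines per node log file. The keyword `shun` is an assumption about the log wording; adjust the pattern to whatever your node version actually emits:

```python
import re
from pathlib import Path

# Assumed keyword; change it to match your node software's actual log lines.
SHUN_PATTERN = re.compile(r"shun", re.IGNORECASE)

def count_shun_events(log_dir: str) -> dict:
    """Map every *.log file under log_dir to its number of shun-related lines."""
    counts = {}
    for log in Path(log_dir).rglob("*.log"):
        text = log.read_text(errors="replace")
        counts[log.name] = sum(bool(SHUN_PATTERN.search(line))
                               for line in text.splitlines())
    return counts

# Usage (the path is hypothetical):
#   print(count_shun_events("/path/to/node/logs"))
```

Comparing these counts between low-earning and normal-earning nodes would turn the shunning theory from speculation into evidence either way.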

1 Like

heavier load meaning more nodes or just %cpu/mem/hdd activity at a lower node count …?

because the context switching stuff might be mainly process == node count dependent if that causes some issue …

1 Like

Just to add some info: I have a Hetzner server running 1k nodes that earns about 10 ANT a day, and home nodes (multiple machines running 50-200 nodes each) that earn between 1-2 ANT per day each, so it’s pretty equal for me.

2 Likes

Until my mining stopped, I kept the same number of nodes - 3000 per machine. I then scaled up to 6000. Now on 1 machine I’ve throttled their initial launch to 1100 nodes and will monitor how it plays out.



1 Like

I think they are not completely shunned. If I have understood it correctly, it’s not that a node is simply shunned or not: it is shunned by some other node/nodes, not necessarily by all other nodes.
Since the nodes are spread all over, there are almost always some node pairs shunning each other.
But this is speculation; maybe we should try to prove it true or false? Either by searching the logs or by reasoning based on the real behaviour of the nodes?

~100 nodes, 1.5 ANT / day (4 x 0.381 ANT), some variation day by day. One old laptop, 2 addresses.

1 Like

Not really, other than my finding that a group of machines on a very long ethernet cable was earning significantly less.
I observed higher ping times and attributed it to that; Rob suggested packet loss, but I couldn’t be bothered hunting the issue down and just moved them closer. Solved the issue.

1 Like

I’m taking bring what you have to the next level :joy:

6 Likes

Neo suspected error rate

I was the one that suspected packetloss

1 Like

This could easily drop the connection from 1G to 100M. Or even to 10M…

Maybe; I still have the cable. Cat 6 but copper-clad, as I was too tight to get solid copper. Should have at that length.

For interests sake I’ll run a speed test on it later today.

3 Likes