I’m running a couple thousand nodes from home using port forwarding (on multiple machines) and I noticed some connectivity problems so I decided to stop them all to investigate.
I stopped a couple of my machines about 5 days ago and the rest of them ~24h ago, but I still have significant traffic coming from the old peers: ~120Mbps and 12k packets/s of blocked UDP traffic (and yes, that is Autonomi: most of it comes from Hetzner and is trying to reach the ports that hosted the nodes).
Incoming traffic was ~120Mbps and 13k packets/s just after stopping the nodes, but what’s worse is I still have a good portion of it trying to access nodes that were stopped 5 days ago…
This is a serious problem, and it seems the traffic won’t stop unless all my old peers are stopped (which might be never).
Is there any way to mitigate this?
Is this a bug in antnode? (People are reporting no shunned nodes, which might be contributing to the problem.)
Yes, that is something I and others have observed in the past. I stopped my 50 nodes on an RPi4 on Wednesday and there is still a small amount of traffic trying to reach them, which is slowly decreasing as nodes finally give up on them.
I don’t think there is a bug. It’s just that nodes seem quite forgiving of a peer disappearing and will keep trying for a long time. Which we benefit from when they go down for power outages, OS upgrades, network shenanigans, etc. We can’t have it both ways! Maybe the slider is too far in one direction though.
All of these sliders, tunables, and parameters should be statistically optimised by a neural network or a statistical regression model that updates dynamically while the network is running. This kind of feedback loop will make the network extremely flexible and powerful.
Yes, but after the outage, the reappearing node will make outgoing contacts, so it will rejoin the network quite quickly anyway. I don’t see any good reason to repeatedly try to connect to an offline node.
Just add (well, most likely it is already there) a ‘last seen’ timestamp to the node info and do not try to contact nodes not seen recently. No need to drop them, shun them, ban them, or anything like that. Just don’t try to connect to them.
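To make the ‘last seen’ idea concrete, here is a minimal Python sketch. It is not how antnode works (antnode is a separate codebase and I have not checked its internals); `PeerRecord`, `dialable_peers`, `mark_seen` and the one-day `MAX_SILENCE_SECS` cutoff are all hypothetical names and values made up for illustration.

```python
import time
from dataclasses import dataclass

# Hypothetical cutoff: stop dialing peers that have been silent for over a day.
MAX_SILENCE_SECS = 24 * 60 * 60


@dataclass
class PeerRecord:
    peer_id: str
    last_seen: float  # Unix timestamp of the last successful contact


def dialable_peers(peers, now=None, max_silence=MAX_SILENCE_SECS):
    """Return only the peers heard from within max_silence seconds.

    Stale peers are not removed, shunned or banned; they are simply skipped.
    """
    if now is None:
        now = time.time()
    return [p for p in peers if now - p.last_seen <= max_silence]


def mark_seen(peer, now=None):
    """Refresh the timestamp when a peer contacts us or answers a request."""
    peer.last_seen = time.time() if now is None else now
```

Because the record is kept, a node that comes back after an outage and makes outgoing contacts gets `mark_seen` called on it and becomes dialable again straight away.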
It’s one thing to be forgiving for some hours (maybe up to a day), but after that, it should stop. I still have a lot of traffic directed to my older stopped nodes (almost a week now). That is wayyyy too much time.
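One way to get "forgiving for a while, then stop" is a retry schedule with exponential backoff and a hard give-up deadline. The sketch below is only an illustration under assumed numbers (first retry after 60 s, doubling each time, give up after 24 h); `retry_schedule` and its parameters are hypothetical and not taken from antnode.

```python
def retry_schedule(first_delay=60.0, factor=2.0, give_up_after=24 * 3600.0):
    """Yield the times (seconds since the peer went silent) of each retry.

    The delay between attempts grows by `factor` every time, and no attempt
    is scheduled past `give_up_after`.
    """
    elapsed = 0.0
    delay = first_delay
    while elapsed + delay <= give_up_after:
        elapsed += delay
        yield elapsed
        delay *= factor


if __name__ == "__main__":
    attempts = list(retry_schedule())
    print(len(attempts), "attempts, last one at", attempts[-1], "seconds")
```

With these assumed values a silent peer gets 10 connection attempts in total, the last about 17 hours after it disappeared, instead of a steady stream of packets for a week.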
Seeing as they already know about this, I assume it is one of the things that will be sorted out once other, more important issues are dealt with.
It has been brought up a few times now and has to do with tuning the shunning algorithm, but that affects other things as well. The more important issue, refreshing the routing table, had a major successful change in this last update and still needs some fine-tuning. It takes priority, and it also affects how shunning will be tuned, so it needs to be working well first.