Node Manager UX and Issues

Josh · June 2, 2024, 1:23pm

On rare occasions, but often enough to ask about, this happens. It comes randomly out of nowhere, and then it moves on and starts the next one just fine.

chriso · June 2, 2024, 2:01pm

It’s probably trying to connect to the RPC service too quickly before it has been initialised.

DavidMc0 · June 11, 2024, 11:54am

Does anyone else often get the problem with being stuck on ‘refreshing the node registry’ when trying to stop / reset / do anything else with safenode-manager?

So far I’ve been sorting this by reinstalling everything … I’m hoping there’s a better solution, so input would be much appreciated!

Edit: I’ve been advised it can take 30 minutes plus to get past this, but give it some time & hopefully it’ll sort itself out. I will give patience a shot!

storage_guy · June 11, 2024, 2:41pm

I believe that’s due to the number of nodes running. It should be a reasonable time and manageable for a couple of dozen nodes but safenode-manager wasn’t designed for hundreds of nodes.

How many did you have at the time?

chriso · June 11, 2024, 2:42pm

The registry refresh should not take too long, so something is happening here that shouldn’t be. I don’t think I’ve seen any reports about this before.

I think there would be two possible causes for this problem. As a first possibility, the refresh connects to each node’s RPC service. That service might be dead, or otherwise taking a long time to respond. We might need to put an explicit timeout on the connection. For the second possibility, the node manager attempts to determine if the process launched by the service is still running, and it does so using an external crate called sysinfo. The node manager calls a function on this crate which refreshes the system. It could be possible this would cause a stall, but I’d imagine this is doing the same kind of thing that something like top/htop does, so that shouldn’t be a particularly expensive operation.

I am still on holiday just now, but, if you encounter this again, I would see if you can try and determine if the node RPC services are still running. You can get the port numbers from the node_registry.json file, and maybe just try using netcat or something to see if the port is still responding.
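A minimal sketch of that port check in Python, as a stand-in for netcat. It only tests whether a TCP connection to each port succeeds within a timeout; it does not speak the RPC protocol. The function names and the example port numbers are made up for illustration, and the ports are passed in by hand rather than parsed from node_registry.json, since the layout of that file isn't shown here.

```python
import socket


def port_is_open(port, host="127.0.0.1", timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection raises an OSError subclass on refusal or timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def probe_ports(ports, host="127.0.0.1", timeout=2.0):
    """Map each port to True (accepting connections) or False (refused or timed out)."""
    return {port: port_is_open(port, host, timeout) for port in ports}


# Example usage (hypothetical port numbers; use the ones from your own registry file):
#   for port, ok in probe_ports([12001, 12002, 12003]).items():
#       print(port, "responding" if ok else "NOT responding")
```

The explicit timeout matters here: a port that neither accepts nor refuses the connection would otherwise hold up the whole check, which is the same kind of stall suspected in the refresh.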

DavidMc0 · June 11, 2024, 2:45pm

320 nodes, so yeah, this seems to be the issue. It did work eventually (~30-40 minutes perhaps), but it might have been quicker to rebuild the system.

I was restarting the nodes because after quite a while running, memory and CPU usage was growing significantly.

After restarting, it was all back to a lower level, so it may be that restarting nodes every day or two helps with resource management… though ideally this won’t be necessary in time as the software improves.

chriso · June 11, 2024, 2:49pm

That is a lot of nodes, but, I still don’t think it should take that long. I often test the node manager on a small Vagrant box with 50 nodes, and even there, the refresh is only taking something on the order of 3 to 5 seconds.

Even with a large number of nodes, if it’s taking minutes, something is wrong.

riddim · June 11, 2024, 4:05pm

I don’t think this is scaling linearly… I’m not running more than about 100 nodes and about there it starts to really take time…

chriso · June 11, 2024, 7:14pm

I understand, but intuitively, it doesn’t seem like there is a good reason for it to not scale linearly. It’s a sequential process, and at least on paper, each step of that process doesn’t really do very much. So there might be other things at play here that cause it to go non-linear. Putting some more logging into the system might help to identify what the bottleneck really is.
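One way to get that kind of logging, sketched in Python purely as an illustration (the node manager itself is written in Rust, and none of these names come from its code): wrap each step in a timer, log how long it took, and then look at the slowest steps.

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("refresh_timing")  # hypothetical logger name


@contextmanager
def timed(step, timings):
    """Time the enclosed block, log the duration, and append (step, seconds) to `timings`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        timings.append((step, elapsed))
        logger.info("%s took %.3fs", step, elapsed)


def slowest(timings, n=3):
    """Return the n longest-running (step, seconds) pairs, slowest first."""
    return sorted(timings, key=lambda item: item[1], reverse=True)[:n]


# Example usage (the step names and functions are placeholders):
#   timings = []
#   for name in node_names:
#       with timed(f"{name}: rpc check", timings):
#           check_rpc(name)
#       with timed(f"{name}: process check", timings):
#           check_process(name)
#   print(slowest(timings))
```

If one or two entries dominate the list, a particular node is holding things up; if every step gets slower as the run goes on, the cause is more likely system-wide.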

neo · June 12, 2024, 1:53am

320 nodes: from experience, that is going to take a long time. node-manager takes 5-10 seconds (some systems faster, some slower) to refresh around 15-25 nodes. But more importantly, the spike in CPU usage will cause more nodes to slow down, in a rather cascading effect.

But I agree that half an hour seems too long and that, as chriso says, some node’s processing is probably causing significant delays.

Once the CPU spikes to 100% doing the RPC calls, linear is out the window on the 50th floor, and 320 nodes is going to do that for sure. But I doubt that accounts for half an hour; more likely, as you say, some node(s) are holding up the show.

For such a low-CPU program, the RPC calls use a HUGE amount of CPU in comparison, spiking CPU/thread usage to 100% for a significant amount of processing time.
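As a rough sanity check on the figures quoted in this thread, here is what strictly linear scaling would predict for 320 nodes. This is only arithmetic on the numbers posted above, assuming the time per node stays constant, which is exactly the assumption a CPU spike would break. The function name is made up for illustration.

```python
def linear_estimate(nodes, sample_nodes, sample_seconds):
    """Scale a measured refresh time to `nodes`, assuming constant time per node."""
    return nodes * sample_seconds / sample_nodes


# 5-10 seconds for around 15-25 nodes (best case: 5s per 25, worst case: 10s per 15).
low = linear_estimate(320, sample_nodes=25, sample_seconds=5)
high = linear_estimate(320, sample_nodes=15, sample_seconds=10)
print(f"from the 15-25 node figures: {low:.0f}s to {high:.0f}s")

# 3-5 seconds for 50 nodes on a small Vagrant box.
low_vm = linear_estimate(320, sample_nodes=50, sample_seconds=3)
high_vm = linear_estimate(320, sample_nodes=50, sample_seconds=5)
print(f"from the 50 node figures: {low_vm:.0f}s to {high_vm:.0f}s")
```

Either way the linear estimate comes out between roughly 20 seconds and under 4 minutes, so a 30-40 minute refresh is about an order of magnitude beyond what linear scaling would explain.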

R43 · June 18, 2024, 8:09am

Windows 11 machine.
Scenario:
15 nodes running 24/7 for the last three days, all added with --home-network and --auto-restart.
Last night the PC crashed (not because of safenode or any of the safe services; the GPU drivers were obsolete).
After powering the PC back on, only 7 nodes started up.

The other nodes cannot be started with safenode-manager.
I tried to stop all nodes and start them again, but when starting with --interval 30000 only 7 nodes start; the others give an error:

R43 · June 18, 2024, 8:46am

Resolved:
Interestingly, Defender on my machine (which is switched off), after the initial scan upon restart, blocked only 8 of the safenodes (flagging them as a trojan), while the other 7 were okay according to Microsoft.

chriso · June 18, 2024, 10:36am

You said it was resolved. How so?

R43 · June 18, 2024, 10:51am

Two words: Microsoft Defender. I have the paid version, so I’m unable to kill it.

Josh · July 2, 2024, 10:23am

Any chance we can set log levels for the restart, @chriso?

chriso · July 2, 2024, 2:49pm

Hey, sorry, could you clarify this please? Is this another setting that is not being retained on an upgrade?

Josh · July 2, 2024, 3:02pm

I might just be slow, can we now set how many logs to keep via node-manager?

You said that you can make this available, perhaps you have and I just didn’t realize.

chriso · July 3, 2024, 12:40pm

Hey, sorry, it’s not available yet. Will try and do it ASAP!

Nigel · July 9, 2024, 1:30am

✕ safenode1: The PID of the process was not found after starting it.
✕ safenode2: The PID of the process was not found after starting it.
✕ safenode3: The PID of the process was not found after starting it.
✕ safenode4: The PID of the process was not found after starting it.
✕ safenode5: The PID of the process was not found after starting it.
✕ safenode6: The PID of the process was not found after starting it.
✕ safenode7: The PID of the process was not found after starting it.
✕ safenode8: The PID of the process was not found after starting it.
✕ safenode9: The PID of the process was not found after starting it.
✕ safenode10: The PID of the process was not found after starting it.
Error: 
   0: Failed to start one or more services

Location:
   sn_node_manager/src/cmd/node.rs:759

Backtrace omitted. Run with RUST_BACKTRACE=1 environment variable to display it.
Run with RUST_BACKTRACE=full to include source snippets.

I had nodes for sure running successfully on my laptop but since the laptop is closed so often, my nodes never had a chance to earn. Seems like a real bummer that a closed laptop can’t be useful.

I decided to make another attempt with the updated network to get my slightly outdated desktop iMac running nodes.

Does anyone know how to solve this, so we can get more diverse everyday computers helping run this network?

neo · July 9, 2024, 4:31am

Closing your laptop puts it to sleep. Nothing can be running while it is asleep, just like you cannot build things or be conscious while asleep.

It’s a fundamental issue with the way sleep mode works. If nodes could keep running, it wouldn’t be sleep mode, would it?

BTW, doing a stop first and then a start might help with those errors.
