Node Manager UX and Issues

I will also try Alpine 3.19+ via the LXC route at home and see whether it generates the systemd format or the OpenRC format.

Very happy to see it work on the bare-metal RPi with Alpine, and that you can reproduce the issue on the Alpine 3.18 VM via Vagrant! Keep us posted if you spot a fix for this :smiley:

No problem, very excited to make use of this for at least 1 safenode pid per LXC once the above fixes or ERs go in as well. Also, I am eager to try out auto updates of the safenode via this tool (in the near future).

Glad to see the safenode-manager being implemented as a daemon as well:


I get a slightly different error when I tried to build on the Alpine VM:

  = note: /usr/lib/gcc/x86_64-alpine-linux-musl/12.2.1/../../../../x86_64-alpine-linux-musl/bin/ld: cannot find crti.o: No such file or directory

I’ve not investigated yet, but I’m quite intrigued now to try and get a build on Alpine later.

cannot find crti.o: No such file or directory:

I don’t recall now, but it might have been solved by apk add musl-dev for me a while back.
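For anyone hitting the same crti.o error, here is a minimal sketch of a check for the C runtime start files before building. The function name has_crt_objects is made up for illustration, and /usr/lib as the install location is an assumption based on where Alpine's musl-dev package normally puts these files.

```shell
# has_crt_objects DIR: succeed only if the C runtime start files that the
# linker looks for (crt1.o, crti.o, crtn.o) are all present in DIR.
# Hypothetical helper, not part of any Alpine or Rust tooling.
has_crt_objects() {
    for obj in crt1.o crti.o crtn.o; do
        [ -f "$1/$obj" ] || return 1
    done
}

# Example (not run here; /usr/lib is an assumed location):
#   has_crt_objects /usr/lib || apk add musl-dev
```

If the check fails, installing musl-dev as suggested above is the first thing to try.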

This is my current rustc version:

rustc 1.76.0 (07dca489a 2024-02-04)

I could reproduce the issue here on an Alpine 3.19 LXC.


/usr/lib/gcc/x86_64-alpine-linux-musl/12.2.1/../../../../x86_64-alpine-linux-musl/bin/ld: cannot find -lbz2: No such file or directory

For the above error message, compiling the entire safe_network repo with the extra RUSTFLAGS (see below) seems to make safenode-manager work without the linker error:

export RUSTFLAGS="-Ctarget-feature=-crt-static"

It seems that Rust 1.72 and later include this change:

- [Force all native libraries to be statically linked when linking a static binary](https://github.com/rust-lang/rust/pull/111698) (rust-lang/rust#111698)

The extra flag above appears to force a dynamically linked binary with dynamic libraries, which may not be what is intended here, but it gets me past the linker error for now on the Alpine 3.19 LXC.
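A plain export like the one above overwrites any RUSTFLAGS already set in the environment. A small sketch of appending the flag instead is below. The helper name with_rustflag is hypothetical, and the cargo build --release invocation is an assumption based on the target/release path seen in this thread.

```shell
# with_rustflag FLAG: print the current RUSTFLAGS with FLAG appended,
# unless FLAG is already one of the space-separated entries.
# Hypothetical helper for illustration only.
with_rustflag() {
    case " ${RUSTFLAGS:-} " in
        *" $1 "*) printf '%s\n' "${RUSTFLAGS:-}" ;;
        *) printf '%s\n' "${RUSTFLAGS:+$RUSTFLAGS }$1" ;;
    esac
}

# Example (not run here; the build command is an assumption):
#   export RUSTFLAGS="$(with_rustflag -Ctarget-feature=-crt-static)"
#   cargo build --release
```

Bear in mind that a binary built this way depends on shared libraries being present on the machine that runs it.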

safe-build-122:/.../safe_github/safe_network/target/release# ./safenode-manager --help
A command-line application for installing, managing and operating `safenode` as a service.

Usage: safenode-manager [OPTIONS] <COMMAND>

Commands:
  add      Add one or more new safenode services
  faucet   Run a faucet server for use with a local network
  kill     Kill the running local network
  join     Join an existing local network
  remove   Remove a safenode service
  run      Run a local network
  start    Start a safenode service
  status   Get the status of services
  stop     Stop a safenode service
  upgrade  Upgrade safenode services
  help     Print this message or the help of the given subcommand(s)

Options:
  -v, --verbose...  
  -h, --help        Print help
  -V, --version     Print version

FWIW, below are the binary sizes without and with that flag:

Before:

-rwxr-xr-x    1 x     x      33945840 Feb 24 02:29 safe
-rwxr-xr-x    1 x     x      26805840 Feb 24 02:29 safenode
-rwxr-xr-x    1 x     x      27125920 Feb 24 02:29 safenode_rpc_client

Note: safenode-manager couldn’t be built originally.

After setting the above RUSTFLAGS:

-rwxr-xr-x    2 x     x      33581352 Feb 27 20:47 safe
-rwxr-xr-x    2 x     x      26433064 Feb 27 20:46 safenode
-rwxr-xr-x    2 x     x      26778832 Feb 27 20:47 safenode_rpc_client
-rwxr-xr-x    2 x     x      19656296 Feb 27 20:44 safenode-manager
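To put numbers on the difference, the byte counts from the two listings can be subtracted directly. This is just arithmetic on the sizes shown above; the helper name size_delta is made up for illustration.

```shell
# size_delta BEFORE AFTER: print how many bytes smaller AFTER is than BEFORE.
# Hypothetical helper for illustration only.
size_delta() {
    echo $(( $1 - $2 ))
}

# Sizes copied from the listings above (without the flag, then with it):
#   size_delta 33945840 33581352   # safe                -> 364488
#   size_delta 26805840 26433064   # safenode            -> 372776
#   size_delta 27125920 26778832   # safenode_rpc_client -> 347088
```

So each binary shrinks by roughly 350 to 370 KB with the flag set, a little over 1% of its size.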

However, even after it compiles, safenode-manager still complains with the following when using the add command:

Error loading shared library libbz2.so.1: No such file or directory (needed by ./safenode-manager)
Error relocating ./safenode-manager: BZ2_bzDecompress: symbol not found
Error relocating ./safenode-manager: BZ2_bzDecompressEnd: symbol not found
Error relocating ./safenode-manager: BZ2_bzDecompressInit: symbol not found

Running apk add libbz2 on this new LXC fixed the above on the newly compiled binary!
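Since the dynamically linked build needs its shared libraries at runtime, it can help to list the missing ones up front on a fresh container. The sketch below filters loader output for missing library names. The function name missing_libs is hypothetical, and it assumes the two message formats I know of: the musl one shown in the error above and the "=> not found" lines that glibc's ldd prints.

```shell
# missing_libs: read dynamic loader / ldd output on stdin and print the
# names of shared libraries reported as missing, one per line.
# Hypothetical helper; only handles the two message formats named above.
missing_libs() {
    sed -n \
        -e 's/^Error loading shared library \([^:]*\):.*/\1/p' \
        -e 's/^[[:space:]]*\([^[:space:]]*\) => not found.*/\1/p' | sort -u
}

# Example (not run here):
#   ldd ./safenode-manager 2>&1 | missing_libs
```

For the error above this would print libbz2.so.1, which is what apk add libbz2 provides.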

Got to get ready for this. @chriso, any idea when the safe node manager will support a specific port range for the nodes?

really want to be able to upgrade my nodes in the beta test

I should be able to get that feature in this week. I’m working on a big refactor at the moment so that I can treat both the node services and the faucet service in a uniform way, but when I’ve done that, I’ll take some time to fix all the smaller issues raised in here, including the port range.


Hey Chris, previously when we started a node its directory was named after the peer ID; with the manager it is now safenode1, or whatever number it was started at. I can see why, as we discussed earlier in the thread about sequencing, but I do not see a quick and easy way to find that peer ID now. Perhaps it is in the logs, but that is inconvenient. Could it be stored within the parent dir along with record_store, safenode, safenode.pid, secret-key and wallet?


Hey Josh, you should be able to get it from the status command I think?

yes but that needs sudo right? trying to avoid sudo, ahh maybe not.

The status command doesn’t need sudo.
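As a stopgap for scripting, the peer ID can be pulled out of the status table with awk. This is a sketch only: the helper name peer_id_for is made up, and it assumes the column layout stays as it appears elsewhere in this thread (service name first, peer ID second, "-" when no ID is known yet).

```shell
# peer_id_for NAME: read `safenode-manager status` output on stdin and
# print the Peer ID column for the service called NAME.
# Prints nothing if the service is absent or has no peer ID yet ("-").
# Hypothetical helper; depends on the table layout not changing.
peer_id_for() {
    awk -v name="$1" '$1 == name && $2 != "-" { print $2 }'
}

# Example (not run here):
#   safenode-manager status | peer_id_for safenode1
```

Parsing a human-readable table is fragile, so this would break if the output format changes.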


Just an update, because I said I would get those features/changes in this week.

I’m still in the midst of the refactor to treat all service types uniformly. This week I had to spend a lot of time dealing with an issue that came up in the version bumping process, and also spent most of yesterday rebasing in changes that Roland had made. We were both making pretty major changes to the node manager so the rebase was challenging.

I made good progress on the refactor today. Definitely anticipating completing it Tuesday/Wednesday next week, at which point I will address the issues raised here.


@chriso - I spent some time looking at the Alpine LXC on my end and upgraded it to the latest edge. I overrode the ::native() function to specify OpenRc as the ServiceManagerKind, in case it wasn’t picking up the right system init manager, and I still didn’t have any luck in getting the OpenRC format going.
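Independently of what the node manager detects, a rough shell check can show which init system a container is running. This is a sketch under assumptions, not how safenode-manager or its service library decides: it relies on /run/systemd/system existing when systemd is the running init, and on /run/openrc being OpenRC's state directory. The function name detect_init and its optional ROOT argument are made up for illustration.

```shell
# detect_init [ROOT]: guess the running service manager from marker
# directories under ROOT (defaults to the real root filesystem).
# Hypothetical helper; the marker paths are assumptions described above.
detect_init() {
    root="${1:-}"
    if [ -d "$root/run/systemd/system" ]; then
        echo systemd
    elif [ -d "$root/run/openrc" ]; then
        echo openrc
    else
        echo unknown
    fi
}

# Example (not run here):
#   detect_init
```

An "unknown" result inside an LXC would suggest the container's init setup is unusual, rather than pointing at the node manager.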

I then started experimenting with /sbin/init → /bin/busybox. Digging further, I didn’t see an openrc-init, though init was running under pid 1, so I thought the issue was still on my end. I then tried various ad hoc configuration changes inside the LXC, but I wasn’t able to get safenode-manager to write the OpenRC format.

I then stumbled across the following lines of code, which seem to be a temporary workaround:

For now, removing this temporary workaround override allows the proper OpenRC format to be written to disk.

I can confirm that, after removing it, safenode-manager built from the alpha branch is able to run a status command against the newly created service (safenode15).

safe-node-145:/.../safe_binaries# ./safenode-manager status 
=================================================
                Safenode Services                
=================================================
Service Name       Peer ID                                              Status  Connected Peers
safenode3          -                                                    ADDED               -
safenode4          -                                                    ADDED               -
safenode5          -                                                    ADDED               -
safenode6          -                                                    ADDED               -
safenode7          -                                                    ADDED               -
safenode8          -                                                    ADDED               -
safenode9          -                                                    ADDED               -
safenode10         -                                                    ADDED               -
safenode11         -                                                    ADDED               -
safenode12         -                                                    ADDED               -
safenode13         -                                                    ADDED               -
safenode14         -                                                    ADDED               -
safenode15         12D3KooWDPBeDHrp1KddjLVd4HyZrmNHjfv1wAYgc8Ah9v3KRZTb RUNNING               0

Start & Stop commands to safenode-manager also seem to be working:

safe-node-145:/.../safe_binaries# ./safenode-manager start --service-name safenode15
=================================================
             Start Safenode Services             
=================================================
Attempting to start safenode15...
✓ Started safenode15 service
  - Peer ID: 12D3KooWDPBeDHrp1KddjLVd4HyZrmNHjfv1wAYgc8Ah9v3KRZTb
  - Logs: /.../safe_node_logs/safe-node-145/safenode15
safe-node-145:/.../safe_binaries# ./safenode-manager stop --service-name safenode15
=================================================
              Stop Safenode Services             
=================================================
Attempting to stop safenode15...
✓ Service safenode15 with PID 788 was stopped

Note: I am not too familiar with the Rust language, so I don’t know the proper solution here, but hopefully the above info helps. Thanks!


Thanks for your efforts on the investigation.

There’s definitely something to look into here, although I’m not sure if the override explains how it correctly generated an OpenRC configuration on Alpine 3.19 on my Pi. Will need to dig into it further next week.


@chriso I have a real bonehead issue going on here, user error to put it politely.

Long story short, what I think happened here is that I manually removed node dirs from /var/safenode-manager.
I do have a satisfactory excuse that I will not bore you with.

My issue now though is this.

=================================================
                Safenode Services                
=================================================
Service Name       Peer ID                                              Status  Connected Peers
safenode5          12D3KooWRoEnzS1LkVkrjjA6k4AtNUgozTFF69KavfURMUapsGUr STOPPED               -
safenode9          12D3KooWEX1df7fD7DGwoL7gctL3VBVGKkEg2R8kw7C4YayDhQtc STOPPED               -
safenode10         12D3KooWCr6c9LHrJh7h8y7z9JoyimAdigDojiZESjGsVNcj8rPw STOPPED               -
wyse3@wyse3:~$ sudo env "PATH=$PATH" safenode-manager remove --service-name safenode5
=================================================
           Remove Safenode Services              
=================================================
Error: 
   0: Warning: Can't execute disable on the unit file path. Proceeding with the unit name.
      Failed to disable unit: Unit file safenode5.service does not exist.
   0: 

Location:
   sn_node_manager/src/service.rs:167

Backtrace omitted. Run with RUST_BACKTRACE=1 environment variable to display it.
Run with RUST_BACKTRACE=full to include source snippets.
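The error above says the unit file for safenode5 no longer exists while the node manager still lists the service. A small sketch for spotting that mismatch is below. The function name missing_units is hypothetical, and the default of /etc/systemd/system is an assumption about where the unit files are written on a systemd host.

```shell
# missing_units [UNIT_DIR]: read service names on stdin, one per line, and
# print those that have no NAME.service file in UNIT_DIR.
# Hypothetical helper; UNIT_DIR defaults to an assumed systemd location.
missing_units() {
    dir="${1:-/etc/systemd/system}"
    while IFS= read -r name; do
        [ -n "$name" ] || continue
        [ -f "$dir/$name.service" ] || printf '%s\n' "$name"
    done
}

# Example (not run here):
#   safenode-manager status | awk '/^safenode/ { print $1 }' | missing_units
```

Any name it prints is a service the node manager still tracks but systemd no longer knows about.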

Hey, right ok, thanks for letting me know. We need to take into account that users might modify state that the node manager is keeping track of.

I’ll get back to you with everything you’d need to delete to get you back to a fresh state.


Oh it will happen, it will happen a lot

Ever seen someone who knows just enough to get into a lot of trouble when they try to do something in their filesystem? Eventually the software will have to have error recovery for when users delete things while the client or node is running. It’s not just about fixing edge cases, but about making sure that users fiddling with files cannot break the software: it should produce an error response and exit gracefully.


Hey Josh,

To clean things up manually, you need to remove the services and the directories where the node manager is saving state.

To remove the services:

sudo systemctl disable --now safenodeX
sudo rm /etc/systemd/system/safenode*
sudo systemctl daemon-reload

Then delete these directories:

sudo rm -rf /var/safenode-manager
sudo rm -rf /var/log/safenode
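Since safenodeX above stands for each service in turn, a dry-run helper can expand the service-removal step into one command per unit before anything is deleted. This is a sketch only: the function name cleanup_plan is made up, it prints commands rather than running them, and it matches safenode*.service, which is narrower than the safenode* pattern used above.

```shell
# cleanup_plan [UNIT_DIR]: print, without running anything, the commands
# for removing every safenode*.service unit found in UNIT_DIR.
# Hypothetical helper; review the output before running any of it with sudo.
cleanup_plan() {
    dir="${1:-/etc/systemd/system}"
    for unit in "$dir"/safenode*.service; do
        [ -e "$unit" ] || continue
        name=$(basename "$unit" .service)
        echo "systemctl disable --now $name"
        echo "rm $unit"
    done
    echo "systemctl daemon-reload"
}

# Example (not run here):
#   cleanup_plan
```

The two directory removals listed above still need to be done separately afterwards.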

Chris you are a legend, thank you!


Just a quick progress update on this. My refactor was merged, but part of the purpose of that refactor was to be able to upgrade the faucet in our testnets. So, I need to deal with that now before coming to the issues raised in this thread, but I’m hoping I will get to them either this week or early next.


Migrated my comment from the weekly update to this topic, as it seems more appropriate:

Spotted safenodemand as a new binary being built in the main branch. Is this shorthand for ‘safe node manager’ + d for daemon? If so, I keep reading it as safe node demand and get slightly thrown off by the spelling :smile: I might have to get used to the abbreviated version (safe node man d).

Throwing out a possible suggestion here: manager is often abbreviated as mgr, so would safe node mgr d = safenodemgrd (safe node manager daemon) be clearer here?


You can see a little bit of discussion on it here.

I don’t have a strong preference on man vs mgr. If enough people wanted that, I’d be happy to go with it. Remember though, the safenode-manager binary will also be renamed to the same effect.
