Bash scripts for managing safe nodes on Linux

I had the wrong version of the proto files. Everything works now, such a game changer.

In addition to installing grpcurl, the proto files below are needed to make grpcurl happy when you make a call.

From maidsafe/safe_network Main branch:

sn_protocol/src/safenode_proto/safenode.proto
sn_protocol/src/safenode_proto/req_resp_types.proto

The node needs to be started with this added: --rpc 127.0.0.1:<some_port>

The call will then become:
grpcurl -plaintext -proto safenode.proto 127.0.0.1:<some_port> safenode_proto.SafeNode/KBuckets

You can use NodeInfo, NetworkInfo, NodeEvents, RecordAddresses, KBuckets, Stop, Restart, Update, and UpdateLogLevel, as described in the safenode.proto file.
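
To try several of these methods without retyping the whole line, a small helper can build the command for you. This is only a sketch: the function name rpc_cmd and the port 12001 are made up for illustration, and it merely prints the grpcurl command line so you can check it before running it.

```bash
#!/usr/bin/env bash
# Print the grpcurl command for one safenode RPC method (hypothetical helper).
# Assumes safenode.proto and req_resp_types.proto are in the current directory.
rpc_cmd() {
  local port="$1" method="$2"
  printf 'grpcurl -plaintext -proto safenode.proto 127.0.0.1:%s safenode_proto.SafeNode/%s\n' "$port" "$method"
}

# Example: show the commands for three read-only methods on an assumed port.
for method in NodeInfo NetworkInfo KBuckets; do
  rpc_cmd 12001 "$method"
done
```

Copy a printed line into your shell to run it against a node that was started with --rpc on that port.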


I just set the rpc port in safenode-manager when starting. Does the node need the loopback address as well?

Good to see it working.

Can you post the actual link to the GitHub page with the current versions of the files, for others?

Done! Plus more: snnm v0.2.0, a.k.a. the “neo edition” in your honor, is in my repo now.
As usual it is experimental, use-at-your-own-risk, and takes some effort to get set up, but it works great for me.

It adds:
-e: flag to enable rpc on nodes when starting, restarting, or exchanging node processes
-L: the rpc-enabled but slower cousin of -l, listing all nodes including actual peer count and uptime

The rpc port number is derived from the node port: node_port - NODE_BASE_PORT + RPC_BASE_PORT.
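
As a quick illustration of that formula, here is a minimal sketch. The variable names come from the formula above, but the two base values and the function name rpc_port_for are assumptions for this example; snnm defines its own values.

```bash
#!/usr/bin/env bash
# Base ports are assumptions for this example; snnm defines its own values.
NODE_BASE_PORT=53000
RPC_BASE_PORT=33000

# Derive the rpc port from a node port: node_port - NODE_BASE_PORT + RPC_BASE_PORT.
rpc_port_for() {
  local node_port="$1"
  echo $(( node_port - NODE_BASE_PORT + RPC_BASE_PORT ))
}

rpc_port_for 53344   # prints 33344 with the base ports assumed here
```

The point of deriving it this way is that each node's rpc port can be found from its node port alone, without keeping a separate list.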

sample output:
| port 5xxxx | pid 271192 | peer-id 12D3xxxx | v."0.108.2", 29 peers, uptime "43483" s

Installation of grpcurl and friends (for installation into your $HOME directory):

cd $HOME
wget https://github.com/fullstorydev/grpcurl/releases/download/v1.9.1/grpcurl_1.9.1_linux_386.rpm
sudo rpm -i grpcurl_1.9.1_linux_386.rpm
sudo dnf -y install grpcurl jq
wget https://raw.githubusercontent.com/maidsafe/safe_network/main/sn_protocol/src/safenode_proto/safenode.proto -O safenode.proto
wget https://raw.githubusercontent.com/maidsafe/safe_network/main/sn_protocol/src/safenode_proto/req_resp_types.proto -O req_resp_types.proto

We can clean up the snnm installation by moving the files into the same directory as used by the other safeup applications.

After moving snnm and the *.proto files to $HOME/.local/bin/, I can just type snnm without the preceding ./ (this only works for version 0.2.9 and above).

  • Installation of snnm 0.2.9 and above into the $HOME/.local/bin/:
    Rocky Linux 9.4, bash
cd $HOME/.local/bin/
wget https://raw.githubusercontent.com/drirmbda/node-toolbox/drirmbda-dev/snnm -O snnm
chmod +x snnm
wget https://raw.githubusercontent.com/maidsafe/safe_network/main/sn_protocol/src/safenode_proto/safenode.proto -O safenode.proto
wget https://raw.githubusercontent.com/maidsafe/safe_network/main/sn_protocol/src/safenode_proto/req_resp_types.proto -O req_resp_types.proto

cd $HOME
wget https://github.com/fullstorydev/grpcurl/releases/download/v1.9.1/grpcurl_1.9.1_linux_386.rpm
sudo rpm -i grpcurl_1.9.1_linux_386.rpm
sudo dnf -y install grpcurl jq

snnm -v

(edit: correction of wget URL of proto files.)


Nah, I’m convinced something is backwards here. How am I supposed to know the incoming port from another user when everyone is setting their own range? I only know what I’m setting, not what the incoming connections will be. What am I missing, lol?

snnm version 0.2.9 has this kind of output, with all you need to know: connected peers in real time, number of records, and forwarded earnings, plus totals for each server.

snnm -a 53344 -b 53349 -L
..................................................
| port  53344 | pid   312230 | peer-id 12D3KooW..1 | v."0.108.2"  |     33 peers |     99 records |    "18728"s uptime | 
| port  53345 | pid   315238 | peer-id 12D3KooW..2 | v."0.108.2"  |    258 peers |    100 records |    "18589"s uptime | 
| port  53346 | pid   299563 | peer-id 12D3KooW..3 | v."0.108.2"  |    247 peers |    131 records |    "19282"s uptime | fwd_nanos: 20
| port  53347 | pid   317531 | peer-id 12D3KooW..4 | v."0.108.2"  |     28 peers |    133 records |    "18477"s uptime | fwd_nanos: 10
| port  53348 | pid   307574 | peer-id 12D3KooW..5 | v."0.108.2"  |     58 peers |    106 records |    "18891"s uptime | fwd_nanos: 10
| port  53349 | pid   301291 | peer-id 12D3KooW..6 | v."0.108.2"  |     83 peers |    116 records |    "19200"s uptime | fwd_nanos: 10
Total connected peers (for nodes with rpc port only): 707
Total records (for nodes with rpc port only): 685
Total forwarded nanos: 50
snnm execution is done. In some cases you need to [Ctrl]+[c] to get back to your command prompt.
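
If you want to re-check the totals yourself, or sum a saved listing, something like the following might do. It is a sketch that assumes the column layout of the sample output above (fields separated by |, peers in the fifth column, records in the sixth); the function name sum_snnm_rows is made up, and the layout may differ between snnm versions.

```bash
#!/usr/bin/env bash
# Sum peers, records and fwd_nanos from `snnm -L` style rows read on stdin.
# Field positions are inferred from the sample output and may change.
sum_snnm_rows() {
  awk -F'|' '
    /^\| port/ {
      split($6, p, " "); peers   += p[1]     # e.g. "     33 peers "
      split($7, r, " "); records += r[1]     # e.g. "     99 records "
      # "fwd_nanos: " is 11 characters long; the digits follow it.
      if (match($0, /fwd_nanos: [0-9]+/))
        nanos += substr($0, RSTART + 11, RLENGTH - 11)
    }
    END { printf "peers=%d records=%d fwd_nanos=%d\n", peers, records, nanos }
  '
}
```

Usage would be along the lines of: snnm -a 53344 -b 53349 -L | sum_snnm_rows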

The node sets the port it will be listening on. Either a random one or the one you specify.

The process is

  • your node communicates with another node (or relay if home-network) and this causes
    • your node telling the router which port it is listening on for a response (the listening port)
  • the router translates the addresses and sends the packet to the receiver
  • the receiver knows where to reply to from the packet’s header
  • the receiver builds a response and sends it back to the node using the info gained from the request packet
  • your router receives this packet and knows from the sender info that the packet is to be sent to your node on its listening port
    • the router keeps a table that allows it to match up sender/receiver addr&port
  • your node listening on that listening port is then able to receive the packet.

Now, the listening port is either the port you specify or, if you specify none, a random port that the node picks.

So it’s not just you who knows what port number you are setting: your node and your router are very talkative about the incoming ports.
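
The router's matching table from the steps above can be pictured with a toy model. This is a deliberately simplified sketch: all addresses and ports are made up, the function names are hypothetical, and a real router keeps more state (protocol, timeouts, its own external port) than a single lookup table.

```bash
#!/usr/bin/env bash
# Toy model of the router's translation table (requires bash 4+).
declare -A NAT_TABLE   # key: remote "ip:port", value: local "ip:port"

# Outgoing packet: the router records which local node talked to which remote.
send_out() {
  local local_addr="$1" remote_addr="$2"
  NAT_TABLE["$remote_addr"]="$local_addr"
}

# Incoming packet: the router looks up the sender to find the local node.
route_in() {
  local remote_addr="$1"
  echo "${NAT_TABLE[$remote_addr]:-dropped}"
}

send_out "192.168.30.5:21016" "203.0.113.7:40000"
route_in "203.0.113.7:40000"    # prints 192.168.30.5:21016
route_in "198.51.100.9:40000"   # prints dropped (no matching outgoing packet)
```

In this model a reply only gets through because your node spoke first, which is why you never have to know the other side's port in advance.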


OK, I think I’m well confused. So I’ve reset everything with safenode-manager reset, then recreated my node list with:

safenode-manager add --owner .user --home-network --count 20 --node-port 21000-21019 --rpc-port 31000-31019 --peer /ip4/46.101.80.187/udp/58070/quic-v1/p2p/12D3KooWKgJQedzCxrp33u3dBD1mUZ9HTjEjgrxskEBvzoQWkRT9

FYI, I’ve tried this with and without --home-network, which should not be needed.

My current ufw status is:

21000:21150/udp            ALLOW       Anywhere                  
22/tcp (v6)                ALLOW       Anywhere (v6)             
21000:21150/udp (v6)       ALLOW       Anywhere (v6) 

The current state of the user ports and listeners is something like this:

Netid                   State                    Recv-Q                   Send-Q                                         Local Address:Port                                      Peer Address:Port                  Process
udp                     UNCONN                   0                        0                                                    0.0.0.0:21016                                          0.0.0.0:*                                               
udp                     UNCONN                   0                        0                                                    0.0.0.0:21017                                          0.0.0.0:*                                               
udp                     UNCONN                   0                        0                                                    0.0.0.0:21018                                          0.0.0.0:*                                               
udp                     UNCONN                   0                        0                                                    0.0.0.0:21019                                          0.0.0.0:*                                               
udp                     UNCONN                   0                        0                                              127.0.0.53%lo:53                                             0.0.0.0:*                                               
udp                     UNCONN                   0                        0                                          192.168.30.5%eno1:68                                             0.0.0.0:*                                               
tcp                     LISTEN                   0                        4096                                               127.0.0.1:9050                                           0.0.0.0:*                                               
tcp                     LISTEN                   0                        128                                                  0.0.0.0:22                                             0.0.0.0:*                                               
tcp                     LISTEN                   0                        4096                                           127.0.0.53%lo:53                                             0.0.0.0:*                                               
tcp                     LISTEN                   0                        128                                                127.0.0.1:31006                                          0.0.0.0:*                                               
tcp                     LISTEN                   0                        128                                                127.0.0.1:31007                                          0.0.0.0:*                                               
tcp                     LISTEN                   0                        128                                                127.0.0.1:31004                                          0.0.0.0:*                                               
tcp                     LISTEN                   0                        128                                                127.0.0.1:31005                                          0.0.0.0:*                                               
tcp                     LISTEN                   0                        128                                                127.0.0.1:31002                                          0.0.0.0:*                                               
tcp                     LISTEN                   0                        128                                                127.0.0.1:31003                                          0.0.0.0:*       

My router is set up exactly the same way as for accessing other services. I’m unsure of the best way to test each layer, but now I’m getting 0 connections on anything at any point.
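
One way to test the local layer first is to check which of the expected ports have no listener at all. The sketch below reads a listing with a Netid column, like the one above (as printed by ss -tuln); the function name missing_ports is made up for illustration, and it says nothing about the firewall or the router, only about what is listening on this machine.

```bash
#!/usr/bin/env bash
# List ports in a range that have no listener for the given protocol,
# reading `ss -tuln` output (with the Netid column) on stdin.
missing_ports() {
  local proto="$1" first="$2" last="$3"
  awk -v proto="$proto" -v first="$first" -v last="$last" '
    $1 == proto {
      # Local Address:Port is the 5th column; the port follows the last colon.
      n = split($5, parts, ":")
      seen[parts[n]] = 1
    }
    END {
      for (p = first; p <= last; p++)
        if (!(p in seen)) printf "%d\n", p
    }
  '
}
```

For the setup above that would be something like: ss -tuln | missing_ports udp 21000 21019 for the node ports, and ss -tuln | missing_ports tcp 31000 31019 for the rpc ports. An empty result means every port in the range has a listener.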

The fixes are supposed to remove the need for the peer option, and that peer probably doesn’t exist anymore anyhow.

And I hope that .user is changed to your Discord ID and is not the user option.

home-network doesn’t need the ports specified, as they are not really listening anyhow. The connection is made via normal communications, where the node contacts the relay first, opening the connection pathway.

You only need the ports when not specifying home-network and using port forwarding.

The peer option is probably the biggest problem here; with the update those old peers will be gone.
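
Putting that advice together, the add command would look roughly like one of the two variants printed below. This is a dry-run sketch only: it prints the command instead of running it, OWNER is a placeholder for your own Discord ID, the function name build_add_cmd is made up, and the flags are simply the ones from the command quoted earlier in the thread, minus --peer.

```bash
#!/usr/bin/env bash
# Dry run: print a `safenode-manager add` command without --peer.
OWNER="your_discord_id"   # placeholder, replace with your own ID

build_add_cmd() {
  local home_network="$1"   # "yes" or "no"
  local -a cmd=(safenode-manager add --owner "$OWNER" --count 20)
  if [ "$home_network" = "yes" ]; then
    cmd+=(--home-network)              # relay-based, no node ports needed
  else
    cmd+=(--node-port 21000-21019)     # should match the forwarded UDP ports
  fi
  printf '%s\n' "${cmd[*]}"
}

build_add_cmd no    # port-forwarding variant
build_add_cmd yes   # home-network variant
```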


Good lord, split me. I had an inkling it was the peer not existing, lol. Yeah, I figured the other stuff was the case. And yes, that is the case. Thanks very much, I’ll wait for this update then. Excellent.


The update is here. Just don’t use --peer.

The launcher and manager will download the updated node software automatically.

snnm is now v0.3.2. Versions 0.3.x use a new approach to significantly increase speed and to use fewer resources. No more lsof and netstat!

edit: Jumped once more, to v0.4.x now, with metrics server support added…

Link to my development branch for the latest version: snnm

To install it on Rocky Linux 9.4 (not tested on other distributions yet):

cd "$HOME"
rm -f snnm
wget https://raw.githubusercontent.com/drirmbda/node-toolbox/drirmbda-dev/snnm -O snnm
echo "WARNING: snnm is experimental software, only for people who know what they are doing. Review the code before using at your own responsibility."
chmod +x snnm
wget https://github.com/fullstorydev/grpcurl/releases/download/v1.9.1/grpcurl_1.9.1_linux_386.rpm
sudo rpm -i grpcurl_1.9.1_linux_386.rpm
sudo dnf -y install grpcurl jq
wget https://raw.githubusercontent.com/maidsafe/safe_network/main/sn_protocol/src/safenode_proto/safenode.proto -O safenode.proto
wget https://raw.githubusercontent.com/maidsafe/safe_network/main/sn_protocol/src/safenode_proto/req_resp_types.proto -O req_resp_types.proto

Note: Sometimes, when I commit changes, I forget to set the default owner value on line 6 back to OWNER_DISCORD_ID="none", which runs nodes without setting any Discord ID. You can also change it to your own ID to avoid needing to specify it every time using -d.

  • Adding --metrics-server-port <portnumber> at node start enables the metrics server. You can then grab a snapshot, using wget for example, and grep for something interesting.

Rocky 9.4, bash

__PORT=<portnumber>  # e.g. __PORT=12300
wget -q http://127.0.0.1:"$__PORT"/metrics -O "/tmp/safemetrics-$__PORT"
grep -oP "^sn_networking_estimated_network_size [0-9]*" "/tmp/safemetrics-$__PORT"
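
If you want just the number, for use in a script, a small helper can pull one value out of the saved snapshot. This is a sketch assuming the snapshot has plain "name value" lines, as the grep pattern above suggests; the function name metric_value is made up, and metrics that carry labels in braces would need a different match.

```bash
#!/usr/bin/env bash
# Print the value of one metric from a saved metrics snapshot.
# Assumes plain "name value" lines; comment lines starting with # are skipped
# automatically because their first field is "#".
metric_value() {
  local name="$1" file="$2"
  awk -v name="$name" '$1 == name { print $2; exit }' "$file"
}
```

For example: metric_value sn_networking_estimated_network_size "/tmp/safemetrics-$__PORT"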

We have logs, rpc, metrics server. Now, what would be a good script to assess the node health based on a combination of factors, summarized into an “OK” stamp of approval?

  • Cap the number of running nodes while you are starting nodes.

Obviously not so good when you are using safenode-manager or launchpad, but works great with snnm. (As with anything using the kill command, test and review if this will work for you.)

Rocky 9.4, bash

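TARGET=50   # allowed number of nodes running at any time
while true
do
  # Count processes named exactly "safenode" (this excludes safenode-manager).
  COUNT=$(pgrep -c -x safenode)
  # Over the cap: kill the two safenode processes with the lowest PIDs.
  [ "$COUNT" -gt "$TARGET" ] && kill $(pgrep -x safenode | head -2)
  # Print the running count and the number of defunct safenode entries.
  echo "$COUNT" ; ps -A | grep safenode | grep -c def
  sleep 2
done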
  • Get list of active IP addresses from which a node is receiving data by safenode PID.

I found a way to get active connections or connection counts without using costly RPC calls. To get accurate values, a port should be monitored for about 30 seconds, but 0.5 s is good enough for a snapshot, with the reported values being a lower limit. This way we can assess whether the node is connected.

Rocky Linux 9.4, bash

#Example for PID 1334809 with monitoring window of 0.5 seconds:
DATA=$(timeout 0.5 strace -tfp 1334809 -s 1 --trace=network 2>&1 | grep recv | grep "iov_len=94208")
echo "$DATA" | grep -o "sin_addr=inet_addr(\"[0-9,.]*\"" | sort | uniq -c #| wc -l

This is integrated in the snnm tool (v0.4.9+), which is frankly getting quite ahead of the capabilities of official tools and is also more transparent, easier to debug, to tweak, and it scales really well. But of course snnm is limited to RHEL OSes, or at least remains untested on other distributions.

  • Identify nodes without good peers to contact, thus needing a restart.

This approach uses the log output only and probably is more reliable than counting connections.

Rocky Linux 9.4, bash

#The following is run from the logs directory of a node
cat "safenode.log" | grep -e "Skip bad_nodes check" -e "Performing"
#If the last item contains "Skip" then the number of nodes "in the RT" is "too small" and we should do a new bootstrap. A bootstrap shows up in the log as "Performing" a bootstrap.
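
The same rule can be wrapped in a function that gives a one-word verdict per log file. This is a sketch of the rule as described here, not the check that snnm itself performs; the function name needs_bootstrap and the three verdict words are made up for illustration.

```bash
#!/usr/bin/env bash
# Report whether a node log suggests a fresh bootstrap is needed.
# Only the last matching log line decides the verdict.
needs_bootstrap() {
  local log="$1" last
  last=$(grep -e "Skip bad_nodes check" -e "Performing" "$log" | tail -n 1)
  case "$last" in
    "")                       echo "unknown" ;;  # neither message found
    *"Skip bad_nodes check"*) echo "restart" ;;  # routing table too small
    *)                        echo "ok" ;;       # a bootstrap came last
  esac
}
```

Run from a node's logs directory, it would be called as: needs_bootstrap safenode.log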

This is incorporated in snnm v0.5.0+, which also can launch vdash by specifying a port number instead of a path and many more improvements.


snnm v0.5.3 includes an improved health check and it prints a reassuring “seems okay” for nodes that appear to be happy. This still needs improvement…

Instead of using systemd and monitoring, I run this periodically:

./snnm -M -1 -w 30 -X
./snnm -l | tee snnm-l-$(date +%Y%m%d-%H:%M |tr -d '[ ]')