Again with the same errors in uploads…
Error:
0: Failed to upload chunk batch: The maximum specified repayments were made for the address: 1978a2(00011001)…
I started 30 nodes on my Dell OptiPlex (8 GB RAM) with port forwarding, and all is good and earning. I also started 30 on my Pi 4 (8 GB RAM) with no port forwarding and had to shut them down; they were running at 100% CPU. Not sure if that’s too many for the Pi 4, so I am going to try again later with port forwarding to see if that was what caused the CPU difference.
So, I tried safenode-manager upgrade and that seemed to work. Nodes started and vdash could see some activity.
Then I thought it maybe best to wipe and start over, considering it is a new network:
paul@Vader:~$ safeup node-manager
**************************************
* *
* Installing safenode-manager *
* *
**************************************
Installing safenode-manager for x86_64-unknown-linux-musl at /home/paul/.local/bin...
Retrieving latest version for safenode-manager...
Installing safenode-manager version 0.7.5...
[00:00:01] [########################################] 5.39 MiB/5.39 MiB (0s)
paul@Vader:~$ sudo /home/paul/.local/bin/safenode-manager reset
╔═════════════════════════════╗
║ Reset Safenode Services ║
╚═════════════════════════════╝
WARNING: all safenode services, data, and logs will be removed.
Do you wish to proceed? [y/n]
y
╔════════════════════════════╗
║ Stop Safenode Services ║
╚════════════════════════════╝
Error:
0: missing field `home_network` at line 1 column 1577
Location:
/home/runner/work/safe_network/safe_network/sn_node_manager/src/cmd/node.rs:360
Backtrace omitted. Run with RUST_BACKTRACE=1 environment variable to display it.
Run with RUST_BACKTRACE=full to include source snippets.
So, something doesn’t look quite right there.
After that, safenode-manager starts acting up:
paul@Vader:~$ sudo /home/paul/.local/bin/safenode-manager status
Error:
0: missing field `home_network` at line 1 column 1577
Location:
/home/runner/work/safe_network/safe_network/sn_node_manager/src/cmd/node.rs:329
Backtrace omitted. Run with RUST_BACKTRACE=1 environment variable to display it.
Run with RUST_BACKTRACE=full to include source snippets.
Note, deleting /var/safenode-manager resets things (I have a backup, if anything is of interest).
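For anyone wanting to do the same manual reset, this is roughly it (a sketch; double-check the path before deleting, and keep the backup in case the old registry is useful for debugging):
# Stop any running safenode services first (systemd accepts a glob for loaded units;
# otherwise stop each safenodeN unit individually)
sudo systemctl stop 'safenode*'
# Keep a copy of the old node manager state, then remove it so the manager starts fresh
sudo cp -a /var/safenode-manager /var/safenode-manager.bak
sudo rm -rf /var/safenode-manager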
Is this what love at first sight feels like?
That worked! Thank you kindly for the advice.
This is how many times my nodes have been considered bad since yesterday, across a group of 50 nodes.
A few are repeat offenders, but it seems that every now and then a different node gets flagged for ConnectionIssue.
My question is: what is ConnectionIssue? Looking at the totals it is sporadic, so should this concern me, or can I consider it “normal”?
grep -r "us as BAD" /var/log/safenode/
/var/log/safenode/safenode35/safenode.log.20240510T000112:[2024-05-10T00:46:37.919862Z WARN sn_networking::event::request_response] Peer NetworkAddress::PeerId(12D3KooWRXsG4L93DVNEJQnwNndf1FCpE2uqfPxveNYgne1kDHG9 - ae981d49a454681ac72cd294a694948f38727aa48ff0850e44449c81f3e7bc2b) consider us as BAD, due to "ConnectionIssue".
/var/log/safenode/safenode12/safenode.log:[2024-05-09T14:09:40.014041Z WARN sn_networking::event::request_response] Peer NetworkAddress::PeerId(12D3KooWC46ydZGsisTrQBzmWmhgb2a9D8M8QtAtq4Ht753CJewn - 1515db0ac87a9a7e86d254635ad405517dbc4aa6d6d6592f2425e13ff39b27d7) consider us as BAD, due to "ConnectionIssue".
/var/log/safenode/safenode4/safenode.log.20240510T001301:[2024-05-09T23:32:09.162190Z WARN sn_networking::event::request_response] Peer NetworkAddress::PeerId(12D3KooWJr1NshdbHMQ3jPyjPMdbP1MScPNWfw3B7dZ2HC9XgxMf - ac14824c894b7b829c0b8acd076959dff658b7389399f2b3b5e8f28d13883db2) consider us as BAD, due to "ConnectionIssue".
/var/log/safenode/safenode46/safenode.log:[2024-05-10T05:27:00.266405Z WARN sn_networking::event::request_response] Peer NetworkAddress::PeerId(12D3KooWSntyUtDGe1gmhBrDfD8hVfTUAq1tQ8hvrsZgJqPuEcuU - 5cfe4ad119e2ad5189bd8eac8228522643094034d0217dc70cd9aae2c4af4d1b) consider us as BAD, due to "ConnectionIssue".
/var/log/safenode/safenode7/safenode.log.20240509T232312:[2024-05-09T14:56:52.159148Z WARN sn_networking::event::request_response] Peer NetworkAddress::PeerId(12D3KooWAA7oVQSqXpJNHASzH3sN25AuUjkFV9N3TpxwcYcvJ7YB - 5452c2b49354c00059d2aec4a8b5cdea4effd1c8433599d0b3ba6019e82b46ba) consider us as BAD, due to "ConnectionIssue".
/var/log/safenode/safenode23/safenode.log.20240510T002754:[2024-05-09T14:12:30.127283Z WARN sn_networking::event::request_response] Peer NetworkAddress::PeerId(12D3KooWQVwutTRx1sf8RV3HFL6pD3L98ZjfbJ7MfVUesVder5dJ - 42ed24b1fc32ef351b449de8be341226810fb11a90ae74a7bb40b11d71d1f792) consider us as BAD, due to "ConnectionIssue".
/var/log/safenode/safenode15/safenode.log.20240510T011336:[2024-05-09T14:10:59.145291Z WARN sn_networking::event::request_response] Peer NetworkAddress::PeerId(12D3KooWFEqNbhjJb6NRoPNihd8GgbHEuqhszuJC8Mx6SVWWEjDY - 7d661000807d1a6e4c13068ecdfe98d450c69df1297cebf54b8e1e41ceacec56) consider us as BAD, due to "ConnectionIssue".
/var/log/safenode/safenode21/safenode.log.20240510T015802:[2024-05-09T14:09:54.193143Z WARN sn_networking::event::request_response] Peer NetworkAddress::PeerId(12D3KooWPe9UpgAZZw7xVVsEadVZ8xSugQcZzquNfXJh5mnmGUt1 - 61883929594a67ec0afb5fdd5d448ac351690052abddf817193b4472ddb23a32) consider us as BAD, due to "ConnectionIssue".
/var/log/safenode/safenode21/safenode.log.20240510T015802:[2024-05-09T23:29:38.559254Z WARN sn_networking::event::request_response] Peer NetworkAddress::PeerId(12D3KooWAr8qJnbdXD5ycPjmLqNNL9pvGDLtGqzQ18funiAG9eig - 61cf2f2da22c96032e5a2785debe26bf11a6ccc92a55264b513646f44e83e039) consider us as BAD, due to "ConnectionIssue".
/var/log/safenode/safenode40/safenode.log.20240510T023553:[2024-05-10T03:19:03.584469Z WARN sn_networking::event::request_response] Peer NetworkAddress::PeerId(12D3KooWAEbhJaxrJgRBe9tEEPi578kJMgWBzooHL2FxS2yP6Kjn - 59858ccda011083d0f9002e784a1887a1186182bd2bc724b0d58e68fcebf3bd4) consider us as BAD, due to "ConnectionIssue".
/var/log/safenode/safenode14/safenode.log.20240509T223528:[2024-05-09T14:10:32.997797Z WARN sn_networking::event::request_response] Peer NetworkAddress::PeerId(12D3KooWSh9NaxgF8rvQVrEdYepxGzbfLaR1qKn1KqfY3hp7VCx3 - d7c4171291e25385e3b995e86a61c124f4ae6c20edcf4338323b9b7238a5c803) consider us as BAD, due to "ConnectionIssue".
/var/log/safenode/safenode27/safenode.log.20240510T043353:[2024-05-09T23:08:04.525822Z WARN sn_networking::event::request_response] Peer NetworkAddress::PeerId(12D3KooWKMqEDJ23MhBEt5bChQe8HMz14Ggoecthms7aQ9yRArmP - 5a947b28491cfee2efca73413103ec729bd1fad4bcf27d5bc1eb7d3ae1627b65) consider us as BAD, due to "ConnectionIssue".
/var/log/safenode/safenode25/safenode.log.20240509T213418:[2024-05-09T14:28:53.165166Z WARN sn_networking::event::request_response] Peer NetworkAddress::PeerId(12D3KooWQVwutTRx1sf8RV3HFL6pD3L98ZjfbJ7MfVUesVder5dJ - 42ed24b1fc32ef351b449de8be341226810fb11a90ae74a7bb40b11d71d1f792) consider us as BAD, due to "ConnectionIssue".
/var/log/safenode/safenode2/safenode.log.20240509T220451:[2024-05-09T14:08:40.364863Z WARN sn_networking::event::request_response] Peer NetworkAddress::PeerId(12D3KooWA5QjzvbikjCPoQ9Eff13jkL5Wh4s3JvNA46AL8zDSruX - 4f533f883d1423a2461d140680541b0639873ed35d103548cc7f61506c46614c) consider us as BAD, due to "ConnectionIssue".
/var/log/safenode/safenode2/safenode.log.20240509T220451:[2024-05-09T14:45:36.328381Z WARN sn_networking::event::request_response] Peer NetworkAddress::PeerId(12D3KooWSZMXT3zPLXThdaDuy4VGCH2zSVejDVJZRJAxF6Vo3gTY - 4ff1c50214f514d0aed74a4df7db6d3c64152cb9f6861c30dc726e93ce063971) consider us as BAD, due to "ConnectionIssue".
/var/log/safenode/safenode36/safenode.log.20240509T230018:[2024-05-09T23:31:24.486598Z WARN sn_networking::event::request_response] Peer NetworkAddress::PeerId(12D3KooWM1mKnHJFvueQ1tkqQyBMLXC4J64kPzkAiYo5Dzcg8UZu - 2c86db776497bcc82f9cae167d6465ba46f47c4a1f80968900d80df203fecfae) consider us as BAD, due to "ConnectionIssue".
/var/log/safenode/safenode6/safenode.log:[2024-05-10T02:54:21.535449Z WARN sn_networking::event::request_response] Peer NetworkAddress::PeerId(12D3KooWJdvM7TmwHrMUn5K7jNEUpKC7hbpxKiYJq7AfChTNzq26 - b65b7f687de36f3608c3bad64f0ecb1303b832f5c922ab1963287b6582eac160) consider us as BAD, due to "ConnectionIssue".
/var/log/safenode/safenode10/safenode.log.20240510T023531:[2024-05-09T14:11:23.351847Z WARN sn_networking::event::request_response] Peer NetworkAddress::PeerId(12D3KooWC46ydZGsisTrQBzmWmhgb2a9D8M8QtAtq4Ht753CJewn - 1515db0ac87a9a7e86d254635ad405517dbc4aa6d6d6592f2425e13ff39b27d7) consider us as BAD, due to "ConnectionIssue".
/var/log/safenode/safenode19/safenode.log.20240510T002402:[2024-05-09T14:10:32.991056Z WARN sn_networking::event::request_response] Peer NetworkAddress::PeerId(12D3KooWGRHQqXqQPU3RejT6AzAsZScqai5kDiUqtfoGaE1hLju9 - d41af865f0dde39f69ff9085348a5c7782bbd4dc309cb4156a1e2a0e557caf92) consider us as BAD, due to "ConnectionIssue".
/var/log/safenode/safenode19/safenode.log.20240510T002402:[2024-05-09T21:48:02.096745Z WARN sn_networking::event::request_response] Peer NetworkAddress::PeerId(12D3KooWMzdNDgmmy9h1xW2g7VwMiiwRcqiANKRSs7JCyVMt6sqP - d7c1a4ae4b26c43674be51dee81eb11123b0bfd8e625b70f4b0d68620ceeb5ec) consider us as BAD, due to "ConnectionIssue".
/var/log/safenode/safenode28/safenode.log.20240510T031552:[2024-05-09T20:49:14.522635Z WARN sn_networking::event::request_response] Peer NetworkAddress::PeerId(12D3KooWJdvM7TmwHrMUn5K7jNEUpKC7hbpxKiYJq7AfChTNzq26 - b65b7f687de36f3608c3bad64f0ecb1303b832f5c922ab1963287b6582eac160) consider us as BAD, due to "ConnectionIssue".
/var/log/safenode/safenode18/safenode.log.20240510T010525:[2024-05-09T14:09:16.001131Z WARN sn_networking::event::request_response] Peer NetworkAddress::PeerId(12D3KooWBtmUHB1rcjmB9xLdeGTcKeUFDpT2ciwPX36uiq2MZuL8 - 39fea601310fec1fc4ac35e3e2e7b4dc34c3be75e8b2cebb024a0a45da215b4f) consider us as BAD, due to "ConnectionIssue".
/var/log/safenode/safenode9/safenode.log.20240509T234224:[2024-05-09T14:08:47.850808Z WARN sn_networking::event::request_response] Peer NetworkAddress::PeerId(12D3KooWAFGFGesYP1HkRfRhsgYF1HhbuRJArLRd1c5d9c1TYimA - d5e4185a861ab4dfa099158d7e184695bc6fe8cc0d73b12fb875f3bdd5adce50) consider us as BAD, due to "ConnectionIssue".
/var/log/safenode/safenode9/safenode.log.20240509T234224:[2024-05-09T14:09:13.554035Z WARN sn_networking::event::request_response] Peer NetworkAddress::PeerId(12D3KooWK62yFGxFZSBoDJELpGrn33VSAVzQwFedN9bzfJw6V6rq - d5e83924c732d28364f6856ed2932d57e0781ce852f0228709e3d334e729f20e) consider us as BAD, due to "ConnectionIssue".
/var/log/safenode/safenode11/safenode.log.20240509T235614:[2024-05-09T23:29:39.192844Z WARN sn_networking::event::request_response] Peer NetworkAddress::PeerId(12D3KooWAXtUjrQsaGt3vBzprVfqXTjRvsa7oSC1LgDPa89S7tWN - 13837f91393ef1e722322115f567151d642a664a6096405e0ec1cc0ee6f96dd8) consider us as BAD, due to "ConnectionIssue".
/var/log/safenode/safenode33/safenode.log.20240510T021007:[2024-05-09T15:55:36.419051Z WARN sn_networking::event::request_response] Peer NetworkAddress::PeerId(12D3KooWCZTdRn9tLqafiWNMxmRJ96o2EywWhLeTKozXqbzm7izN - 2598c6898f2c26bf65c977d126f8778c7e67a29584f5cbd4a802c06b71140880) consider us as BAD, due to "ReplicationFailure".
/var/log/safenode/safenode33/safenode.log:[2024-05-10T07:28:53.597450Z WARN sn_networking::event::request_response] Peer NetworkAddress::PeerId(12D3KooWMwH1nMPyKH8TXBHWzTPWDkr5Y7vaqivEptHbPxrJ4rVb - 2531a40b6de4b2850d32e066cb95c1a271d0c0cd780385a3b1a6d65e9b8f0df5) consider us as BAD, due to "ConnectionIssue".
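For a quick tally instead of the raw lines, something like this works against the same /var/log/safenode/ layout (the node name is the 5th path component):
# Count the "consider us as BAD" flags per node
grep -r "us as BAD" /var/log/safenode/ | cut -d/ -f5 | sort | uniq -c | sort -rn
# And count them per reported reason (ConnectionIssue, ReplicationFailure, ...)
grep -rho 'due to "[A-Za-z]*"' /var/log/safenode/ | sort | uniq -c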
I updated sn_httpd (including docker) to use the latest libs.
docker pull traktion/i-am-immutable
docker run -p 8080:8080 traktion/i-am-immutable:latest /ip4/138.68.176.97/udp/54325/quic-v1/p2p/12D3KooWPRkTQreT34FD7DcXda26oDcssdY8JmCZGbxwwgnsc3bZ 6d70bf50aec7ebb0f1b9ff5a98e2be2f9deb2017515a28d6aea0c6f80a9f44dd8128aad4427271f22b4f1444119b3243adbd5c48ca15edd7ab3dd56e4017a82126a4d2b5fe49d53fcfac7beef03f748e
All is working well!
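If it helps anyone, a couple of quick checks to confirm the container is up and serving (plain docker commands, nothing sn_httpd specific; assumes a single container from that image):
# Confirm the container is running and the 8080 port mapping took
docker ps --filter ancestor=traktion/i-am-immutable:latest
# Follow its logs to watch requests being served
docker logs -f $(docker ps -q --filter ancestor=traktion/i-am-immutable:latest)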
I have 40 nodes running on an RPi4 8GB with the ‘--home-network’ option and it’s still reporting about 25% idle. So I wouldn’t want to be trying to do anything else on the Pi at the same time, but I’d say it’s fine to run that many. Maybe 50. But I need to keep some network bandwidth for household usage.
Regarding the 100% CPU: did you start them with the ‘--interval’ option? I started mine with --interval 180000 for a 3 min delay between each one. It seems to be plenty to let them settle down.
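For reference, the staggered start looks something like this (from memory, so the exact subcommand that takes --interval may differ on newer releases; the value is in milliseconds):
# 180000 ms = 3 minutes between each node coming up
sudo safenode-manager start --interval 180000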
Or were they running with 100% CPU all the time? Never seen that before.
I also have 1 node running on a VM without the ‘--home-network’ option, and with port forwarding of course, and it’s earning. So whatever the issue was last time that was causing my non-home-network nodes to not earn is gone.
Yeah, we shouldn’t be seeing that.
Could be. I’m highly suspicious it might be a time zone difference causing the timestamp issue?
Ideally, it would help to have the log of the following node, to see what happened.
There are continuous connection issues, such as dial errors due to denied or aborted.
If that’s all of them among 50 nodes across almost 24 hrs, then that should be acceptable?
7.5 Hours of data collected (posting a few relevant / interesting panels).
Wallet Balances:
Group A: 10
Group B: 745
Group C: 1337
Additional Notes:
References:
Group A - with --home-network
Group B - with --port X
Group C - with --upnp
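For reference, a sketch of how the three configurations differ, assuming the flags are passed through the node manager’s add command (the exact invocation may differ):
# Group A: relayed via --home-network, no router configuration needed
sudo safenode-manager add --home-network
# Group B: fixed port, paired with manual port forwarding on the router (12001 is a placeholder for X)
sudo safenode-manager add --port 12001
# Group C: let the node try to map the port itself via UPnP
sudo safenode-manager add --upnp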
I wonder what more conclusions can be drawn from the above groups versus the changelist that went into this testnet, as opposed to the prior one with PunchNet.
Yes, that is all I got from 50 nodes over almost 24 hours. I thought it was acceptable, but a second opinion is always good. Thanks, Qi!
I started them both using @neiks script, so that shouldn’t be a problem. I started them late last night, so I didn’t wait until they all launched; I watched the first 3 and CPU seemed similar to the others, so I don’t really know what happened. The only difference between them is the port forwarding, so I’m going to try again later with that.
It’ll be the monitoring script. Stop the nodes with the script and start them again; the startup script will update the Pi to the latest resources script, which is less CPU hungry.
@josh made a newer version that trims about 20% off the CPU usage.
On my end:
Note: Group A uses 4x more CPU than Group B & Group C (though this is all relatively low % across the board).
Note: Group A & B seem to have more connected peers than Group B by 5x on average.
For advanced node operators, Group B or Group C would be most optimal (at least from my perspective currently), but that doesn’t mean Group A is necessarily “super inefficient” either (on CPU). It all depends on which flags you decide to run in your home environment, and what is properly supported there long term.
Overall, if I had to guess right now, UPnP, while not the primary focus of this testnet, seems like a good happy medium (complexity vs ease of use) for many users, without much node operator input required.
The other machine is fine though, or is it just because it’s a bigger machine?
One of the upload failures just happened for me a short while ago:
paul@mini-vader:~/Downloads$ safe files upload -p -b 128 RustRover-233.15026.24.tar.gz
Logging to directory: "/home/paul/.local/share/safe/client/logs/log_2024-05-10_11-33-15"
safe client built with git version: 16f3484 / stable / 16f3484 / 2024-05-09
Instantiating a SAFE client...
Connecting to the network with 49 peers
🔗 Connected to the Network
Chunking 1 files...
"RustRover-233.15026.24.tar.gz" will be made public and linkable
Splitting and uploading "RustRover-233.15026.24.tar.gz" into 1529 chunks
Error:
0: Failed to upload chunk batch: The maximum specified repayments were made for the address: eb5a87(11101011)..
Location:
/home/runner/work/safe_network/safe_network/sn_cli/src/files/files_uploader.rs:171
Backtrace omitted. Run with RUST_BACKTRACE=1 environment variable to display it.
Run with RUST_BACKTRACE=full to include source snippets.
lmk if you would like the logs.
I just retried with default batch size, but got the same sort of error (but different address):
paul@mini-vader:~/Downloads$ safe files upload -p RustRover-233.15026.24.tar.gz
Logging to directory: "/home/paul/.local/share/safe/client/logs/log_2024-05-10_13-13-30"
safe client built with git version: 16f3484 / stable / 16f3484 / 2024-05-09
Instantiating a SAFE client...
Connecting to the network with 49 peers
🔗 Connected to the Network
"RustRover-233.15026.24.tar.gz" will be made public and linkable
Splitting and uploading "RustRover-233.15026.24.tar.gz" into 742 chunks
Error:
0: Failed to upload chunk batch: The maximum specified repayments were made for the address: 757ce7(01110101)..
Location:
/home/runner/work/safe_network/safe_network/sn_cli/src/files/files_uploader.rs:171
Backtrace omitted. Run with RUST_BACKTRACE=1 environment variable to display it.
Run with RUST_BACKTRACE=full to include source snippets.
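For reference, pinning the batch size down rather than up would look like the below; whether a smaller batch actually avoids the repayment limit is untested on my side:
# Same upload, still public (-p), but with a smaller chunk batch size
safe files upload -p -b 16 RustRover-233.15026.24.tar.gz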
EDIT:
I’m a dope… No idea how the keyfile got written as root, but I forgot we run as the safe user. Leaving the below in case someone has the same issue (bug? Never had it before with 100’s of nodes started, so probably me) and so it’ll be searchable.
I have one node that won’t start and I can’t figure out what’s wrong. I added 50 nodes at the same time. (My previous attempt went wrong because I was a dope and didn’t stop the nodes before reset with node-manager. That got all sorted.)
This is after a clean stop, reset, add, start with node-manager. The other 49 all spun right up. This guy is puzzling me.
mav@hetzsafe-01:~$ sudo safenode-manager start --service-name safenode30
╔═════════════════════════════╗
║ Start Safenode Services ║
╚═════════════════════════════╝
Refreshing the node registry...
Attempting to start safenode30...
Failed to start 1 service(s):
✕ safenode30: The 'safenode30' service has failed to start
Error:
0: Failed to start one or more services
Location:
sn_node_manager/src/cmd/node.rs:576
Backtrace omitted. Run with RUST_BACKTRACE=1 environment variable to display it.
Run with RUST_BACKTRACE=full to include source snippets.
mav@hetzsafe-01:~$ sudo journalctl -eu safenode30
......
May 10 12:23:58 hetzsafe-01 systemd[1]: safenode30.service: Failed with result 'exit-code'.
May 10 12:23:58 hetzsafe-01 systemd[1]: safenode30.service: Scheduled restart job, restart counter is at 4.
May 10 12:23:58 hetzsafe-01 systemd[1]: Stopped safenode30.
May 10 12:23:58 hetzsafe-01 systemd[1]: Started safenode30.
May 10 12:23:58 hetzsafe-01 safenode[85149]: Error:
May 10 12:23:58 hetzsafe-01 safenode[85149]: 0: failed to read secret key file: Permission denied (os error 13)
May 10 12:23:58 hetzsafe-01 safenode[85149]: Location:
May 10 12:23:58 hetzsafe-01 safenode[85149]: sn_node/src/bin/safenode/main.rs:491
May 10 12:23:58 hetzsafe-01 safenode[85149]: Backtrace omitted. Run with RUST_BACKTRACE=1 environment variable to display it.
May 10 12:23:58 hetzsafe-01 safenode[85149]: Run with RUST_BACKTRACE=full to include source snippets.
May 10 12:23:58 hetzsafe-01 systemd[1]: safenode30.service: Main process exited, code=exited, status=1/FAILURE
May 10 12:23:58 hetzsafe-01 systemd[1]: safenode30.service: Failed with result 'exit-code'.
May 10 12:23:59 hetzsafe-01 systemd[1]: safenode30.service: Scheduled restart job, restart counter is at 5.
May 10 12:23:59 hetzsafe-01 systemd[1]: Stopped safenode30.
May 10 12:23:59 hetzsafe-01 systemd[1]: safenode30.service: Start request repeated too quickly.
May 10 12:23:59 hetzsafe-01 systemd[1]: safenode30.service: Failed with result 'exit-code'.
May 10 12:23:59 hetzsafe-01 systemd[1]: Failed to start safenode30.
mav@hetzsafe-01:~$ ls -al /var/safenode-manager/services/safenode30/
total 25428
drwxr-xr-x 4 safe safe 4096 May 10 12:23 .
drwxr-xr-x 52 safe safe 4096 May 9 15:26 ..
drwxr-xr-x 2 root root 24576 May 9 15:30 record_store
-rwxr-xr-x 1 root root 25994728 May 9 15:25 safenode
-rw------- 1 root root 32 May 9 15:30 secret-key
drwxr-xr-x 2 root root 4096 May 9 15:30 wallet
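Given the ownership mismatch above, and that the service runs as the safe user, a likely fix (a sketch, not verified) is to re-own the service directory and clear the restart lockout:
# Hand the service directory (including secret-key) back to the safe user
sudo chown -R safe:safe /var/safenode-manager/services/safenode30/
# Clear systemd's "start request repeated too quickly" state, then retry
sudo systemctl reset-failed safenode30
sudo safenode-manager start --service-name safenode30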
There was a breaking change in the latest safenode-manager release. Old serialized registry files don’t have the home_network field. So the reset would actually need to have occurred before getting the new node manager version.
It is actually possible to have default values for missing fields with respect to serialization/deserialization, so we will do that in future, because this issue has tripped up several people.
The full path worked!
Get your hand off my thigh!!!