ReplicationNet [June 7 Testnet 2023] [Offline]

My node has 1024 chunks too. That looks like the limit.

upd. Yeah. [2023-06-10T09:43:00.157900Z WARN sn_record_store] Record not stored. Maximum number of records reached. Current num_records: 1024
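
For anyone who wants to check their own logs for the same limit, here is a minimal Python sketch that pulls the record count out of the warning above. The pattern is built only from that single log line, so the message format is an assumption and may differ between releases; `parse_record_limit` is a hypothetical helper name.

```python
import re

# Pattern derived from the single warning line quoted above (assumed format).
LIMIT_RE = re.compile(
    r"^\[(?P<ts>\S+) WARN sn_record_store\] Record not stored\. "
    r"Maximum number of records reached\. Current num_records: (?P<n>\d+)"
)

def parse_record_limit(line):
    """Return (timestamp, num_records) if the line is the limit warning, else None."""
    m = LIMIT_RE.match(line.strip())
    if m is None:
        return None
    return m.group("ts"), int(m.group("n"))

sample = ("[2023-06-10T09:43:00.157900Z WARN sn_record_store] Record not stored. "
          "Maximum number of records reached. Current num_records: 1024")
print(parse_record_limit(sample))
```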

4 Likes

All this sounds really promising… Well done all, strong foundations!

9 Likes

With nodes full we should start to expect some failures.

9 Likes

Of course, there will be problems with storing fresh data.
But I think the loss of old data is related to crashes,
which may (or may not) be related to nodes reaching capacity.

5 Likes

Nice, I didn’t recall zgrep though I knew there would be a better way, there always is on Un*x.

You need -e though or you are trying to match that whole string :man_facepalming::

zgrep -e "5a1c9e\|73b703\|6be651\|63594e\|ab8031" safenode.log*.gz

Anyway, I can now confirm I don’t have those values in any logfile for my two nodes.
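
If zgrep is not available, roughly the same check can be sketched in Python. This is only an illustration: `find_prefixes` is a hypothetical helper, and like the zgrep command it matches the prefixes anywhere in a line rather than only where a chunk address is logged.

```python
import glob
import gzip
import re

def find_prefixes(pattern_glob, prefixes):
    """Return {prefix: [file names]} for every prefix seen in the gzipped logs."""
    regex = re.compile("|".join(re.escape(p) for p in prefixes))
    hits = {}
    for path in sorted(glob.glob(pattern_glob)):
        # Text mode with errors="replace" so odd bytes in a log do not abort the scan.
        with gzip.open(path, "rt", errors="replace") as fh:
            for line in fh:
                for found in regex.findall(line):
                    files = hits.setdefault(found, [])
                    if path not in files:
                        files.append(path)
    return hits

if __name__ == "__main__":
    prefixes = ["5a1c9e", "73b703", "6be651", "63594e", "ab8031"]
    # An empty dict means none of the prefixes appear in any rotated log.
    print(find_prefixes("safenode.log*.gz", prefixes))
```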

4 Likes

Probably because people are now getting confident enough to upload more and larger files.

Those files also include data maps, so the sizes do differ.

It actually covers all disk operations, and seems to even include writes to the network when a chunk is being queried.

Yeah, that’s a mystery. We may need to look further into the sysinfo crate that is used to collect the statistics.

7 Likes

I had reasonable values for total_mb_read on my node, and I shared charts previously.
So in general it is not a problem to get this value to go up.
However, it may be a problem for a specific node.
Theoretically, just as there are no-chunks nodes, there may be no-GET nodes.

1 Like

Your node is definitely holding one copy of it.
Is the node still running?
The log shows it responded to the query just after the upload, but nothing happened for later queries and replications.
A bit strange.

It’s actually not a simple question to answer.
A chunk copy could be missing for the following reasons:
1. The nodes are full, so some nodes don’t actually hold a hard copy of it even though they are supposed to.
2. Detection of dead peers is not being carried out eagerly, i.e. it can take some time for a dead peer to be detected by its neighbours when there is not enough data flying across.
3. The query command sent by the client takes too long to reach the holder, so the query times out before a valid copy arrives.
4. We also found a communication issue within the current replication flow, which may cause holes in the replication, i.e. a holder may miss its copy due to interrupted data transmission. It should be resolved with the latest main.

Hence, the loss of a chunk could be any combination of the above, without any node crashing.
Some of these we have resolved, some need configuration tweaks, and some need rules/work to be done to improve.

10 Likes

It looks like my safenode_20 locked up around 17:33 last night


safe@ubuntu-2gb-nbg1-1:~/safe_vault/logs$ tail -f ~/.safe/node/safenode_20/safenode.log
[2023-06-09T17:33:43.534043Z INFO sn_networking::event] identify: Sent { peer_id: PeerId("12D3KooWKV25Bq6wprCTLsBokEi6RNfkaGdtwXCpV1tgrzyoA3U8") }
[2023-06-09T17:33:43.534471Z INFO sn_networking::event] Connection closed to Peer 12D3KooWAaGBHcDeKaVGkKJid37M9NmHQpeJpTVKGfWRQ4KhJbUT(0) - Listener { local_addr: "/ip4/128.140.8.137/tcp/41361", send_back_addr: "/ip4/209.97.190.37/tcp/50784" } - Some(KeepAliveTimeout)
[2023-06-09T17:33:43.913780Z TRACE sn_logging::metrics] {"physical_cpu_threads":1,"system_cpu_usage_percent":99.86169,"system_total_memory_mb":2021.5972,"system_memory_used_mb":2021.5972,"system_memory_usage_percent":100.0,"network":{"interface_name":"eth0","bytes_received":7866810,"bytes_transmitted":5077811,"total_mb_received":66698.49,"total_mb_transmitted":41357.836},"process":{"cpu_usage_percent":1.7980638,"memory_used_mb":156.90956,"bytes_read":33484800,"bytes_written":49152,"total_mb_read":2852.2373,"total_mb_written":131.60039}}
[2023-06-09T17:33:44.968863Z DEBUG sn_networking::event] KademliaEvent ignored: UnroutablePeer { peer: PeerId("12D3KooWDWW7tAZCFVPEiiotXXkUEAEqt3oxWeQbnr5xNu4efc3P") }
[2023-06-09T17:33:44.992649Z INFO sn_networking::event] identify: Sent { peer_id: PeerId("12D3KooWMup2LqQFavPhsmbVKP3wxfuvDuwD4EXJqKNfDkvJwHez") }
[2023-06-09T17:33:45.095266Z INFO sn_networking::event] identify: received info peer_id=12D3KooWKV25Bq6wprCTLsBokEi6RNfkaGdtwXCpV1tgrzyoA3U8 info=Info { public_key: Ed25519(PublicKey(compressed): 8fa01c8a7138f5e054887803fbdd48f613435a75d43d3086c8c3b32f6b75), protocol_version: "safe/0.1.1", agent_version: "safe/node/0.1.1", listen_addrs: ["/ip4/144.126.230.246/tcp/42015", "/ip4/127.0.0.1/tcp/42015", "/ip4/10.16.0.46/tcp/42015", "/ip4/10.131.0.42/tcp/42015", "/ip4/10.131.0.42/tcp/42015", "/ip4/144.126.230.246/tcp/42015"], protocols: ["/safe/1", "/ipfs/kad/1.0.0", "/ipfs/id/1.0.0", "/ipfs/id/push/1.0.0", "/libp2p/autonat/1.0.0"], observed_addr: "/ip4/128.140.8.137/tcp/41361/p2p/12D3KooWNYKkBQdHDpNC1HBnDNz11QGRcHC9Xai7vHsyejGtFKgL" }
[2023-06-09T17:33:45.448023Z INFO sn_networking::event] identify: received info peer_id=12D3KooWG9cM7YyTpZ59ikHg4uRFrPXqEpjpBgiDg9h92XekciH8 info=Info { public_key: Ed25519(PublicKey(compressed): 5e14305db350513557e057325f1d027c385e923c16a1b61ebfa075e6b9b3), protocol_version: "safe/0.1.1", agent_version: "safe/node/0.1.1", listen_addrs: ["/ip4/127.0.0.1/tcp/43797", "/ip4/159.65.21.240/tcp/43797", "/ip4/10.16.0.60/tcp/43797", "/ip4/10.131.0.56/tcp/43797", "/ip4/159.65.21.240/tcp/43797", "/ip4/10.131.0.56/tcp/43797", "/ip4/10.16.0.60/tcp/43797"], protocols: ["/safe/1", "/ipfs/kad/1.0.0", "/ipfs/id/1.0.0", "/ipfs/id/push/1.0.0", "/libp2p/autonat/1.0.0"], observed_addr: "/ip4/128.140.8.137/tcp/41361/p2p/12D3KooWNYKkBQdHDpNC1HBnDNz11QGRcHC9Xai7vHsyejGtFKgL" }
[2023-06-09T17:33:45.659807Z INFO sn_networking::event] identify: Sent { peer_id: PeerId("12D3KooW9zUfYarWcbEMfv9K9rVvyv89EuKUjPNpQaLWoi8GHpky") }
[2023-06-09T17:33:45.842163Z INFO sn_networking::event] identify: received info peer_id=12D3KooW9zUfYarWcbEMfv9K9rVvyv89EuKUjPNpQaLWoi8GHpky info=Info { public_key: Ed25519(PublicKey(compressed): 297275e52bfe7920d2562d327de2fc86147fc6cb132271446b3bc35633ba2), protocol_version: "safe/0.1.1", agent_version: "safe/node/0.1.1", listen_addrs: ["/ip4/10.131.0.49/tcp/35995", "/ip4/127.0.0.1/tcp/35995", "/ip4/138.68.150.176/tcp/35995", "/ip4/10.16.0.53/tcp/35995", "/ip4/10.131.0.49/tcp/35995", "/ip4/138.68.150.176/tcp/35995"], protocols: ["/safe/1", "/ipfs/kad/1.0.0", "/ipfs/id/1.0.0", "/ipfs/id/push/1.0.0", "/libp2p/autonat/1.0.0"], observed_addr: "/ip4/128.140.8.137/tcp/56888" }
[2023-06-09T17:33:48.133764Z INFO sn_networking::event] identify: Sent { peer_id: PeerId("12D3KooWDReet2veUuhhkvjXtYUCRkshKhBfWgEV87QDwUbab8Hc") }
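
The last metrics line in that tail reports 100% system memory and about 99.9% system CPU. A minimal Python sketch for pulling those fields out of `sn_logging::metrics` lines follows; the 95% threshold and the helper names are my own assumptions, and the sample is a shortened copy of the TRACE line above.

```python
import json

def parse_metrics(line):
    """Return the metrics dict from an sn_logging::metrics line, or None."""
    marker = "sn_logging::metrics] "
    idx = line.find(marker)
    if idx == -1:
        return None
    return json.loads(line[idx + len(marker):])

def is_under_pressure(metrics, threshold=95.0):
    """True when system memory or CPU usage is at or above the threshold (percent).

    The threshold is an arbitrary illustrative value, not something from the node.
    """
    return (metrics["system_memory_usage_percent"] >= threshold
            or metrics["system_cpu_usage_percent"] >= threshold)

# Shortened copy of the TRACE line above (only some fields kept).
sample = ('[2023-06-09T17:33:43.913780Z TRACE sn_logging::metrics] '
          '{"physical_cpu_threads":1,"system_cpu_usage_percent":99.86169,'
          '"system_total_memory_mb":2021.5972,"system_memory_used_mb":2021.5972,'
          '"system_memory_usage_percent":100.0}')
m = parse_metrics(sample)
print(m["system_memory_usage_percent"], is_under_pressure(m))
```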
1 Like

There are not even any more metrics logs, which is a bit strange.
Metrics logging runs in a totally separate spawned thread, which should never hang.
Could that be because the maximum number of log files was reached?
It’s currently set to 1000, and as I understand it that includes the compressed files.

2 Likes

I have these logs

safe@ubuntu-2gb-nbg1-1:~/.safe/node/safenode_20$ ll
total 74964
drwxrwxr-x  3 safe safe     4096 Jun  9 17:32 ./
drwxrwxr-x 32 safe safe     4096 Jun  9 14:00 ../
drwxrwxr-x  2 safe safe    20480 Jun  9 17:33 record_store/
-rw-rw-r--  1 safe safe 38318080 Jun 10 15:19 safenode.log
-rw-rw-r--  1 safe safe  1481413 Jun  9 14:10 safenode.log.20230609T141025
-rw-rw-r--  1 safe safe  1597522 Jun  9 14:21 safenode.log.20230609T142112
-rw-rw-r--  1 safe safe  1538273 Jun  9 14:29 safenode.log.20230609T142943
-rw-rw-r--  1 safe safe  1611213 Jun  9 14:38 safenode.log.20230609T143834
-rw-rw-r--  1 safe safe  1572130 Jun  9 14:48 safenode.log.20230609T144830
-rw-rw-r--  1 safe safe  1562292 Jun  9 14:58 safenode.log.20230609T145811
-rw-rw-r--  1 safe safe  1577526 Jun  9 15:08 safenode.log.20230609T150818
-rw-rw-r--  1 safe safe  1611112 Jun  9 15:17 safenode.log.20230609T151733
-rw-rw-r--  1 safe safe  1589835 Jun  9 15:27 safenode.log.20230609T152713
-rw-rw-r--  1 safe safe  1611796 Jun  9 15:37 safenode.log.20230609T153730
-rw-rw-r--  1 safe safe  1625356 Jun  9 15:47 safenode.log.20230609T154719
-rw-rw-r--  1 safe safe  1605623 Jun  9 15:57 safenode.log.20230609T155724
-rw-rw-r--  1 safe safe  1625477 Jun  9 16:06 safenode.log.20230609T160630
-rw-rw-r--  1 safe safe  1641200 Jun  9 16:16 safenode.log.20230609T161633
-rw-rw-r--  1 safe safe  1637701 Jun  9 16:25 safenode.log.20230609T162525
-rw-rw-r--  1 safe safe  1589142 Jun  9 16:34 safenode.log.20230609T163417
-rw-rw-r--  1 safe safe  1632472 Jun  9 16:43 safenode.log.20230609T164313
-rw-rw-r--  1 safe safe  1595049 Jun  9 16:50 safenode.log.20230609T165002
-rw-rw-r--  1 safe safe  1562064 Jun  9 16:56 safenode.log.20230609T165646
-rw-rw-r--  1 safe safe  1520332 Jun  9 17:02 safenode.log.20230609T170202
-rw-rw-r--  1 safe safe  1626569 Jun  9 17:08 safenode.log.20230609T170847
-rw-rw-r--  1 safe safe  1648748 Jun  9 17:17 safenode.log.20230609T171703
-rw-rw-r--  1 safe safe  1602155 Jun  9 17:24 safenode.log.20230609T172440
-rw-rw-r--  1 safe safe  1626816 Jun  9 17:32 safenode.log.20230609T173233

Running through vdash, I can see quite a few of my nodes are showing “stopped”.

19 of 30 appear to be stopped with last log timestamps >30 min in the past.

I did briefly run out of disk space, but I solved that quickly by dumping some data for upload.
I have a new 20GB volume attached; now I just need to transfer logs etc. across to free up space for chunks.
Then I intend shutting down nodes one by one and setting up new nodes with logging to the new volume to save max space for chunks on the original volume.
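
The "stopped" check (last log timestamp more than 30 minutes old) can also be done by hand. Here is a minimal Python sketch, assuming every log line starts with a bracketed UTC timestamp as in the safenode.log excerpts in this thread; `is_stalled` and the 30 minute default are illustrative, not how vdash does it.

```python
import re
from datetime import datetime, timedelta, timezone

# Leading "[2023-06-09T17:33:48.133764Z " style timestamp (assumed log format).
TS_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.\d+)?Z ")

def last_timestamp(lines):
    """Return the timestamp of the last line that starts with a log timestamp."""
    last = None
    for line in lines:
        m = TS_RE.match(line)
        if m:
            last = datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S").replace(
                tzinfo=timezone.utc)
    return last

def is_stalled(lines, now, max_age=timedelta(minutes=30)):
    """True if there is no timestamped line, or the newest one is older than max_age."""
    last = last_timestamp(lines)
    return last is None or now - last > max_age

sample = ['[2023-06-09T17:33:48.133764Z INFO sn_networking::event] identify: Sent '
          '{ peer_id: PeerId("12D3KooWDReet2veUuhhkvjXtYUCRkshKhBfWgEV87QDwUbab8Hc") }']
now = datetime(2023, 6, 10, 15, 19, tzinfo=timezone.utc)  # assumed "current" time
print(is_stalled(sample, now))
```

For a real node, pass an open file handle for the node's `safenode.log` as `lines` and `datetime.now(timezone.utc)` as `now`.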

3 Likes

So what you are saying is basically that the network can initially have fewer copies than needed. It looks like we need tools that can check how many nodes hold copies.

But I was thinking more about the scenario where the network initially had all 8 copies, but then they disappeared.

Did you read my messages about OutgoingConnectionError and the corresponding memory spikes?
Could you just check whether the node I mentioned is functioning or not?
That would instantly give a clue about whether node crashes are involved.

4 Likes

Chunks are rotting more:

600MB file round 2: expected 1243, retrieved 1239
600MB file round 3: expected 1243, retrieved 1207

1.2GB file round 2: expected 2233, retrieved 2224
1.2GB file round 3: Took too long to finish, I had to go.

My smaller and earlier files downloaded fine. I guess that due to their size they are much less likely to miss a chunk.
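
A rough way to put numbers on that guess: treat the round 3 result for the 600MB file as an estimate of the per-chunk loss rate, and assume chunks are lost independently of each other. That independence is a simplifying assumption (real losses are probably correlated per node), so this Python sketch is only a back-of-the-envelope illustration.

```python
def chunk_loss_rate(expected, retrieved):
    """Fraction of chunks that could not be retrieved."""
    return (expected - retrieved) / expected

def file_survival(n_chunks, loss_rate):
    """Probability that all n_chunks are retrievable, assuming independent losses."""
    return (1.0 - loss_rate) ** n_chunks

# 600MB file, round 3: expected 1243, retrieved 1207
p = chunk_loss_rate(1243, 1207)
for n in (3, 100, 1243):
    print(n, file_survival(n, p))
```

Under these assumptions a file with a handful of chunks comes back intact most of the time, while a file with over a thousand chunks almost never does.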

4 Likes

Not the case for me.
5f7ff8546e1d226d2e436b657a611a9c32f7f6a58024a8340a4b141d3f2e66aa (3367 bytes) uploaded 1 hour after network was launched is broken now.

2 Likes

I checked my uploads again and now see some errors


Downloading file "how-protonmail-lost-the-public-trust-it-needs-to-do-business.html" with address 4ebd77dd0109b38b9d18fd181ca4d3ecfce2d30020e6772a4a6c72c50827f54b
Did not get file "how-protonmail-lost-the-public-trust-it-needs-to-do-business.html" from the network! Chunks error Not all chunks were retrieved, expected 3, retrieved 2, missing [ae8b62(10101110).., a9edda(10101001)..]..

Downloading file "dam7.jpg" with address 3df28693e292a1baa319aae2bce594a42e9a0bead8f6f98385adbf35df2adaf3
Did not get file "dam7.jpg" from the network! Chunks error Not all chunks were retrieved, expected 3, retrieved 2, missing [8095bc(10000000).., 771b40(01110111)..]..
Downloading file "ukrleak5.jpg" with address 265a0a4541d185fc676a6709eb3d5e4f6ff49d19ab4af49a1595ade7744e990e
Did not get file "ukrleak5.jpg" from the network! Chunks error Not all chunks were retrieved, expected 3, retrieved 2, missing [0980ad(00001001).., d1b5e0(11010001)..]..
Downloading file "rubeard.jpg" with address 2a761d07d01573e0c376f9dd461dfdd1b0b1ffa76df63cedaf2bb467b8866953
Did not get file "rubeard.jpg" from the network! Chunks error Not all chunks were retrieved, expected 3, retrieved 2, missing [b671c3(10110110).., 54ba18(01010100)..]..
Downloading file "bake1-s.jpg" with address fad2332d4ffd0000587165996796fe0077e7d82576618ef12a43f7e33b66fd4a
Did not get file "bake1-s.jpg" from the network! Chunks error Not all chunks were retrieved, expected 3, retrieved 2, missing [917b7d(10010001).., f77e9b(11110111)..]..
Downloading file "ukrleak1.jpg" with address 6aef3f09e9576265ea0b09da3d485cdc8202f7ef72733ef5e8c987b4671a562c
Did not get file "ukrleak1.jpg" from the network! Chunks error Not all chunks were retrieved, expected 3, retrieved 2, missing [7fdf69(01111111).., 5aa5b5(01011010)..]..
Downloading file "offensejune4-s.jpg" with address 7c7d72b753e9b5d90a012bda36dec2ebaf6652f47c6705bee1875e7e3d11a06b
Did not get file "offensejune4-s.jpg" from the network! Chunks error Not all chunks were retrieved, expected 3, retrieved 2, missing [49d7d5(01001001).., 11b5a4(00010001)..]..
Downloading file "ukrprog3.jpg" with address f8020047a3ddd3e344105248c44516949cec41a1642a7c765191741f1d525ea9
Did not get file "ukrprog3.jpg" from the network! Chunks error Not all chunks were retrieved, expected 3, retrieved 2, missing [c59d65(11000101).., 57c521(01010111)..]..
Downloading file "dam7-s.jpg" with address fd700c086e6fe980bce9a133ce6c91fb8d2e8963de84e32002ddb8563ed8c34c
Did not get file "dam7-s.jpg" from the network! Chunks error Not all chunks were retrieved, expected 3, retrieved 2, missing [57e160(01010111).., 18a28f(00011000)..]..
Downloading file "no-such-propaganda-delusions-will-not-win-the-war.html?cid=6a00d8341c640e53ef02b751a72e0b200c" with address b014c0e8dc830af6c9567000c0a1bf8913e1c69b9f9d475c966b38cce32dc3fa
Did not get file "no-such-propaganda-delusions-will-not-win-the-war.html?cid=6a00d8341c640e53ef02b751a72e0b200c" from the network! Chunks error Not all chunks were retrieved, expected 3, retrieved 2, missing [1efda6(00011110).., 112faa(00010001)..]..
Downloading file "another-dubious-leak-promotes-the-nord-stream-cover-up-story.html" with address cb24cd7d0c736983f582bba9fbecc9ca7bf803c997c9a02c45a18caa4c8936a0
Did not get file "another-dubious-leak-promotes-the-nord-stream-cover-up-story.html" from the network! Chunks error Not all chunks were retrieved, expected 3, retrieved 2, missing [017b62(00000001).., cb7ea6(11001011)..]..
Downloading file "no-such-propaganda-delusions-will-not-win-the-war.html?cid=6a00d8341c640e53ef02b75182c958200b" with address 4d5917b743bf7936df0958cf797efaf46b68f008370dcad05d1a4bc9faec906a
Did not get file "no-such-propaganda-delusions-will-not-win-the-war.html?cid=6a00d8341c640e53ef02b75182c958200b" from the network! Chunks error Not all chunks were retrieved, expected 3, retrieved 2, missing [6caa5f(01101100).., 3e3de2(00111110)..]..
Downloading file "no-such-propaganda-delusions-will-not-win-the-war.html?cid=6a00d8341c640e53ef02b751a72e4d200c" with address b741b39eeafab12f40b9246a01ea21ed93b66cc361caf47e9c716f622fcfbca0
Did not get file "no-such-propaganda-delusions-will-not-win-the-war.html?cid=6a00d8341c640e53ef02b751a72e4d200c" from the network! Network Error Record was not found locally.
Downloading file "no-such-propaganda-delusions-will-not-win-the-war.html?cid=6a00d8341c640e53ef02b68539a8a6200d" with address f69603f748feb01534cf96a81fae7d32ab6b41f7d2973ccd9c92bacb8e583d7d
Did not get file "no-such-propaganda-delusions-will-not-win-the-war.html?cid=6a00d8341c640e53ef02b68539a8a6200d" from the network! Network Error Record was not found locally.
Downloading file "no-such-propaganda-delusions-will-not-win-the-war.html?cid=6a00d8341c640e53ef02b68539a890200d" with address d7ec81b0c22b2a8ddcb7052d14acb5168d155a7d2214e1dd2657365c8201d740
Did not get file "no-such-propaganda-delusions-will-not-win-the-war.html?cid=6a00d8341c640e53ef02b68539a890200d" from the network! Network Error Record was not found locally.
Downloading file "no-such-propaganda-delusions-will-not-win-the-war.html?cid=6a00d8341c640e53ef02b68539ae0a200d" with address ee26de1ffc4c485d2f4f7b6735f7558e31dd2f98e0fab2832b99a9c1108c7d9e
Successfully got file no-such-propaganda-delusions-will-not-win-the-war.html?cid=6a00d8341c640e53ef02b68539ae0a200d!
Writing 155316 bytes to "/home/safe/.safe/client/downloaded_files/no-such-propaganda-delusions-will-not-win-the-war.html?cid=6a00d8341c640e53ef02b68539ae0a200d"
Downloading file "no-such-propaganda-delusions-will-not-win-the-war.html?cid=6a00d8341c640e53ef02b75182bcc0200b" with address 0d1b1881904226265fcd5e0afe8dc6ce115d3578408f83a00b6695197d14d5cd
Did not get file "no-such-propaganda-delusions-will-not-win-the-war.html?cid=6a00d8341c640e53ef02b75182bcc0200b" from the network! Chunks error Not all chunks were retrieved, expected 3, retrieved 2, missing [1d232f(00011101).., 0b167f(00001011)..]..
Downloading file "no-such-propaganda-delusions-will-not-win-the-war.html?cid=6a00d8341c640e53ef02b68539a825200d" with address 4733d4220f2bd36d4177085a9b83583a95cd1202a0eadf98156148d8fd7256a9
Successfully got file no-such-propaganda-delusions-will-not-win-the-war.html?cid=6a00d8341c640e53ef02b68539a825200d!
Writing 157079 bytes to "/home/safe/.safe/client/downloaded_files/no-such-propaganda-delusions-will-not-win-the-war.html?cid=6a00d8341c640e53ef02b68539a825200d"
Downloading file "no-such-propaganda-delusions-will-not-win-the-war.html?cid=6a00d8341c640e53ef02b75182ca9f200b" with address dfd30471781826e1c482d0f6ddb91951c17128926e837a67230416582416aab7
Successfully got file no-such-propaganda-delusions-will-not-win-the-war.html?cid=6a00d8341c640e53ef02b75182ca9f200b!
Writing 96377 bytes to "/home/safe/.safe/client/downloaded_files/no-such-propaganda-delusions-will-not-win-the-war.html?cid=6a00d8341c640e53ef02b75182ca9f200b"
Downloading file "no-such-propaganda-delusions-will-not-win-the-war.html?cid=6a00d8341c640e53ef02b68539ae18200d" with address 7843be22a0f437fd12493b7baa295bc7a6e44ea4d86c51a3063c1fb81ffe36de
Did not get file "no-such-propaganda-delusions-will-not-win-the-war.html?cid=6a00d8341c640e53ef02b68539ae18200d" from the network! Chunks error Not all chunks were retrieved, expected 3, retrieved 2, missing [366225(00110110).., 53ce16(01010011)..]..
Downloading file "no-such-propaganda-delusions-will-not-win-the-war.html?cid=6a00d8341c640e53ef02b751a72c41200c" with address cf7c050b586645c036797ae6c33740fb2c3ab0075205a881c2243a92349953de
Downloading file "ukraine-sitrep-leaked-briefings-holding-roads-split-training.html?cid=6a00d8341c640e53ef02b7519f52d6200c" with address a6473734984a7b043cc07c73cc552a5e2a6988708e74d789ff077e62979bb13e
Did not get file "ukraine-sitrep-leaked-briefings-holding-roads-split-training.html?cid=6a00d8341c640e53ef02b7519f52d6200c" from the network! Chunks error Not all chunks were retrieved, expected 3, retrieved 2, missing [bee3fc(10111110).., 028527(00000010)..]..
Downloading file "ukraine-sitrep-leaked-briefings-holding-roads-split-training.html?cid=6a00d8341c640e53ef02b7519f52d6200c" with address a6473734984a7b043cc07c73cc552a5e2a6988708e74d789ff077e62979bb13e
Did not get file "ukraine-sitrep-leaked-briefings-holding-roads-split-training.html?cid=6a00d8341c640e53ef02b7519f52d6200c" from the network! Chunks error Not all chunks were retrieved, expected 3, retrieved 2, missing [bee3fc(10111110).., 028527(00000010)..]..

Maybe 5% of my uploads are missing chunks now.
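
To get an exact figure instead of eyeballing it, the client output can be tallied. This is a minimal Python sketch that relies only on the three message shapes visible in the output above; the category names and helpers are hypothetical.

```python
from collections import Counter

MARKER = '" from the network! '

def classify(line):
    """Map one line of download output to a result category, or None."""
    line = line.strip()
    if line.startswith("Successfully got file "):
        return "ok"
    if line.startswith("Did not get file "):
        # The failure reason follows the quoted file name.
        reason = line[line.rfind(MARKER) + len(MARKER):] if MARKER in line else ""
        if reason.startswith("Chunks error"):
            return "missing_chunks"
        if reason.startswith("Network Error"):
            return "network_error"
        return "other_error"
    return None  # "Downloading file", "Writing ... bytes", etc.

def tally(lines):
    return Counter(kind for kind in map(classify, lines) if kind is not None)

def failure_share(counts):
    total = sum(counts.values())
    return 0.0 if total == 0 else 1.0 - counts["ok"] / total

sample = [
    'Did not get file "dam7.jpg" from the network! Chunks error Not all chunks were '
    'retrieved, expected 3, retrieved 2, missing [8095bc(10000000).., 771b40(01110111)..]..',
    'Did not get file "no-such-propaganda-delusions-will-not-win-the-war.html'
    '?cid=6a00d8341c640e53ef02b751a72e4d200c" from the network! '
    'Network Error Record was not found locally.',
    'Successfully got file no-such-propaganda-delusions-will-not-win-the-war.html'
    '?cid=6a00d8341c640e53ef02b68539ae0a200d!',
]
counts = tally(sample)
print(dict(counts), failure_share(counts))
```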

4 Likes

Partially, yes. I also mean that copies may be lost over time until eventually all 8 have vanished.

Yeah, could be.

The OutgoingConnectionError could be related to the data loss during transmission mentioned previously, and there is work in main trying to address it.
The related memory spike could simply be because the node was accepting/replicating chunk copies from/to other peers at that time.

But the nodes having no logs after some time do look suspicious, and we will investigate that further.

10 Likes

Updates to the existing dashboard since last post:

  • Break down of different types of messages in the safenode log
  • Tagged OutgoingConnectionError in the error bucket panel, though I haven’t found other error messages outside of these so far in the logs to classify or expose to the ‘PID - SAFE Logs Error Statistics’ panel
  • Distinct count of all unique PeerIDs interactions with this node
  • Rate at which this node is communicating with different PeerIds on a 1 minute interval (based on just unique count of PeerIds detected from the logs over time)
  • All data above is now flowing in real-time from the container & safe_node logs :smiley: .

Note: Some panels are set to less than 4 days etc, because they don’t contain data from the container from the time of safenode pid spin up, as some of the features were added later after safenode pid was started.

More observations:

  • Very interesting to see breakdown of different forms of safenode messages fairly even and distributed over time (stacked bar chart), including the upper max count per bar per 1 hr is relatively stable, :hugs:
  • Outgoing Connection Errors did have inner ‘timeout’ messages in my node’s logs, but I didn’t expose that here, and bundled them up within the context of an ‘OutgoingConnectionError’ as is
  • Possible correlation between Detected Dead Peer along with Outbound Connection Error as well based on the side by side comparison above? :thinking:
  • Peer connection closed vs Peer connection connected messages in stacked bar chart above, the closed had slightly higher count per hour than the connected (orange vs red color in the ‘PID - Safe Logs Statistics Per Hr’).
  • Curious why the total PUT requests messages are way higher than actual Chunks Written in terms of count? Almost an exact multiplier of 8x here? Might be expected? :thinking:
  • Is the throughput for uploads and downloads being noted by others as a bit slow because of the sheer number of connections closed and re-opened to the same PeerIDs repeatedly on the server side, or is it entirely unrelated as it’s still a work in progress? :thinking:
  • Not sure what to make out of the 4th and 5th image above other than the node is communicating with many peer IDs, and is not in isolation with the network peer nodes, and likely working as expected as part of the distributed network? :smiley:
  • For the 6th image, the x-axis is the number of connected or closed messages logged against a PeerID, and the y-axis is the number of unique PeerIDs that fall into that bucket. Does anything here stand out as a concern, apart from a handful of unique PeerIDs at the very end of the x-axis (upper range)? :thinking:
  • Peer Info Received vs Peer Info Sent messages counts are nearly equal, nice!
  • Confirmed Chunks Written messages in the logs and their total count do align and are equal to the # of total files in the record_store folder, :hugs:
  • Did we end up having around 2148 unique nodes or more in this testnet? Seems at least this node contacted or engaged with that many unique PeerIds, :smiley:
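
For anyone without the dashboard, the unique PeerID count can be approximated straight from the logs. The Python sketch below is an assumption-laden illustration, not how the panels above are built: it only recognises the `PeerId("...")`, `peer_id=...` and `Peer ...` forms that appear in the log excerpts earlier in this thread, and `unique_peers` is a hypothetical helper.

```python
import re

# Forms seen in the safenode.log excerpts in this thread:
#   PeerId("12D3KooW...")   peer_id=12D3KooW...   Peer 12D3KooW...(0)
PEER_RE = re.compile(r'(?:PeerId\("|peer_id=|Peer )(12D3KooW[A-Za-z0-9]+)')

def unique_peers(lines):
    """Return the set of peer IDs mentioned in the given log lines."""
    peers = set()
    for line in lines:
        peers.update(PEER_RE.findall(line))
    return peers

sample = [
    '[2023-06-09T17:33:43.534043Z INFO sn_networking::event] identify: Sent '
    '{ peer_id: PeerId("12D3KooWKV25Bq6wprCTLsBokEi6RNfkaGdtwXCpV1tgrzyoA3U8") }',
    '[2023-06-09T17:33:44.968863Z DEBUG sn_networking::event] KademliaEvent ignored: '
    'UnroutablePeer { peer: PeerId("12D3KooWDWW7tAZCFVPEiiotXXkUEAEqt3oxWeQbnr5xNu4efc3P") }',
    # Same peer as the first line, to show that repeats are counted once.
    '[2023-06-09T17:33:43.534043Z INFO sn_networking::event] identify: Sent '
    '{ peer_id: PeerId("12D3KooWKV25Bq6wprCTLsBokEi6RNfkaGdtwXCpV1tgrzyoA3U8") }',
]
print(len(unique_peers(sample)))
```

Note that the node's own ID in `observed_addr` (after `/p2p/`) is deliberately not matched, so it does not inflate the count.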

Post Updates after Initial Post:

  • Changed the panel for the top left to state ‘PID - Safe Process Activity’, and re-updated image.
  • Added a 3rd image to highlight PUT requests vs Chunks written messages over time.
  • Added a 4th & 5th image to highlight the top 250 PeerIDs and # of times connected and closed over time from my node’s logs.
  • Added a 6th image to highlight the distribution count of # Unique PeerIDs vs # Connection connected & closed against them.
  • Added Percent column on the pie charts for 4th & 5th image.
23 Likes

This is excellent. How can I use this dashboard for my own nodes?

9 Likes

Is it broken now?
I just uploaded a directory of 70 logs and only managed to get 2 of them back without error.

safe@ubuntu-2gb-nbg1-1:~/safe_vault/logs/safenode_1$ safe files download
Removed old logs from directory: "/tmp/safe-client"
Logging to directory: "/tmp/safe-client"
Current build's git commit hash: cea98bcca21d075970e4fae72090da49a2af348f
⢀ Connecting to The SAFE Network...
The client still does not know enough network nodes.
🔗 Connected to the Network
Trying to download files recorded in uploaded_files folder
Loading file names from index doc "file_names_2023-06-11_00-24-24"
Downloading file "safenode.log.20230609T144626" with address 0cba979ba6dd7f390f0972b194c5d584e2c8df329b5dd047d1dd7379f9011a40
Did not get file "safenode.log.20230609T144626" from the network! Chunks error Not all chunks were retrieved, expected 4, retrieved 0, missing []..
Downloading file "safenode.log.20230609T170732" with address 11be4b0e414c1ce60837b738c2ab02cf80a183e84297c2e24066e094aba2775b
Did not get file "safenode.log.20230609T170732" from the network! Network Error Record was not found locally.
Downloading file "safenode.log.20230609T151324" with address 33f1018c9a480e5a2da9fcc431ef6e2e90aa4aae7207272a6ca16a6ce3ef4ae7
Did not get file "safenode.log.20230609T151324" from the network! Chunks error Not all chunks were retrieved, expected 4, retrieved 2, missing [95ed1e(10010101).., 35b357(00110101)..]..
Downloading file "safenode.log.20230609T143639" with address dcaf4913d9b1877e68a27a775b8c2a2558ac6a7fdb8a9de60e7ab42439e3067b
Did not get file "safenode.log.20230609T143639" from the network! Chunks error Not all chunks were retrieved, expected 3, retrieved 2, missing [2266c6(00100010).., 0e11aa(00001110)..]..
Downloading file "safenode.log.20230609T181041" with address 22597a40ed2904cefe0f92ec5fd022da99d05220315b41d6013e9eac2ae83082
Did not get file "safenode.log.20230609T181041" from the network! Chunks error Not all chunks were retrieved, expected 4, retrieved 0, missing []..
Downloading file "safenode.log.20230609T212943" with address 359f05747860c677ff8136784fecfa5ac44fdebade30bbbb1e67070e1afbaed3
Did not get file "safenode.log.20230609T212943" from the network! Network Error Record was not found locally.
Downloading file "safenode.log.20230609T191410" with address b1c900048876222cef4646c234a6ee503375a756efee5a3b0ae19d37f716b463
Did not get file "safenode.log.20230609T191410" from the network! Network Error Record was not found locally.
Downloading file "safenode.log.20230609T180151" with address 81aff54030e016db2f2532582e888c8be78400098df3cc7ceefcc785fb56999b
Did not get file "safenode.log.20230609T180151" from the network! Chunks error Not all chunks were retrieved, expected 4, retrieved 3, missing [1ba505(00011011).., 050bb1(00000101).., 1d5851(00011101)..]..
Downloading file "safenode.log.20230609T141028" with address 69643c8d2eaf43d441a1001fd873868c16b2d5bce224d7d062a9091fff966bf0
Did not get file "safenode.log.20230609T141028" from the network! Chunks error Not all chunks were retrieved, expected 3, retrieved 1, missing [4d7951(01001101)..]..
Downloading file "safenode.log.20230609T212229" with address 3595c7a7bcdcef777b3eaab3956255b73ce850f85d86ebc77bd92b79bc6db53f
Did not get file "safenode.log.20230609T212229" from the network! Network Error Record was not found locally.
Downloading file "safenode.log.20230609T220712" with address 864b998e393dd5c29d7bfb97132d86420dd5fc5bc8872259edaef9f98a326602
Did not get file "safenode.log.20230609T220712" from the network! Chunks error Not all chunks were retrieved, expected 4, retrieved 0, missing []..
Downloading file "safenode.log.20230609T213809" with address 1c9d772d5a412b043481ac7777bce4c1d1ee25308976343d403add60ac884303
Did not get file "safenode.log.20230609T213809" from the network! Network Error Record was not found locally.
Downloading file "safenode.log.20230609T171639" with address 5c152ba03e922b575c9edc5202268433f754b2689aaa493398ce6f97d45f2112
Did not get file "safenode.log.20230609T171639" from the network! Network Error Record was not found locally.
Downloading file "safenode.log.20230609T183339" with address 54e6c9cf1a91f92eb4f9dcd59457a4f40da702173fbbfa005612631f55ea647e
Did not get file "safenode.log.20230609T183339" from the network! Chunks error Not all chunks were retrieved, expected 4, retrieved 2, missing [859d18(10000101).., 7f3062(01111111)..]..
Downloading file "safenode.log.20230609T210759" with address 18b45256f13987238acf897bb5547cf8e916557c410d8038d614f4aa74ff42f9
Did not get file "safenode.log.20230609T210759" from the network! Network Error Record was not found locally.
Downloading file "safenode.log.20230609T172523" with address cff7c5fbaca9f27b49390109e2deac5cad5c6b5cff0e2f8deb16d2c3016f4c76
Did not get file "safenode.log.20230609T172523" from the network! Network Error Record was not found locally.
Downloading file "safenode.log.20230609T141912" with address 22d6588a7f1888b87ac2f61fd19bc47c1659d62d4c21f19a005c5422bc2c99ab
Did not get file "safenode.log.20230609T141912" from the network! Network Error Record was not found locally.
Downloading file "safenode.log.20230609T203534" with address b755657942c7334e5930c8110e9992d491f7caa9ee66be3a17c0f7585da301fc
Did not get file "safenode.log.20230609T203534" from the network! Chunks error Not all chunks were retrieved, expected 4, retrieved 1, missing [9d85c2(10011101)..]..
Downloading file "safenode.log.20230609T190556" with address 54663204b9a9718ff6d15291e6c90a1ce4e2aa503d08c1370a09c8797129f6f5
Did not get file "safenode.log.20230609T190556" from the network! Chunks error Not all chunks were retrieved, expected 4, retrieved 0, missing []..
Downloading file "safenode.log.20230609T175234" with address 636f885ac464f906e9213fb510f6df2c58b90fc6a803ee02be2c0ae1778f131e
Did not get file "safenode.log.20230609T175234" from the network! Chunks error Not all chunks were retrieved, expected 4, retrieved 2, missing [0f904b(00001111).., 35dff6(00110101)..]..
Downloading file "safenode.log.20230609T214513" with address 331f4e03898bcd38877a269ce74227ee58ef305838f4b30084fd4bf82814e2b4
Did not get file "safenode.log.20230609T214513" from the network! Network Error Record was not found locally.
Downloading file "safenode.log.20230609T184136" with address c8e30352f443cf66f3a7f537b51e403cd0b51b25e9c155852ed2a709509fa704
Did not get file "safenode.log.20230609T184136" from the network! Network Error Record was not found locally.
Downloading file "safenode.log.20230609T223045" with address cd36debf43d4f7310999d00b800f68aceeb41a4d403bccf9a3f8178a89afae59
Did not get file "safenode.log.20230609T223045" from the network! Network Error Record was not found locally.
Downloading file "safenode.log.20230609T204250" with address 94f4b22c977bb6a4d2483bdd0e4908e6ada777a0ccd76608cb872acdc0642e98
Did not get file "safenode.log.20230609T204250" from the network! Network Error Record was not found locally.
Downloading file "safenode.log.20230609T224610" with address 514d2d683f90283197fade28c66ccd370158794abeb2a5a76b0c22f4a49975c6
Did not get file "safenode.log.20230609T224610" from the network! Chunks error Not all chunks were retrieved, expected 4, retrieved 0, missing []..
Downloading file "safenode.log.20230609T161543" with address baafec08b0eb423bead789af17c4390cc0343e2372f51af86ec8096c2e6331d2
Did not get file "safenode.log.20230609T161543" from the network! Chunks error Not all chunks were retrieved, expected 4, retrieved 2, missing [a971f8(10101001).., 524f50(01010010)..]..
Downloading file "safenode.log.20230609T181926" with address 0f9e3c1cfb8d58adcdd7377560491beb80268a869ca7dba82446c10377367bc9
Did not get file "safenode.log.20230609T181926" from the network! Network Error Record was not found locally.
Downloading file "safenode.log.20230609T210016" with address e1c85d98aa5e6255e3ac2d1405bd0227e70761270e1ba896dcdc1bd6e748ed37
Did not get file "safenode.log.20230609T210016" from the network! Network Error Record was not found locally.
Downloading file "safenode.log.20230609T202745" with address c9317b70afa9e62bae3c6b18db7d6e573bec9755e82b580f3f4098055720acee
Did not get file "safenode.log.20230609T202745" from the network! Chunks error Not all chunks were retrieved, expected 4, retrieved 0, missing []..
Downloading file "safenode.log.20230609T220035" with address bfac2b76337ce0cccb87aff6896ce1cd4879db13b74340fef5b4464e1ea3b595
Did not get file "safenode.log.20230609T220035" from the network! Chunks error Not all chunks were retrieved, expected 4, retrieved 1, missing [caa0f2(11001010)..]..
Downloading file "safenode.log.20230609T173423" with address 8c506909c9d70b6f67c3b2609ba25913854cb73ac9faac7fd992a1d0c33dfacc
Did not get file "safenode.log.20230609T173423" from the network! Network Error Record was not found locally.
Downloading file "safenode.log" with address 0278c4c7c2be03e6354313f3a8fada963e9f40bc5bf6adb6fbce0a2a88b7d614
Did not get file "safenode.log" from the network! Network Error Record was not found locally.
Downloading file "safenode.log.20230609T145550" with address cb89a320f7c73823694bf8167c6494e7de4442437281f89d8020f45e8a43957c
Did not get file "safenode.log.20230609T145550" from the network! Chunks error Not all chunks were retrieved, expected 3, retrieved 1, missing [413905(01000001)..]..
Downloading file "safenode.log.20230609T201938" with address a577777749bc31ad35d77340cd978568ee11246054bc13a2ec8f6ca686007152
Did not get file "safenode.log.20230609T201938" from the network! Chunks error Not all chunks were retrieved, expected 4, retrieved 2, missing [c953e5(11001001).., 191ca8(00011001)..]..
Downloading file "safenode.log.20230609T221444" with address d2c57659fe09b51c865dcf189fa405597651e30cca9c33d397ba815ecb14a9b2
Did not get file "safenode.log.20230609T221444" from the network! Network Error Record was not found locally.
Downloading file "safenode.log.20230609T164023" with address fa70b7b8aaef4a4379844fa13d4bd043f2f5627deb0df07f081da3bca3ab4da3
Did not get file "safenode.log.20230609T164023" from the network! Network Error Record was not found locally.
Downloading file "safenode.log.20230609T160659" with address d4d55685ef750b962da919a0d213247682ee3c45f8f52155c0d0f94ec23c8324
Did not get file "safenode.log.20230609T160659" from the network! Network Error Record was not found locally.
Downloading file "safenode.log.20230609T182613" with address 23b87d69879f89b523195471264830401e09987e435258817d813ed814309d18
Did not get file "safenode.log.20230609T182613" from the network! Chunks error Not all chunks were retrieved, expected 4, retrieved 0, missing []..
Downloading file "safenode.log.20230609T225335" with address b14d81eec415a84e4ec9399c87da8b7413065ef20727d7c2dc5776786a85694d
Did not get file "safenode.log.20230609T225335" from the network! Network Error Record was not found locally.
Downloading file "safenode.log.20230609T211512" with address 312a1727ec20e0f5409990095b53cab6b6d6d7fca08566a3205abff57254f256
Did not get file "safenode.log.20230609T211512" from the network! Chunks error Not all chunks were retrieved, expected 4, retrieved 0, missing []..
Downloading file "safenode.log.20230609T165834" with address 883cdcdd950f85d443c135f53c24ec9945a7ac2ff9311ab29611124fd4f28dd0
Did not get file "safenode.log.20230609T165834" from the network! Network Error Record was not found locally.
Downloading file "safenode.log.20230609T215215" with address ebd5aaf44175efc130daa7d7e1e6054c2afc58f012eebbca1b81a02e8ad44d6c
Did not get file "safenode.log.20230609T215215" from the network! Chunks error Not all chunks were retrieved, expected 4, retrieved 0, missing []..
Downloading file "safenode.log.20230609T195343" with address a30bebd2e53da6d37e379481d66ad1c9520039b761820b0d87e4b2b87143c63f
Did not get file "safenode.log.20230609T195343" from the network! Network Error Record was not found locally.
Downloading file "safenode.log.20230609T164910" with address d967485849b0f4db06db484dd86e5c8ba56e7b6e3c9014728d28ea47ced198f5
Did not get file "safenode.log.20230609T164910" from the network! Network Error Record was not found locally.
Downloading file "safenode.log.20230609T153109" with address f88735064eae3951854371e305160dfbf904a118492e60c6300189c9d80c9335
Did not get file "safenode.log.20230609T153109" from the network! Network Error Record was not found locally.
Downloading file "safenode.log.20230609T155653" with address cde1d64f48ed3361f2cff63b68f10e8aee29b5d164dbb13798e8a6c03e6accd1
Did not get file "safenode.log.20230609T155653" from the network! Network Error Record was not found locally.
Downloading file "safenode.log.20230609T193842" with address 67b125c1dede28704ebf7f94bf20cf6f0c5bffa05e287cfdb093cc2bd0596b01
Did not get file "safenode.log.20230609T193842" from the network! Network Error Record was not found locally.
Downloading file "safenode.log.20230609T200256" with address 982d8b09ba176aa062a269612d12be3f4d7c62f3d8814cf9e57f25ef1d29afc6
Did not get file "safenode.log.20230609T200256" from the network! Network Error Record was not found locally.
Downloading file "safenode.log.20230609T230106" with address 78980e4b690ecc028d4155fb9c6c934caa00525570b648dafbac07a604790cfd
Did not get file "safenode.log.20230609T230106" from the network! Chunks error Not all chunks were retrieved, expected 4, retrieved 0, missing []..
Downloading file "safenode.log.20230609T201146" with address 6d585fbaff75486db810fb2887b64b72b40a85a8acf9d7d5a235b3800baa2e47
Did not get file "safenode.log.20230609T201146" from the network! Network Error Record was not found locally.
Downloading file "safenode.log.20230609T174321" with address f4391aa5b0a53af18827f610918956638ad3c5af7a4cbaaa2357bb297703fed1
Did not get file "safenode.log.20230609T174321" from the network! Network Error Record was not found locally.
Downloading file "safenode.log.20230609T184936" with address d2944ad0bb0b3516234890f1e0b9430341faee357cf109592c795bacd73ee138
Did not get file "safenode.log.20230609T184936" from the network! Network Error Record was not found locally.
Downloading file "safenode.log.20230609T205148" with address 97feb92b4e68b0c9550439c48cd5d0e7da91a8a288be1b6344a51951f75dd442
Did not get file "safenode.log.20230609T205148" from the network! Network Error Record was not found locally.
Downloading file "safenode.log.20230609T162357" with address 26caa9efc0dd877f1d4924569557d255c063664ffae944c5ca4b75e7d2c9987b
Did not get file "safenode.log.20230609T162357" from the network! Chunks error Not all chunks were retrieved, expected 4, retrieved 1, missing [9c1f42(10011100)..]..
Downloading file "safenode.log.20230609T192220" with address 25ead3437c988305a4e8deafab2b6e8cb295218afbd064f882d0e814a9b46a92
Did not get file "safenode.log.20230609T192220" from the network! Chunks error Not all chunks were retrieved, expected 4, retrieved 0, missing []..
Downloading file "safenode.log.20230609T222219" with address 3af6b4c88352d813d886eab181e6c485250acc2eb05935e21d8379ab4d70ddbb
Did not get file "safenode.log.20230609T222219" from the network! Network Error Record was not found locally.
Downloading file "safenode.log.20230609T153939" with address 5cc928d4eb6897fe242ccb8b533e2760b063247c3c7406d2c87500fc547de4ec
Did not get file "safenode.log.20230609T153939" from the network! Network Error Record was not found locally.
Downloading file "safenode.log.20230609T142747" with address b835f42ceeb4c27df8ca9d0f75d2760b041f11214f4a0ceeaf35f845a6d86176
Did not get file "safenode.log.20230609T142747" from the network! Network Error Record was not found locally.
Downloading file "safenode.log.20230609T150409" with address 9c47b07aada4890396453e0aa3b540f4e015a698efdae7e8e017da2081a59f65
Did not get file "safenode.log.20230609T150409" from the network! Network Error Record was not found locally.
Downloading file "safenode.log.20230609T140127" with address 4b5cee0ed1111f687a92bacbbdd9bcf49d335543ea304c224cdec2512197a181
Did not get file "safenode.log.20230609T140127" from the network! Network Error Record was not found locally.
Downloading file "safenode.log.20230609T152119" with address 6761facfb6a22c06ceec32d573bcb786c8ab92159ee8e24f53dfe51d89c9e4eb
Successfully got file safenode.log.20230609T152119!
Writing 1542587 bytes to "/home/safe/.safe/client/downloaded_files/safenode.log.20230609T152119"
Downloading file "safenode.log.20230609T223829" with address a25f5f3ef18a68297f1e7c553f41ede5ec0032282499bd818b76bb92a3839a76
Did not get file "safenode.log.20230609T223829" from the network! Network Error Record was not found locally.
Downloading file "safenode.log.20230609T194653" with address e7366575d7fcea43a1edbda7019ccd8cea6b02b24ac7a6d4f8c14bc038c2c9fb
Did not get file "safenode.log.20230609T194653" from the network! Chunks error Not all chunks were retrieved, expected 4, retrieved 3, missing [920137(10010010).., 4b714a(01001011).., 54ce2a(01010100)..]..
Downloading file "safenode.log.20230609T185800" with address 48cd9b4c0d45f0a2b0919c63895412b94984e202e6c6d196b5ab963c73d72f8f
Successfully got file safenode.log.20230609T185800!
Writing 1627457 bytes to "/home/safe/.safe/client/downloaded_files/safenode.log.20230609T185800"
Downloading file "safenode.log.20230609T154803" with address 5190dcc7716bdf0c317ab641a55db1c87d760e1266054a445d783a28e17bb6b2
Did not get file "safenode.log.20230609T154803" from the network! Network Error Record was not found locally.
Downloading file "safenode.log.20230609T163154" with address 6e1dc656ec0096b8adc94a42d72e76283f1eb9f79e752ef4258a2081b7992b9b
Did not get file "safenode.log.20230609T163154" from the network! Chunks error Not all chunks were retrieved, expected 3, retrieved 0, missing []..
Downloading file "safenode.log.20230609T193015" with address 9e7b8086b7c5c175b55fd90c8f33412be0d43ac6f0e5c7ca3e67d05498202c30
Did not get file "safenode.log.20230609T193015" from the network! Network Error Record was not found locally.
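
With this many lines it is easier to judge the run by tallying outcomes. A minimal sketch in Python, assuming the client output has been saved as plain text. The labels and the `tally` function are names made up for this sketch; only the matched message fragments come from the output above.

```python
import re
from collections import Counter

# Message fragments taken from the client output above; the labels on the
# left are hypothetical names chosen for this sketch.
PATTERNS = [
    ("downloaded", re.compile(r"^Successfully got file ")),
    ("record_not_found", re.compile(r"Network Error Record was not found locally")),
    ("chunks_missing", re.compile(r"Chunks error Not all chunks were retrieved")),
]


def tally(lines):
    """Count download outcomes; lines matching no pattern are ignored."""
    counts = Counter()
    for line in lines:
        for label, pattern in PATTERNS:
            if pattern.search(line):
                counts[label] += 1
                break
    return counts


# Three lines copied from the output above, used as a small self-check.
sample = [
    'Did not get file "safenode.log.20230609T223829" from the network! '
    "Network Error Record was not found locally.",
    'Did not get file "safenode.log.20230609T163154" from the network! '
    "Chunks error Not all chunks were retrieved, expected 3, retrieved 0, missing []..",
    "Successfully got file safenode.log.20230609T185800!",
]
print(tally(sample))
```

For a real run, pass an open file instead of `sample`, for example `tally(open("download.log"))`, where `download.log` is a hypothetical file name.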
4 Likes

Do nodes delete their chunks in some circumstances?
I do not see such behaviour with my node.
If they do not delete chunks, they should still be holding them,
except in cases where the nodes have crashed.

This is what they are related to: [2023-06-09T18:07:43.349575Z TRACE sn_networking::event] Detected dead peer PeerId("12D3KooWHs2FuFcuSHtkt1KdCAKDnXp35EDbkrcx559rp7TMrj9n")
And it looks like you not only do not want to check this node, but also do not want to talk about it. Got it.

My node is full, so it can’t accept new chunks.
Copying to other peers would result in disk reads. I published a chart for reads too, and there is no correlation between memory_used_mb and total_mb_read.
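
For anyone who wants to check this on their own node rather than by eye from charts, here is a rough sketch of a Pearson correlation between two sampled series. The numbers are placeholders invented for illustration, not measurements from any node, and `pearson` is a helper written for this sketch.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length, at least 2 samples")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Placeholder samples only, NOT real node statistics.
memory_used_mb = [40.0, 42.0, 41.0, 45.0, 44.0]
total_mb_read = [10.0, 10.5, 12.0, 12.5, 15.0]

print(round(pearson(memory_used_mb, total_mb_read), 3))
```

A value near 0 would support the "no correlation" reading, while a value near 1 or -1 would not. If total_mb_read is a running total, as the name suggests, comparing its increase per sampling interval may be more meaningful than comparing the raw total.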