NoClientGossipDiscovery [24/11/23 Testnet] [Offline]

A little late to the party, but I'm working on getting a node fired up, :smiley: .

I am blanking out here (I forgot what I did last testnet), but for those starting safenode manually with args and compiling it from source with other features, what is at least one peer address (PEER_ID) that should be set as an env var or passed to the safenode process?

I didn’t see one mentioned in the OP.

Is there a static URL containing the network PEER IDs published by MaidSafe’s nodes that can be used to bootstrap the node, i.e. the location of the network_contacts file (if that still exists)?

1 Like

I’m almost sure they are here:
https://sn-testnet.s3.eu-west-2.amazonaws.com/network-contacts
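
For a manual start, something along these lines should work to use it (just a sketch, assuming the file lists one multiaddr per line and that you bootstrap via SAFE_PEERS):

# rough sketch: pull the contacts file and export one entry as the bootstrap peer
curl -s https://sn-testnet.s3.eu-west-2.amazonaws.com/network-contacts -o network-contacts
export SAFE_PEERS="$(head -n 1 network-contacts)"
# then start safenode as usual in the same shell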

2 Likes

Excellent! I knew it existed but wasn’t able to find the URL when searching in the forums.

It would be nice if this handy static URL were also part of the OP template/instructions for future testnets.

2 Likes

My latency is typically around 9 ms. It stayed there for the first part of this test, but since the memory started climbing, so has the latency.

The number of nodes you run is going to affect this, so it's a nice indicator.
Current Latency to 8.8.8.8: 52.308 ms

Command for average latency:
ping -c 4 8.8.8.8 | tail -1| awk '{print $4}' | cut -d '/' -f 2
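
If you want to watch it over time rather than spot-check, a quick loop like this (just a sketch) appends a timestamped average reading every minute:

# log the average ping to 8.8.8.8 once a minute
while true; do
  echo "$(date -u +%FT%TZ) $(ping -c 4 8.8.8.8 | tail -1 | awk '{print $4}' | cut -d '/' -f 2) ms" >> latency.log
  sleep 60
done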

2 Likes

Most likely, latency increased because of an increase in network usage and not because of RAM.
But RAM and network usage may increase because of some common event, yeah.

Pings are processed at some low level in the operating system (the kernel or thereabouts), and such processing does not require much RAM.

1 Like

Yes, I was not saying it was due to RAM, just that it started at around the same time. :+1:

1 Like
$ safe -V
sn_cli 0.86.20

:+1:

2 Likes

The node has been up and running for roughly the past 1.5 hours:

Features added:

  • Transfer rate of Gossipsub messages being received, taken from the metrics endpoint.
  • Fixed the network traffic rate (correct units) on the derivative function.

Note: the areas showing ‘No Data’, particularly under ‘PUT Request’, ‘Chunk Written’, and ‘Chunk Deleted’, are likely due to the on-disk log format having changed, and I would need to circle back and re-update the parser logic on my end (TBD). In addition, ‘Unique Peer IDs’, ‘Unique Request IDs’, and ‘Unique Chunk Addresses’ also broke in my parser back-end due to the log format change (TBD).

However, ‘PUT Records OK’, ‘SN Networking Records Stored’, ‘SN Replication Triggered’, etc. are all coming from the metrics endpoint.
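
(For anyone wanting to poke at the same numbers: these come straight off the node's metrics endpoint, so a plain scrape is enough. A sketch only; the port is whatever your node reported at startup, and the exact metric names may differ from what I grep for here:)

# scrape the node's metrics endpoint and pull out gossipsub-related counters
# replace <PORT> with the metrics port from your node's startup log
curl -s http://localhost:<PORT>/metrics | grep -i gossipsub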

A few observations:

  • Every time there is a large burst of chunks being stored, the number of peers connected remains higher than the previous baseline and does not seem to decrease.

  • In the case above, the network traffic also nearly doubles (linear scaling?) and remains at the higher level, likely due to more peers being actively connected to the node.

  • The libp2p identify messages sent and received periodically also increase (new maximum) once more chunks are stored, including a rise in their average baseline.

  • Gossipsub messages received in this case seem to remain steady and flat (reducing in overall maximum), but the base minimum rate seems to be increasing from the initial starting point.

  • Not sure why libp2p Observed Addresses & Connected Nodes show no data from the metrics endpoint (TBD).

Zooming in on the above observations here:

Further observations after initial post:

  • I can’t tell if the chunks that arrived are purely due to replication or to new data being stored, as ‘SN Replication Triggered’ shows 0 and the ‘SN Wallet Balance’ also shows 0. I am guessing this is existing replicated data, that the ‘SN Wallet Balance’ is accurate (at 0.0), and that ‘SN Replication Triggered’ is likely from the perspective of my node, not replication received that was triggered by other nodes? Hmmm.

image

  • Is the memory rise due to the number of peers that increases after each burst of chunks stored in waves, or a combination of extra peers actively connected plus a potential list of chunk addresses stored on disk being kept in memory?
11 Likes

Is something wrong? I had a node running but, due to some spectacular idiocy while trying to start some on another machine, I logged onto the wrong one and killed it. Now I can’t get a single one running on anything in AWS or at home. I’m getting this in the logs:-

[2023-11-24T22:18:11.820741Z TRACE sn_node::replication] Not having enough peers to start replication: 1/20

I didn’t change anything in between.

There are a lot of ‘Failed to dial’ messages near the start of the log.

I’m using the correct version and it downloaded the list of peers.

EDIT
I see this in the logs:-

[2023-11-24T21:53:11.820059Z INFO sn_node::node] Node has been subscribed to gossipsub topic 'TRANSFER_NOTIFICATION' to receive network royalties payments notifications.

So that sounds good but it’s the only indication I’m connected to anything.

2 Likes

I had this issue too when I grabbed the 1st PEER_ID from the network-contacts URL noted earlier above by Vort for a manual kick-start of the safenode process.

I ended up setting export SAFE_PEERS= in the shell to the 2nd peer in the list above, and that got me bootstrapped properly.

safe-build-122:# echo $SAFE_PEERS
/ip4/46.101.2.43/tcp/35669/p2p/12D3KooWBeQ6RLvhsprzvG6xFuvJQc9FmjXd89toqPU1K4mXxMaA
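
If you have the network-contacts file saved locally, you can pick an entry by line number instead of copy-pasting (a sketch, assuming one multiaddr per line):

# use the 2nd entry from the downloaded contacts file as the bootstrap peer
export SAFE_PEERS="$(sed -n '2p' network-contacts)"
echo $SAFE_PEERS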
1 Like

I am seeing something similar.
I have just killed them all as the cloud instance was asking for a reboot.

I’ll start from scratch again in a min and report back.

Thanks! I’ve tried a random few of them though and got the same result.

Can anyone supply a working peer they’ve started?

You can get it from the first safenode.log with:-

grep 'Local node is listening' safenode.log

and also supplying your external IP address if you are comfortable doing that.

Less to worry about if the node is cloud based.
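
If you do want to share a dialable address, something like this pulls the multiaddr out of the log and swaps in your external IP (a sketch; it assumes the log line contains a full /ip4/<ip>/tcp/<port>/p2p/<peer-id> multiaddr, so adjust the pattern if yours looks different):

# take the first listening multiaddr from the log and substitute the external IP
EXT_IP="$(curl -s https://api.ipify.org)"   # any "what is my IP" service will do
grep -o '/ip4/[0-9.]*/tcp/[0-9]*/p2p/[A-Za-z0-9]*' safenode.log | head -1 \
  | sed "s|/ip4/[0-9.]*|/ip4/${EXT_IP}|"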

1 Like

An up-to-date URL should always be in the code here:

3 Likes

Nice! I fully agree with you :smiley: .

I was searching for network_contacts and not network-contacts, whoops!

I should have tried more permutations, but I have added the URL to my scripts as a side comment so I don’t forget the format of the filename in the URL.

On a side note, I don’t think GitHub used to require an account just to search a repo before… sigh, :frowning: :

image

1 Like

Restarted my cloud instance with an initial single node.

A lot of errors similar to

ERROR sn_node::node] Failed to dial /ip4/68.183.33.184/tcp/46521/p2p/12D3KooWA9VCF5s8SiaDjeSRST7D9yT6nr5aJpseb1xowWPRy2B5: DialError(DialPeerConditionFalse(NotDialing))

also

ERROR sn_networking::event] OutgoingConnectionError to PeerId("12D3KooWDdbpuGWetCZnG1EPz1YspTnjkX21hE35XXJL1ZrKbPDr") on ConnectionId(1) - Transport([("/ip4/138.68.159.23/tcp/45669/p2p/12D3KooWDdbpuGWetCZnG1EPz1YspTnjkX21hE35XXJL1ZrKbPDr", Other(Custom { kind: Other, error: Custom { kind: Other, error: Left(Left(Os { code: 111, kind: ConnectionRefused, message: "Connection refused" })) } }))])

and

TRACE sn_node::replication] Not having enough peers to start replication: 1/20
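
A quick way to gauge how widespread the dial failures are (a sketch; the greps are based on the log lines above, so adjust if your wording differs):

# how many outgoing connection errors, and how many distinct peers we failed to dial
grep -c 'OutgoingConnectionError' safenode.log
grep -o 'Failed to dial /ip4/[^:]*' safenode.log | sort -u | wc -l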

Memory usage has gone up a bit:

PID: 145576
Memory used: 685.562MB
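
(In case anyone wants to compare like-for-like, here is a rough way to pull the same figures for a given PID; not necessarily how the numbers above were produced:)

# RSS in MB and CPU% for a single safenode PID (Linux procps ps assumed)
ps -o pid,rss,%cpu -p 145576 | awk 'NR==2 {printf "PID: %s\nMemory used: %.3fMB\nCPU: %s%%\n", $1, $2/1024, $3}'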

So we have most likely lost nodes.

Same here. No connection.

Btw, has anyone tried the Registers stuff? Is it all functioning?

I don’t know why, but CPU usage went quite high compared to the start of the network. The order of collapse I see is this:

  1. Node (or whole machine) CPU usage jumps up and starts hitting 100%. (This causes the higher latency, I think.)
  2. Memory usage starts going up (probably because of a work backlog). The problem is that it stays up even when there is enough CPU power available again.
  3. Nodes start dying because the machine runs out of memory.

I see it only on low-power machines; a server with an average CPU is fine so far.
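
To check that ordering on a given box, something like this could log per-node CPU and memory once a minute (just a sketch; assumes Linux procps ps and that the processes are named safenode):

# append one line per safenode process per minute, so the CPU spike vs. memory climb can be compared later
while true; do
  ps -C safenode -o pid=,%cpu=,rss= | while read -r pid cpu rss; do
    echo "$(date -u +%FT%TZ) pid=$pid cpu=${cpu}% rss_kb=$rss" >> node-usage.log
  done
  sleep 60
done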

How much CPU does that node use?

CPU usage: 11.6%