We haven’t dug into this yet, but I think we’ll soon be at a point where we can. Good to have folks keeping an eye on this!
Probably should not be logged as an error at the node.
Hmmm, it should have shut down if it received the message you noted?
Sadly, nothing around NAT detection is completely deterministic. There’s a confidence level involved, and if the node gets lucky with its probes it mayyy think it’s in the clear, when it is in fact not.
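To illustrate the idea (a toy model only, not the actual libp2p AutoNAT implementation; the names and thresholds are my own assumptions): the node tracks a status plus a confidence counter, and a short run of successful dial-back probes can leave it convinced it is publicly reachable even when it isn’t.

```rust
// Toy model of confidence-based NAT detection (illustrative only; not the
// actual libp2p AutoNAT code -- names and thresholds are assumptions).
#[derive(Debug, PartialEq)]
enum NatStatus {
    Unknown,
    Public,
    Private,
}

struct NatDetector {
    status: NatStatus,
    confidence: u32,
    // How many consecutive confirmations we require before trusting a flip.
    max_confidence: u32,
}

impl NatDetector {
    fn new(max_confidence: u32) -> Self {
        Self { status: NatStatus::Unknown, confidence: 0, max_confidence }
    }

    /// Feed in one dial-back probe result: `true` if a remote peer managed
    /// to dial us back, `false` otherwise.
    fn on_probe(&mut self, dialed_back: bool) {
        let observed = if dialed_back { NatStatus::Public } else { NatStatus::Private };
        if observed == self.status {
            // Same verdict as before: grow confidence up to the cap.
            self.confidence = (self.confidence + 1).min(self.max_confidence);
        } else if self.confidence > 0 {
            // Conflicting evidence: lose confidence before flipping status.
            self.confidence -= 1;
        } else {
            // Confidence exhausted: flip to the newly observed status.
            self.status = observed;
        }
    }
}

fn main() {
    let mut det = NatDetector::new(3);
    // A few lucky dial-backs leave the node believing it is public, and a
    // single later failure is not enough to change its mind.
    for result in [true, true, true, false] {
        det.on_probe(result);
        println!("probe={result} -> {:?} (confidence {})", det.status, det.confidence);
    }
}
```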
I’m curious how much turnover of nodes we have had. Is there any way to know if and when nodes have left the network? I’m especially curious whether there have been events where a large number of nodes dropped in a short time period.
There are some metrics crates libp2p has that we’ll dive into, which might tell us more.
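For anyone who wants to dig in ahead of that, here is a rough sketch of how rust-libp2p’s metrics support is typically wired up (module paths, feature flags, and signatures vary between libp2p versions, so treat the details as assumptions rather than exact API):

```rust
// Rough sketch: record libp2p events into a Prometheus registry via the
// libp2p-metrics crate (exact paths/features depend on the libp2p version).
use libp2p::metrics::{Metrics, Recorder};
use prometheus_client::registry::Registry;

fn setup_metrics() -> (Registry, Metrics) {
    // The registry is what you would later expose on a /metrics endpoint
    // (via prometheus-client's text encoder) and scrape into Grafana.
    let mut registry = Registry::default();
    let metrics = Metrics::new(&mut registry);
    (registry, metrics)
}

// In the swarm's event loop, each event is handed to the recorder, e.g.:
//
//     while let Some(event) = swarm.next().await {
//         metrics.record(&event); // kad, identify, ping, connection stats, ...
//         handle(event);
//     }
```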
I can say that just now the maidsafe nodes seem stable. We will be looking to add more churn to upcoming testnets, though.
Also: happy day!
In other news, there’s a chance I’ll be bringing this down today. The network has served its purpose well, so thanks so far to everyone who’s got stuck in!
That is not normal. I am also running nodes behind double NAT with port forwarding and I am not experiencing this. I have 110 nodes running from the start of this testnet, and according to my router the number of TCP connections is oscillating around 50-60k, with almost all of them in the “established” state.
My guess would be some kind of router issue, for example it ran out of memory and lost most of its connections, and the one node had enough surviving connections not to trigger the auto-shutdown.
That’s me bringing this down now. Thanks all for getting involved and poking this testnet! Very positive outcome for what we were checking here (and other facets too!)
One of my nodes mysteriously kicked the bucket 30 minutes after your post about bringing this down, so I guess you killed your nodes and it had a knock-on effect, but I don’t see a spike in activity before its sudden demise.
Great work guys, the end seems near yet the real journey is only about to begin!
I am summarizing the timeline for now since I will take down my node soon too. Even though I started much later on this testnet, I am glad to have participated with everyone else here!
There were 4 distinct spikes for me in the last few hours. The 1st and 3rd spike batches used up 75% of CPU on a 4-core machine, while the 2nd and 4th spikes went up to about 25% each.
Note: On the 1st & 3rd spike groups, the kBucket stats (buckets & peers) trended downward, and a ton of CPU was consumed during that downward trend, whether due to more libp2p messages or to actively re-updating the kBucket data structures (the Maidsafe team would know more here). However, the raw count of logged messages parsed dropped dramatically during this period (specifically the Peer Info Sent/Received messages).
Note: Maybe I am not picking up on a certain type of activity here, but that might be happening at the lowest layers and potentially not being logged intentionally (i.e. logging it could result in excessive messages or a slowdown).
Note: On the 2nd and 4th spikes, the kBucket (buckets & peers) stats remained flat, but a ton of ‘GET’ requests were logged, which were the main contributor to the overall raw # of logged messages parsed in the panel above, and likely contributed to the two 25% CPU spike batches.
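For anyone curious, the tallying described above boils down to something like the following sketch; the matched substrings are placeholders, not the exact safenode log strings, so adjust them to the real log format:

```rust
// Minimal sketch of per-message-type log tallying: count how many log lines
// mention each message type of interest. Pipe a node log through stdin, e.g.
//     cat safenode.log | ./log_tally
use std::collections::BTreeMap;
use std::io::{self, BufRead};

fn main() {
    // Substrings to bucket log lines by (placeholders, not exact log strings).
    let categories = ["Peer Info Sent", "Peer Info Received", "GET", "ERROR"];
    let mut counts: BTreeMap<&str, u64> = categories.iter().map(|c| (*c, 0)).collect();

    for line in io::stdin().lock().lines() {
        let line = match line {
            Ok(l) => l,
            Err(_) => continue,
        };
        // A line may match more than one category; that is fine for a rough tally.
        for cat in &categories {
            if line.contains(cat) {
                *counts.get_mut(cat).unwrap() += 1;
            }
        }
    }

    for (cat, n) in &counts {
        println!("{cat}: {n}");
    }
}
```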
Unique Peer IDs went up to 9700+ from 2965+ within 24hrs? Wow.
Note: I don’t think I am double counting here, but will double check.
FWIW, the kBucket stats dropped to 7 buckets with 54 peers, compared to 10 buckets and 154 peers at steady state. I am not sure what the ideal split between the bucket and peer counts should be, but all seems well as the node continues to perform well.
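For context on what that bucket/peer split means, here is an illustrative sketch of the standard Kademlia bucket layout (shortened u32 IDs and an assumed K = 20 per bucket, not the actual libp2p code):

```rust
// Illustrative Kademlia-style k-bucket indexing (toy IDs, not libp2p's code).
// A peer's bucket is determined by the most significant bit at which its ID
// differs from ours (the XOR distance); each bucket holds at most K peers.
const K: usize = 20; // assumed max peers per bucket (a common Kademlia default)

/// Bucket index for `peer` relative to `local`; a higher index means a
/// greater XOR distance. Returns None when the IDs are identical.
fn bucket_index(local: u32, peer: u32) -> Option<u32> {
    let distance = local ^ peer;
    if distance == 0 {
        None
    } else {
        // Position of the highest set bit of the XOR distance.
        Some(31 - distance.leading_zeros())
    }
}

fn main() {
    let local = 0xAAAA_AAAAu32;
    for peer in [local ^ 1, local ^ 0b1000, !local] {
        println!(
            "peer {:08x} -> bucket {:?} (each bucket holds up to {} peers)",
            peer,
            bucket_index(local, peer),
            K
        );
    }
    // Half of a random ID space lands in the furthest bucket, a quarter in the
    // next, and so on, so the far buckets saturate at K while buckets close to
    // our own ID stay sparse; ~10 non-empty buckets holding ~150 peers in total
    // is consistent with K = 20.
}
```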
The ERROR-level messages that were parsed pertained to the SN_NETWORKING:MSG component, specifically inbound & outbound response failure messages. This seems expected given that network capacity was being reduced, and they also seem to occur right after the bucket and peer #s in the kBucket statistics panel start dropping.