5-4/3/2(022) Playground! [Offline once more]

happybeing · March 5, 2022, 3:10pm

It’s still responding to my node’s join requests (by rejecting them) but puts seem to be taking a long time.

joshuef · March 5, 2022, 3:42pm

Aha, yeh, I can see some nodes using a lot of memory. Much more so than earlier (up from ~150 MB per node to ~700 MB, some even over a gig…).

Pulling logs to have a look now

Yup, looks like we’ve suffered the same fate as yesterday so I’m going to bring this down.

For clarity, I think there’s a couple of things going on.

One: Data republishing is currently still too intensive. I actually have a working local repro for this as of yesterday, but basically it looks like we produce way too many messages (one per chunk stored), and bog the node dowwwwnnn.

Two: Voting for dysfunctional nodes is super harsh and tied to one failed instance. (And it’s not weighted against general connectivity of the section, as data storage is).

Good thing is we have solutions for both of these in the works.

A new data republishing flow is almost in there from yoges, which should cut down the number of messages.
We’re looking to holistically refactor how/when we send messages, which should limit any buildup/barrage and take into account any pressure reports from nodes. (Which should help stop them becoming unresponsive in the first place.)
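
To make the message-count point concrete, here is a minimal sketch of batching, assuming republishing can group several chunk addresses into one message. The names `ChunkAddress`, `ReplicateBatch` and `batch_republish` are hypothetical and not taken from the node codebase; the real flow may look quite different.

```rust
// Hypothetical types for illustration only; not the actual node types.
#[derive(Debug, Clone, PartialEq)]
struct ChunkAddress(u64);

#[derive(Debug, PartialEq)]
struct ReplicateBatch {
    chunks: Vec<ChunkAddress>,
}

/// Groups chunk addresses into batches of at most `max_per_msg`, so that
/// n chunks produce ceil(n / max_per_msg) messages instead of n.
fn batch_republish(chunks: &[ChunkAddress], max_per_msg: usize) -> Vec<ReplicateBatch> {
    assert!(max_per_msg > 0, "batch size must be non-zero");
    chunks
        .chunks(max_per_msg)
        .map(|group| ReplicateBatch { chunks: group.to_vec() })
        .collect()
}

fn main() {
    // 1000 stored chunks, an assumed batch size of 50.
    let stored: Vec<ChunkAddress> = (0..1000).map(ChunkAddress).collect();
    let batches = batch_republish(&stored, 50);
    // prints "1000 chunks -> 20 messages"
    println!("{} chunks -> {} messages", stored.len(), batches.len());
}
```

The batch size of 50 is arbitrary; in practice it would presumably be tuned against message size limits and the pressure reports mentioned above.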

We’ve also known for a while we need to improve the liveness handling (expanding it from data to all aspects of nodes), and are starting to look into this too.
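
As a rough illustration of what weighting against the rest of the section could mean, here is a sketch that flags a node only when its failure count is well above the section average, rather than on a single failed instance. The function name, the thresholds and the scoring rule are all assumptions made for the example, not the planned design.

```rust
use std::collections::HashMap;

/// Hypothetical scoring rule: a node is flagged only if it has at least
/// `min_failures` failures AND more than `factor` times the section mean.
fn dysfunctional_nodes(
    failures: &HashMap<String, u32>,
    min_failures: u32,
    factor: f64,
) -> Vec<String> {
    if failures.is_empty() {
        return Vec::new();
    }
    let total: u32 = failures.values().sum();
    let mean = f64::from(total) / failures.len() as f64;

    let mut flagged = Vec::new();
    for (name, &count) in failures {
        if count >= min_failures && f64::from(count) > mean * factor {
            flagged.push(name.clone());
        }
    }
    flagged.sort(); // deterministic order for display
    flagged
}

fn main() {
    // One node failing far more than its peers: only that node is flagged.
    let mut section = HashMap::new();
    section.insert("a".to_string(), 1);
    section.insert("b".to_string(), 0);
    section.insert("c".to_string(), 2);
    section.insert("d".to_string(), 12);
    println!("{:?}", dysfunctional_nodes(&section, 3, 2.0)); // ["d"]

    // Everyone struggling equally (e.g. the whole section is under load):
    // nobody stands out, so nobody is voted off.
    let mut busy = HashMap::new();
    for name in ["a", "b", "c", "d"] {
        busy.insert(name.to_string(), 10);
    }
    println!("{:?}", dysfunctional_nodes(&busy, 3, 2.0)); // []
}
```

The second case is the one that matters for runs like this: when the whole section is bogged down, a relative measure avoids punishing nodes for a network-wide problem.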

Thanks to anyone who got involved here. Sorry if it seems a bit of a dead duck, but both runs were useful!

chriso · March 5, 2022, 4:10pm

The tests had no network to run against. It seems when launching the testnet, Terraform destroyed an existing testnet rather than creating a new one. Which is strange, because the previous run indicated the testnet was destroyed successfully.

Southside · March 5, 2022, 5:14pm

Sorry I missed all this fun. Thank you to everyone who worked to set this up and to participate. Little by little we are finding what works, what doesn’t and what nearly works.

Yes, I still feel there is just too much traffic that will not be necessary in a production environment. Anyone using vdash has their logging set to TRACE, so that’s super-chatty for starters.
