5-4/3/2(022) Playground! [Offline once more]

It’s still responding to my node’s join requests (by rejecting them) but puts seem to be taking a long time.

1 Like

Aha, yeah, I can see some nodes using a lot of memory. Much more so than earlier (up from ~150 MB per node to ~700 MB, even one over a gig…).

Pulling logs to have a look now :eyes:

Yup, looks like we’ve suffered the same fate as yesterday so I’m going to bring this down.


For clarity, I think there are a couple of things going on.

One: Data republishing is currently still too intensive. I actually have a working local repro for this as of yesterday, but basically it looks like we produce way too many messages (one per chunk stored) and bog the node down.
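To make the message-count point concrete, here is a rough sketch (not the actual node code) comparing one message per chunk with batched messages. The function names, the chunk ids, and the batch size of 50 are all made up for illustration.

```python
from typing import Iterable, List


def per_chunk_messages(chunk_ids: Iterable[str]) -> List[List[str]]:
    """One message per stored chunk (the costly pattern described above)."""
    return [[chunk_id] for chunk_id in chunk_ids]


def batched_messages(chunk_ids: Iterable[str], batch_size: int) -> List[List[str]]:
    """Group chunk ids so each message carries up to `batch_size` of them."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    ids = list(chunk_ids)
    return [ids[i:i + batch_size] for i in range(0, len(ids), batch_size)]


# Hypothetical workload: 1000 chunks to republish.
chunks = [f"chunk-{n}" for n in range(1000)]
print(len(per_chunk_messages(chunks)))    # 1000 messages
print(len(batched_messages(chunks, 50)))  # 20 messages
```

The real saving depends on message size limits and on how many chunks a node holds, so treat the numbers as illustrative only.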

Two: Voting for dysfunctional nodes is super harsh and tied to a single failed instance. (And it’s not weighted against the general connectivity of the section, as data storage is.)
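As a rough illustration of the difference (again, not the real implementation), the sketch below contrasts voting on any single failure with voting only when a node is well above the section average. The `tolerance` factor, the `+ 1` allowance, and the failure counts are assumptions picked for the example.

```python
from statistics import mean
from typing import Dict, List


def harsh_votes(failures: Dict[str, int]) -> List[str]:
    """Flag any node with at least one failure (the single-failure rule)."""
    return sorted(node for node, count in failures.items() if count >= 1)


def weighted_votes(failures: Dict[str, int], tolerance: float = 3.0) -> List[str]:
    """Flag only nodes whose failure count is well above the section average."""
    if not failures:
        return []
    average = mean(list(failures.values()))
    # The +1 keeps a lone failure in an otherwise healthy section from being flagged.
    threshold = tolerance * average + 1
    return sorted(node for node, count in failures.items() if count > threshold)


# Hypothetical failure counts for a five-node section.
section = {"a": 1, "b": 0, "c": 1, "d": 12, "e": 0}
print(harsh_votes(section))     # ['a', 'c', 'd']
print(weighted_votes(section))  # ['d']
```

With the weighted rule, a section where every node is struggling equally flags nobody, which is the "weighted against general connectivity" idea in miniature.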

Good thing is we have solutions for both of these in the works.

  • A new data republishing flow from yoges is almost in, which should cut down the number of messages.
  • We’re looking to holistically refactor how/when we send messages, which should limit any buildup/barrage and take into account any pressure reports from nodes. (Which should help stop them becoming unresponsive in the first place.)
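One way to picture sending that takes pressure reports into account is a per-peer queue that drains more slowly for peers reporting high load. This is only a sketch under assumptions: the `PressureAwareSender` class, the 0.0 to 1.0 pressure scale, and the per-tick budget are invented here and are not the planned design.

```python
from collections import deque
from typing import Deque, Dict, List


class PressureAwareSender:
    """Queues outgoing messages per peer and drains fewer for peers under pressure."""

    def __init__(self, max_per_tick: int = 10) -> None:
        self.max_per_tick = max_per_tick
        self.queues: Dict[str, Deque[str]] = {}
        self.pressure: Dict[str, float] = {}  # 0.0 = idle, 1.0 = saturated (assumed scale)

    def report_pressure(self, peer: str, level: float) -> None:
        # Clamp so a bad report cannot produce a negative or oversized budget.
        self.pressure[peer] = min(1.0, max(0.0, level))

    def enqueue(self, peer: str, message: str) -> None:
        self.queues.setdefault(peer, deque()).append(message)

    def budget(self, peer: str) -> int:
        # Always allow at least one message so a peer is never starved entirely.
        level = self.pressure.get(peer, 0.0)
        return max(1, round(self.max_per_tick * (1.0 - level)))

    def tick(self, peer: str) -> List[str]:
        # Send up to the current budget, leaving the rest queued for later ticks.
        queue = self.queues.get(peer, deque())
        count = min(self.budget(peer), len(queue))
        return [queue.popleft() for _ in range(count)]


sender = PressureAwareSender(max_per_tick=10)
for n in range(30):
    sender.enqueue("node-a", f"m{n}")

print(len(sender.tick("node-a")))  # 10, no pressure reported yet
sender.report_pressure("node-a", 0.5)
print(len(sender.tick("node-a")))  # 5, budget halved under pressure
```

Messages are held back rather than dropped, so a burst gets spread over several ticks instead of landing on a struggling node all at once.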

We’ve also known for a while we need to improve the liveness handling (expanding it from data to all aspects of nodes), and are starting to look into this too.


Thanks to anyone who got involved here. Sorry if it seems a bit of a dead duck, but both runs were useful!

17 Likes

The tests had no network to run against. It seems when launching the testnet, Terraform destroyed an existing testnet rather than creating a new one. Which is strange, because the previous run indicated the testnet was destroyed successfully :thinking:.

10 Likes

Sorry I missed all this fun. Thank you to everyone who worked to set this up and to everyone who participated. Little by little we are finding what works, what doesn’t and what nearly works.

Yes, I still feel there is just too much traffic that will not be necessary in a production environment. Anyone using vdash has their logging set to TRACE, so that’s super-chatty for starters.

4 Likes