New network going up now, testing 4MB chunks. Time to reset your nodes

Why is nobody blaming @Southside ?

8 Likes

Because you know what would happen if you did…

3 Likes

2MB next time maybe :upside_down_face:

5 Likes

In JFK about to get on another sodding plane :joy:

I'll give you a wave on the way past

5 Likes

Aye, that's effing convenient… :slight_smile:

1 Like

It's real. I restarted my nodes and the number of routing table peers is way down, and the network size reported by each node is woefully low.

It's the kind of cascade that is possible when this huge, complex control system starts hitting limits and shunning happens quickly. 20 minutes is 4x the shun period (from memory a node can be shunned in 5 minutes).

If a machine is maxing out its upload b/w for too long, the other nodes that have it in their routing table will start striking it. If the maxing out lasts too long, then a number of those will shun the nodes on the machine maxing out its b/w. Then churning starts, since that node is no longer considered a close node by some other nodes. This could potentially cause a cascade effect, because other machines will start maxing out their b/w, and the wave continues until only the much more capable machines remain or the network starts falling over.
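That feedback loop can be sketched as a toy model. To be clear, this is NOT Autonomi's actual shunning logic; the capacities, the loads, and the redistribution rule are all invented for illustration. The point is only the dynamic: shun an overloaded node, churn its records onto the survivors, and the extra load can push previously healthy machines over their limit too.

```python
# Toy model of a shunning/churn cascade. All numbers are made up.

def simulate_cascade(capacities, initial_load):
    """Return surviving node indices once the shun/churn wave settles.

    capacities   -- upload capacity of each node (arbitrary units)
    initial_load -- replication load initially placed on every node
    """
    alive = set(range(len(capacities)))
    load = {i: initial_load for i in alive}
    while True:
        # Nodes whose load exceeds their upload capacity get shunned.
        shunned = {i for i in alive if load[i] > capacities[i]}
        if not shunned:
            return alive
        alive -= shunned
        if not alive:
            return alive
        # Their records churn onto the survivors, which is what lets
        # the failure wave propagate to previously healthy machines.
        extra = sum(load[i] for i in shunned) / len(alive)
        for i in alive:
            load[i] += extra

# Nine modest machines and one much more capable one, all slightly
# over capacity: the weak ones cascade out, only the big one remains.
survivors = simulate_cascade([10.0] * 9 + [200.0], initial_load=12.0)
print(sorted(survivors))  # [9]
```

With capacities comfortably above the load (e.g. `[20.0] * 10`), nothing is shunned and every node survives, which matches the intuition that the cascade only starts once some machines hit their limits.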

Exactly. Once a major outage happens, something can be said to have gone wrong, and it requires investigation and acknowledgement. Even if it recovers, it needs to be investigated and acknowledged that it's not desired behaviour.

So sad. Reminds me of 1974, driving to help a friend clean the mud out from inside their house (house on stumps, not on the ground).

I won't repeat the 8x upload load on nodes for each 4MB requested/churned, blah, blah. Let the post analysis reveal all. Maybe in 2024 control systems theory still has its place, even in s/w and network systems.

4 Likes

I wasn't home yesterday and wow, the network is a mess now. Most of my nodes survived; only one machine crashed, due to an old spinning HDD not being able to keep up with the load.

Now I am seeing weird things. For example, these are 2 VPS machines, same provider, same specs, different locations. One would expect them to have the same number of records:

Also, I dropped in rank from 19th to 70th place even though most of my nodes kept running.

EDIT: According to the logs, most of my nodes were shunned when yesterday's event happened :roll_eyes:

4 Likes

Quick update here as well for visibility. Yesterday around 19:00 CEST I reset ±9000 nodes on the network. Around 6000 of those (a downscaled number of nodes per device) tried to get back online between 19:00 and 21:00 CEST with a 15000 interval per node (20 devices), after which I identified something wrong with my settings and reset again at 21:00 CEST.

Oopsie? :sweat_smile:

7 Likes

Which explains why many nodes are earning in the low 1- or 2-digit area while others earn a couple of k per chunk.

Aha! A smoking gun? Well done for analysing that and also fessing up! Not everyone in the world would have done that.

Of course it will be investigated, and it's already been acknowledged. What I was responding to was jumping to extreme conclusions before the investigation or even any cursory reason has been established.

4 Likes

err, has anyone spotted these are now 64GB max nodes?

I've only just spotted that Vdash reports 16384 records can be stored. 16384 x 4MB records = 64GB.

I know we are not going to get to that fill level, or even halfway, on this network, or any time soon on the real one, but that's what the number would be with these settings.

EDIT
I realise that it was going to be an 'average' level and a lot of the records are actually in bytes and are presumably payment information. But even so, quoting the nodes as being 32GB max when they could potentially be asked to store 64GB is a risk.

1 Like

I would need Qi to confirm this, but I think the size is based on the average record size being half of the maximum.
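The arithmetic behind both figures works out neatly if that is the case. A quick check, using the numbers quoted above (16384 records, 4MB chunks); the half-of-maximum average record size is the assumption under discussion, not a confirmed implementation detail:

```python
# Node capacity arithmetic for this testnet's settings.
MAX_RECORDS = 16_384
MAX_RECORD_SIZE = 4 * 2**20            # 4 MiB chunks

worst_case = MAX_RECORDS * MAX_RECORD_SIZE       # every record at full size
expected = MAX_RECORDS * MAX_RECORD_SIZE // 2    # assumed avg: half the max

print(worst_case // 2**30, "GB worst case")  # 64 GB worst case
print(expected // 2**30, "GB expected")      # 32 GB expected
```

So the quoted "32GB node" would be the expected average, while 64GB remains the theoretical ceiling.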

1 Like

Yes, I realised it was based on an average, considering records can be smaller and a lot of them will be much smaller, so I edited my post. Still seems a bit of a risk to be injected.

When we stop, I'll add up the records of different sizes and look at the du of record_store compared to the number of records. If we get to a nice number like 8GB per node, we can extrapolate up, and if the number of records isn't in the right ratio… potential trouble.

With every stored chunk there comes a tx of size 4KB, so it's not as risky as it sounds at first - but of course in theory a full node could end up above 32GB now (and I agree the communication is misleading, because we had "2GB nodes" before, and after increasing chunk count and chunk size to 32x of that max size we're now talking about "32GB nodes"…)

1 Like

This one and the last both had a real max size of 64GB, in that the previous was 131072 x 512KB == 64GB and this one is 16K x 4MB == 64GB.

The 32GB was the expected average size of the node, and Jim said that is the way they were describing it.

1 Like

Will that tx size of 4KB still exist when it goes to ERC20, since transactions will be on the L2 blockchain?

3 Likes

Good question

1 Like

Was transferring from the UK to South America via New York. Have had a sleep; now I am attempting to catch up on the carnage that's been going on.

I was running around 7k nodes when the meltdown happened.

Rules for nodes being started:

Ram < 75%
CPU < 75%
HD < 75%
Load averages (1, 5 & 15 min) < 0.75 * number of cores

Nodes to be removed if any of those values got higher than 95%.
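Those thresholds could be sketched as a pure decision function. The 75%/95% limits and the 0.75 * cores load rule come from the rules above; the function name, the `hold` state, and everything else here are my own framing, and the measured values would come from whatever monitoring is in use:

```python
# Start/stop decision for a node manager, based on the thresholds above.
# Inputs are measured percentages (0-100) and load averages; the caller
# gathers them however it likes (psutil, /proc, etc.).

def node_action(ram_pct, cpu_pct, disk_pct, load_avgs, cores,
                start_limit=75.0, stop_limit=95.0, load_factor=0.75):
    """Return 'start', 'stop', or 'hold' for the node manager."""
    usage = (ram_pct, cpu_pct, disk_pct)
    if any(v >= stop_limit for v in usage):
        return "stop"                     # remove nodes above 95%
    if all(v < start_limit for v in usage) and \
       all(l < load_factor * cores for l in load_avgs):
        return "start"                    # safe to add another node
    return "hold"                         # between the two thresholds

print(node_action(60, 50, 40, (2.0, 1.5, 1.0), 8))  # start
print(node_action(96, 50, 40, (2.0, 1.5, 1.0), 8))  # stop
```

Keeping the decision pure (values in, verdict out) makes it easy to test the thresholds without touching a live machine.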

Thought I was running light enough to survive a bashing.

1 Like