+++ Beta Network is Now Live! +++

Really happy to see you around again, David.

Take it easy, less of you is better than none of you. :grinning:

If you still have all your toes, you win, maybe, idk, we can compare one day! :wink:

16 Likes

I've reset my nodes a couple of times now. They seem to start off healthy, but maybe almost half end up with zero peers, so I'm guessing I'll have to reduce down from 10 nodes? I haven't earned any nanos since updating to the newest node version last Thursday.

1 Like

Launchpad?

There are known bugs with Launchpad operations, and today's update says some fixes are coming for them.

1 Like

Please take care. :cry:

5 Likes

No safenode-manager, with port forwarding. It seems weird to me, but like I said before, I had switched the power on and off a bunch and the router was connected to that breaker, so that was just a hunch of mine that I was shunned.

But having high peer counts eventually drop to zero after a reset makes me wonder if 10 is somehow too many nodes now? If my logs could be helpful, I'd be happy to help however I can.

3 Likes

All the best! Hope you are on the mend.

Health > Fun > Work

4 Likes

I think more than a few people are desperate for more info on this.

It suggests that a single node providing high quotes has caused major network disruption?

Besides that being scary, if I remember correctly, a node quoting way higher than others should already be ignored.

11 Likes

I agree that this is not making proper sense. When I was reading it, I thought either something was not mentioned or it's just the part of the problem that they have discovered so far.

I am at the moment tracking down an issue where for weeks my nodes were fine and then all of a sudden they decide to just die off, like others have reported as well, where they come back to check their nodes and half or more have just died off at some point. In my case I know the computer was running all the time, just like the previous weeks. Somehow the nodes are just destroying their directories and then reporting they cannot find stuff, LOL. Of course you can't find it, you deleted whole directories inside your node's directory, or the whole node directory.

1 Like

I wonder what "restarting constantly" means, like every second or every couple of minutes? I'm pretty guilty of stopping and starting nodes depending on system load, to keep the load where I want it.

1 Like

I'm considering shutting down 300 nodes, rebooting, and then starting them slowly to try not to get above 80 on the 15-minute load average.
If I do, I'll stop 10 every 2 mins for an hour. Or is that being too gentle?
The sooner I stop them all, the sooner I can reboot and begin starting them again.
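For what it's worth, here is a minimal sketch of that kind of staggered shutdown, just to make the pacing concrete. The batch size, pause and total are taken from the post above; the actual stop command is a placeholder (`echo`), so swap it for however you really stop a node (systemd unit, safenode-manager, killing the process, etc.).

```rust
use std::process::Command;
use std::thread;
use std::time::Duration;

fn main() {
    let total_nodes: usize = 300;
    let batch_size: usize = 10;
    let pause = Duration::from_secs(2 * 60); // 2 minutes between batches -> ~1 hour total

    for batch in 0..(total_nodes / batch_size) {
        for node_index in (batch * batch_size + 1)..=((batch + 1) * batch_size) {
            // Placeholder: replace `echo` with your real stop command for node N.
            let status = Command::new("echo")
                .arg(format!("stopping node {node_index}"))
                .status();
            if let Err(err) = status {
                eprintln!("failed to stop node {node_index}: {err}");
            }
        }
        thread::sleep(pause);
    }
}
```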

I have nodes killing themselves, so if they were started as services (e.g. via safenode-manager) they would be restarting over and over again like crazy, but I only start them as ordinary programs. 30 out of 40 nodes just die claiming out of disk, or files not found, but it is in fact them killing their own directories. I have a feeling this is somehow related.

1 Like

I'm trying to think of how to add tracking for this to NTracking, as I just realised I'd be quite in the dark about this.

There is another bug I hope is fixed, and that is false detection of a port being in use. Most often it's the metrics port, and that kills any NTracking ability, as the node still starts but no /metrics is possible.

The PR from Qi that @Southside posted on Discord seems a better explanation of what might be going on than a rogue node, although he describes it as one observation, so not necessarily a single situation as it appeared.

Yes, if metrics don't answer, the node would show up as stopped in NTracking, with zeros across the board.

This may be a node stopping and starting itself

'cos I haven't manually stopped/started anything for 24 hrs

Neil, is this a once-an-hour snapshot at 20 past the hour?
Could this problem be a lot worse and it's only getting reported once an hour?
Otherwise I need to examine timestamps on 300+ sets of logs

That's a snapshot every 15 minutes across the last 24 hours, but I'd need to see the happy-node indicator to know if it could just be a node not responding at the end of the query-metrics loop because the system load had spiked.
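To make the failure mode concrete, here is a rough sketch of the kind of check a tracker does when scraping a node's /metrics endpoint (this is not NTracking's actual code; the port number and function names are made up). The point is that a timed-out probe only tells you the node didn't answer in time, which a heavily loaded but healthy node can also do.

```rust
use std::io::{Read, Write};
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

/// Probe a node's /metrics endpoint; None means "no answer within the timeout",
/// which could be a stopped node or just one too busy to respond.
fn probe_metrics(addr: SocketAddr, timeout: Duration) -> Option<String> {
    let mut stream = TcpStream::connect_timeout(&addr, timeout).ok()?;
    stream.set_read_timeout(Some(timeout)).ok()?;
    // Minimal HTTP/1.0 request; a real scraper would use a proper HTTP client.
    write!(stream, "GET /metrics HTTP/1.0\r\nHost: localhost\r\n\r\n").ok()?;
    let mut body = String::new();
    stream.read_to_string(&mut body).ok()?;
    Some(body)
}

fn main() {
    // Hypothetical metrics port for one local node.
    let addr: SocketAddr = "127.0.0.1:13001".parse().unwrap();
    match probe_metrics(addr, Duration::from_secs(2)) {
        Some(_) => println!("node answered /metrics"),
        None => println!("no answer: stopped, or just too loaded to respond?"),
    }
}
```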

Description

There is an observed super-high quote due to an incorrectly calculated store_cost quote, as a result of a not-fully-populated RT combined with records accumulated from historical runs.

In one observed scenario, a node that had accumulated 3k records, and normally quoted 10/20 nanos (with 1.1k close_range_records calculated), quoted 4 million nanos (with all 3k records counted as close_range_records), because after the restart its RT only had 56 peers (network size estimated at just 76 nodes, where normally there should be around 230 peers in the RT and a network size of over 30k).
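A back-of-envelope way to read those numbers (this is an assumption for illustration, not the node's actual store_cost maths): if the close range scales roughly inversely with the estimated network size, then an estimate collapsing from ~30k nodes to 76 widens the assumed close range by roughly 400x, which is why every historical record the node still holds suddenly counts as a close_range_record.

```rust
fn main() {
    // Illustrative only: assumes close range ~ 1 / estimated network size.
    let normal_estimate = 30_000.0_f64;   // healthy network-size estimate
    let restarted_estimate = 76.0_f64;    // estimate right after the restart

    let inflation = normal_estimate / restarted_estimate;
    println!("close range inflated by roughly {inflation:.0}x"); // ~395x
}
```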

This PR is an attempt to:

- have a background thread check how many records have accumulated so far, at an interval of, say, 1 hour
- once 60% of the max_record is reached, trigger a clean-up
- remove ALL records that are not within the close_range that got set
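As a sketch of what those three steps amount to (this is not the actual safenode code; `RecordStore`, `max_records`, `close_range` and the XOR-distance handling are placeholder assumptions): a background thread wakes up on an interval, checks the record count, and once it crosses 60% of the maximum prunes everything outside the current close range.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

type Key = u64; // stand-in for an XOR-space record address

struct RecordStore {
    records: HashMap<Key, Vec<u8>>,
    max_records: usize,
    close_range: u64, // stand-in for the node's current close-range distance
    node_addr: Key,
}

impl RecordStore {
    fn prune_out_of_range(&mut self) {
        let node_addr = self.node_addr;
        let close_range = self.close_range;
        // Keep only records whose XOR distance to this node is inside the range.
        self.records.retain(|key, _| (key ^ node_addr) <= close_range);
    }
}

fn spawn_cleanup_thread(store: Arc<Mutex<RecordStore>>, interval: Duration) {
    thread::spawn(move || loop {
        thread::sleep(interval);
        let mut store = store.lock().unwrap();
        // The PR's trigger: once 60% of the maximum record count is reached.
        if store.records.len() >= store.max_records * 60 / 100 {
            store.prune_out_of_range();
        }
    });
}

fn main() {
    let store = Arc::new(Mutex::new(RecordStore {
        records: HashMap::new(),
        max_records: 2048,                // placeholder maximum
        close_range: u64::MAX / 512,      // placeholder close-range distance
        node_addr: 0xDEAD_BEEF,
    }));
    // The PR suggests an interval on the order of an hour.
    spawn_cleanup_thread(Arc::clone(&store), Duration::from_secs(60 * 60));
}
```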

@qi_ma There is one issue I see here: if you remove the chunks rather than just marking them as inactive, then you lose the ability to reactivate those chunks when nearby nodes disappear and this node becomes responsible for them again. This would affect good nodes too, and the network would lose that ability.
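To show the alternative being suggested here (again an illustration under assumed names, not existing safenode code): instead of deleting out-of-range records, keep them on disk with an inactive flag and re-evaluate the flag against the current close range, so records can be reactivated if the node becomes responsible for them again.

```rust
use std::collections::HashMap;

enum RecordState {
    Active,
    Inactive,
}

struct StoredRecord {
    state: RecordState,
    data: Vec<u8>,
}

struct RecordStore {
    records: HashMap<u64, StoredRecord>, // keyed by XOR-space address
    node_addr: u64,
    close_range: u64,
}

impl RecordStore {
    /// Re-evaluate every record against the current close range instead of
    /// deleting: out-of-range records are parked, in-range ones come back.
    fn refresh_states(&mut self) {
        for (key, record) in self.records.iter_mut() {
            record.state = if (key ^ self.node_addr) <= self.close_range {
                RecordState::Active
            } else {
                RecordState::Inactive
            };
        }
    }
}

fn main() {
    let mut store = RecordStore {
        records: HashMap::new(),
        node_addr: 0xDEAD_BEEF,
        close_range: u64::MAX / 512, // placeholder close-range distance
    };
    store.refresh_states(); // nothing stored yet; just shows the call
}
```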

2 Likes

stick the above in the GitHub comments as well