I suggest a simple idea: allocate the actual disk space up front, so that script kiddies and over-zealous people are prevented from causing an issue. For the rarer person who modifies the code to remove this, the network could probably survive, and it would have nowhere near the same effect on how rewards are spread around.
As for making sure the dummies are not spread around, that is simple enough that a first-year computer science student could handle it. A simple flag/naming scheme for the files is all that is needed, so there is no danger of them being spread around, since they would not have a XOR_Name as the file name.
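For example, something as simple as this would do it (a rough sketch only; the directory layout, "dummy_" prefix and block size are my assumptions, and I'm assuming real records are stored under their hex XOR_Name as the filename):

```rust
use std::fs::OpenOptions;
use std::io::Write;
use std::path::Path;

/// Create `count` filler files of `size` bytes each in `store_dir`.
/// Real chunks would be stored under their hex XOR_Name, so a "dummy_"
/// prefix can never collide with (or be mistaken for) a real record,
/// and a cleanup pass can delete fillers by name alone.
fn create_dummy_fillers(store_dir: &Path, count: u32, size: u64) -> std::io::Result<()> {
    for i in 0..count {
        let path = store_dir.join(format!("dummy_{i:06}.fill"));
        let mut f = OpenOptions::new().create(true).write(true).open(&path)?;
        // Write zeros in 1 MiB blocks so the space is genuinely allocated,
        // not just reserved as a sparse file.
        let block = vec![0u8; 1024 * 1024];
        let mut written = 0u64;
        while written < size {
            let n = std::cmp::min(block.len() as u64, size - written) as usize;
            f.write_all(&block[..n])?;
            written += n as u64;
        }
        f.sync_all()?;
    }
    Ok(())
}
```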
Well, I would hope the 250K nodes on one connection is not common, and yes, the disk-full situation will come upon them very fast. They would have to watch like a hawk, because once it gets close it could be lights out for those nodes within an hour. All shunned.
Anyway - you’re right, it’s a valid suggestion - I think it doesn’t help anybody except me and a couple of others… But I guess if MaidSafe thinks it’s worth implementing that “feature” they are free to do so.
Well, many moons ago they used to have a disk-space proof when nodes were joining sections. And I don’t think the need has really changed. Some of the reasons may have, but the basic requirement of nodes being honest about how much disk they have is still the same.
Now we cannot have absolute proof in every case because it’d become an arms race, but for the average user and script kiddie it would ensure “accidental” over provisioning doesn’t happen. They even have it in the launchpad where you cannot run more nodes than you have disk space for.
This suggestion is for those running the nodes themselves or through safenode-manager
It’s really simple and covers the majority of situations where a newbie says “let’s run 100 nodes” on their 250GB drive that also holds the OS, or takes an RPi with a 64GB SD card and runs 10 nodes.
Over-provisioned in the sense that you claim to have more resources than you actually have. The term, I suppose, can work both ways depending on what you are trying to convey.
I hope it was clear from what I wrote that I meant over-provisioning nodes, not resources.
I see your confusion: you are thinking I am saying over-provision resources. Nah, over-provision nodes.
Now I see where you identified the danger of a network collapse.
If we have 32GB nodes after launch and the network stays empty enough for this to be possible, I guess the issue is not over-commitment of storage.
Having made a mistake doesn’t justify repeating it in a different area… Launchpad limits to 50 nodes anyway… Whom are we protecting there? And do they need protection?
I think some largish node runner did open-source his node-management tooling that helps to run huge numbers of nodes (and disadvantages himself in comparison)… hmhmm…
And didn’t the same one just run hundreds if not thousands of nodes and donate many, many nanos to happybeing for his developments?
But I guess you’re right - nobody would publish such a mod
It is a suggestion to reduce the problem, and it only requires a simple check in the startup code to ensure people do not “accidentally” or deliberately start more nodes than their disk space can handle.
This covers probably 99+% of the cases where it could happen.
Anything beyond that would just become an arms race; this is just to help reduce the issue of over-zealous people, not malicious people, who have better avenues for being malicious than this one.
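To show how little is involved, here is a minimal sketch of that startup check (the 35GB node size, the 10% headroom and the function names are my assumptions, and the free-space query uses the fs2 crate's available_space; any equivalent would do):

```rust
use std::path::Path;

/// Rough startup check: refuse to launch more nodes than the free space
/// on the data path can hold. `max_node_size` would be whatever the
/// network's per-node limit is; 35GB is just the figure used in this thread.
fn max_nodes_for_path(data_dir: &Path, max_node_size: u64) -> std::io::Result<u64> {
    // fs2::available_space returns the bytes available to the current user
    // on the filesystem that contains `data_dir`.
    let free = fs2::available_space(data_dir)?;
    let usable = free - free / 10; // keep ~10% headroom for the OS and logs
    Ok(usable / max_node_size)
}

fn check_requested_nodes(data_dir: &Path, requested: u64) -> std::io::Result<()> {
    const MAX_NODE_SIZE: u64 = 35 * 1024 * 1024 * 1024; // assumption: 35GB per node
    let allowed = max_nodes_for_path(data_dir, MAX_NODE_SIZE)?;
    if requested > allowed {
        eprintln!(
            "Refusing to start {requested} nodes: free space on {} only supports {allowed}",
            data_dir.display()
        );
        std::process::exit(1);
    }
    Ok(())
}
```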
Launchpad also limits based on disk space.
The 50-node limit is there because Launchpad & safenode-manager become too unresponsive as you go above 50 nodes.
The disk-space limit is there because they wished to do exactly what my suggestion is, but safenode-manager and safenode do not have those disk-space checks in them.
Over provisioned the number of nodes for the amount of storage available = over committed the storage for the number of nodes = under provisioned the storage for the number of nodes.
But which one I’d use would depend on the context, i.e. whether I’m talking from the application or the storage point of view, and which came first.
e.g.
The storage was provisioned to be 350GB before the nodes were started, so if I then start 20 nodes, which would require 700GB, I’ve over-provisioned the number of nodes. If the nodes existed beforehand and then I provisioned the storage, I’d say I under-provisioned the storage. If 10 nodes were running on the 350GB storage and then I started another 10 that ultimately won’t fit, I’d say I’d over-committed the storage.
But anyway, I think something has to be done. I say that reluctantly because it doesn’t allow people to optimise costs and increase storage as they need to if they know what they are doing. Also, it smacks of too much control.
But a lot of people will either not realise the trouble they can get into or make mistakes or forget or cut it a bit fine or try to game the system.
I know the network is designed to cope with node churn, but we also know churn can still be bad for it. We should look to limit the occurrence of things that are bad for it, to keep the resilience for the things we can do nothing about, such as whole ISPs being affected or even countries being disconnected.
Imagine if the network is rocking along quite nicely at 50% capacity (or whatever the cost algorithm is designed to encourage) and 10% of nodes are affected by a big outage of some kind. That should be fine.
But imagine if 25% of nodes have been provisioned with only enough space for 55% of capacity, because they can get away with it since the network always tends to hover around 50% full. When the outage hits, 1 in 4 of the remaining nodes can’t take any more writes. Now the network has to redistribute the data held by that quarter of nodes as they get shunned, on top of the data from the 10% that went down. A cascading failure beckons…
Some of those users spot that their nodes aren’t working and take them offline to redo their setup. That makes things worse, because while full they were presumably still able to serve data for redistribution even if they couldn’t take any more. Their chunks now have to be redistributed to the remaining nodes from the other copies. The problem deepens… Data loss is not far away, even if it isn’t happening already.
Other people who are cleverer with numbers and know the system better should have a look at some scenarios.
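As a starting point, here is a very rough back-of-envelope of the scenario above (the node count, 35GB node size and percentages are just the figures from this thread, and it ignores replication factors, so treat it as illustrative only):

```rust
// Back-of-envelope for the cascade scenario: N nodes, network ~50% full,
// 10% of nodes go offline, and 25% of the survivors were provisioned with
// only 55% of the nominal per-node space.
fn main() {
    let nodes: f64 = 100_000.0;          // assumed network size
    let node_capacity_gb: f64 = 35.0;    // nominal per-node capacity (assumption)
    let fill: f64 = 0.50;                // network hovers around 50% full
    let outage: f64 = 0.10;              // 10% of nodes drop out
    let skimped: f64 = 0.25;             // 25% of survivors can only hold 55%
    let skimped_cap: f64 = 0.55;

    let total_data = nodes * node_capacity_gb * fill;
    let lost_nodes = nodes * outage;
    let survivors = nodes - lost_nodes;
    let data_to_move = lost_nodes * node_capacity_gb * fill;

    // Only the honestly provisioned survivors can absorb new writes once the
    // skimped ones hit their 55% ceiling.
    let absorbers = survivors * (1.0 - skimped);
    let extra_per_absorber = data_to_move / absorbers;
    let new_fill = fill + extra_per_absorber / node_capacity_gb;

    println!("data to redistribute: {:.0} GB of {:.0} GB total", data_to_move, total_data);
    println!("fill level of honest survivors rises to {:.1}%", new_fill * 100.0);

    // If that new fill level creeps past 55%, the skimped quarter start
    // failing writes too, and their share of the data has to move as well.
    let second_wave = survivors * skimped * node_capacity_gb * skimped_cap.min(new_fill);
    println!("second wave if the skimped nodes get shunned: {:.0} GB more", second_wave);
}
```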
So I kind of like the way the Launchpad queries the filesystem to see if there is enough space. However, that doesn’t rule out the use of thin provisioning on systems that provide it, i.e. creating filesystems with a greater size than is actually available, e.g. 10 x 35TB filesystems on a disk pool with only 50TB of storage. It works great until it doesn’t. It can be done, but you have to keep a very close eye on it and know what the ‘runway’ of demand and capacity expansion is.
I prefer the idea of having dummy chunks. I also hate it, though: of course there would have to be a way of stopping people from working out which ones need to be kept and deleting the rest, or else the dummies would have to be periodically checked like all other chunks, which increases the load on the network.
I really like the idea of the extra chunks being extra copies, to increase resilience. Then there is a point to using capacity that isn’t actively holding data that needs to be kept. Maybe even make that section of the ‘extra’ network capacity geographically aware, on an opt-in basis?
To be clear, these are my thoughts as well. But since the dev team went with the “average” expected size of nodes over the actual size of nodes, I weighed up the pros/cons and decided it was something worth suggesting, since it helps those who may get themselves and the network into situations they would rather not be in.
In the end everything is a control of some sort. The way the nodes communicate controls who our nodes can even talk to, for instance. Yeah, I know, nit-picking, but I’m trying to illustrate that sometimes what we see as control is not the whole story. Nodes have to be healthy and we’d prefer node operators to be honest LOL, and this helps get closer to both of those goals. There are ways around this if you know how, without code changes, but it means fiddling.
In some respects even the expected average is only valid assuming all records on your node average out to 1/2 the max record size. That may not hold when large files become a significant part of the network. So even using the expected average allows a lot of wiggle room.
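A tiny illustration of that wiggle room (the record count and max record size here are placeholders, not the real network parameters):

```rust
fn main() {
    // Placeholder parameters: a node allowed to hold `max_records` records,
    // each up to `max_record` bytes.
    let max_records: u64 = 2048;
    let max_record: u64 = 4 * 1024 * 1024; // 4MB, placeholder

    // "Expected" sizing assumes records average half the max size...
    let expected = max_records * max_record / 2;
    // ...but a node full of max-size records needs twice that.
    let worst_case = max_records * max_record;

    println!("expected: {} MB, worst case: {} MB",
             expected / (1024 * 1024), worst_case / (1024 * 1024));
}
```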
I don’t think that is needed. If someone is capable of knowing and doing that, one would hope they also know how to manage their disk space running out due to too many nodes, i.e. a script that checks free disk space and kills off some nodes in an orderly manner long before there is any issue.
Otherwise we get the arms race issue again
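For anyone who does want that kind of safety net, a rough sketch of such a watchdog might look like this (the 100GB floor, the poll interval and the stop_one_node helper are all hypothetical; stopping a node would really be a call out to safenode-manager or whatever runs your nodes):

```rust
use std::path::Path;
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical helper: stop one node in an orderly way. In practice this
/// would shell out to whatever manages your nodes.
fn stop_one_node() {
    println!("stopping one node to free space (placeholder)");
}

fn main() -> std::io::Result<()> {
    let data_dir = Path::new("/home/myuser/node"); // assumed node data path
    let min_free: u64 = 100 * 1024 * 1024 * 1024;  // arbitrary 100GB floor

    loop {
        // fs2::available_space reports free bytes on the filesystem holding data_dir.
        let free = fs2::available_space(data_dir)?;
        if free < min_free {
            stop_one_node();
        }
        sleep(Duration::from_secs(300)); // check every 5 minutes
    }
}
```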
The idea isn’t to prevent knowledgeable people who know how to keep things healthy, because in the end they’d defeat the protection in some way anyhow, but to help those out in the world spinning up nodes using safenode-manager. And if it’s incorporated there, then it also protects those using Launchpad. So in the end it protects 99.99% of the population who decide to run nodes, and the 0.01% should know how to handle the issues that can arise. I’d say most in the beta test (at least wave 1) are in that 0.01%.
I really am thinking of the over-zealous beta testers who “do not realise” the consequences, and looking to a future where hundreds of millions of people wanting to maximise their nodes on their spare resources will be caught out and, as you reasoned, potentially cause issues.
Once completely shunned, basically no one is talking to them, including clients, since there ends up being no routing to them.
It would have been nice if Launchpad could have used the spare space on the path, since mounts can have extra space. E.g. I mount 3 x 2TB drives at /home/myuser/node/drv1 (& drv2 & drv3).
That is a good idea. Takes time though, especially if bandwidth is an issue (at startup). Like mobile devices.
I would modify this a little (rough sketch after the list):
- at startup, dummies are used
- as the routing table is built, a list of chunks to slowly replace the dummies with is made
- after the churning has finished, the dummies are replaced slowly over time
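Roughly this kind of lifecycle, as a very loose sketch (the phase names and the idea of tracking a pending replacement queue are mine, not anything from the actual node code):

```rust
use std::collections::VecDeque;

/// Hypothetical lifecycle for the reserved space on a node.
enum FillerPhase {
    /// Fresh node: the allotted space is occupied by dummy filler files.
    Dummies,
    /// Routing table is being built: queue up real chunks we should hold.
    Queueing { pending: VecDeque<[u8; 32]> },
    /// Churn has settled: swap dummies for real chunks a few at a time.
    Replacing { pending: VecDeque<[u8; 32]> },
}

impl FillerPhase {
    /// Advance the phase; `churn_settled` would come from whatever signal
    /// the node uses to decide its neighbourhood has stabilised.
    fn step(self, churn_settled: bool) -> FillerPhase {
        match self {
            FillerPhase::Dummies => FillerPhase::Queueing { pending: VecDeque::new() },
            FillerPhase::Queueing { pending } if churn_settled => {
                FillerPhase::Replacing { pending }
            }
            other => other,
        }
    }
}
```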
Skimmed through, so don’t know if this is solved already, but how about this situation:
Nodes started and provisioned correctly, but then you fill your disk with other stuff and suddenly nodes don’t have the space they were supposed to have anymore.
Here is an internal slack message I wrote last week, hope it helps.
Ghost nodes, or gaming the system
I heard about this earlier and have given it thought. I wanted to put down in the public channel some thoughts on this.
So a node starts its life:
- gathers chunks to store (or gets shunned)
- gets paid for storing a chunk, at random
So the node does work before it’s paid; otherwise it’s killed off. If we take this as the case then we are paying nodes in arrears and not on store, if you like. They are paid to behave as good nodes. The payment on store gives us a mechanism to reward a node over time.
As the amount of data in a node grows, each payment represents a smaller part of its work, as it’s supplying more data to users. If we allow the node to grow massive, the work done will be massive in relation to payments. So say a 5GB node gives out 100 chunks and then is paid to store 1. If the node size increases to 10GB then maybe it gives out 200 chunks before getting paid, and so on.
The point I am making is we should not see payments to store as a real time payment. It’s just a breakpoint to get paid. The amount of work done to get there depends on the amount of data the node holds.
So the amount a node holds relates to the work done for payment. The larger the node the more work it has to do to get paid. Also this relates to how it can game the system.
If nodes are 35GB and the network keeps them approximately 50% full, then nodes have to take on about 17GB of data and start supplying it straight away. After a while they will store a chunk and get paid. Then they supply again for a while and then get paid, etc. This wheel keeps turning.
So the point I am making is that nodes are not able to just jump on, store stuff, get paid and leave; they are in fact required to work, and the payments should be infrequent enough to ensure payment is in arrears for good work done. I hope that makes more sense and helps frame the issue??
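To put very rough numbers on that scaling (the 4MB chunk size and the served-per-payment ratio are assumptions extrapolated from the 5GB/100-chunk example above, not network parameters):

```rust
fn main() {
    // Illustrative only: the message above suggests chunks served before a
    // payment scales with node size (5GB -> ~100, 10GB -> ~200), i.e. roughly
    // 20 chunks served per paid store for every GB held. The 4MB chunk size
    // is also an assumption.
    let serves_per_gb: f64 = 20.0;
    let chunk_mb: f64 = 4.0;

    for node_gb in [5.0_f64, 10.0, 35.0] {
        let serves = node_gb * serves_per_gb;
        let served_mb = serves * chunk_mb;
        println!(
            "{node_gb:>4} GB node: ~{serves:.0} chunks (~{served_mb:.0} MB) served per paid store"
        );
    }
}
```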