I suggest a simple idea: allocate the actual disk space up front, so that script kiddies and over-zealous people are prevented from causing an issue. For the rarer person who modifies the code to remove this, the network could probably survive, and it would have nowhere near the same effect on how rewards are spread around.
As for making sure the dummies are not spread around, that is simple enough that a first-year computer science student could handle it. A simple flag/naming scheme for the files is all that is needed, so there is no danger of them being spread around, since they would not have a XOR_Name as the file name.
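For example, something as simple as this would do it (a rough sketch only; the directory layout, "dummy_" prefix and block size are my assumptions, and I'm assuming real records are stored under their hex XOR_Name as the filename):

```rust
use std::fs::OpenOptions;
use std::io::Write;
use std::path::Path;

/// Create `count` filler files of `size` bytes each in `store_dir`.
/// Real chunks would be stored under their hex XOR_Name, so a "dummy_"
/// prefix can never collide with (or be mistaken for) a real record,
/// and a cleanup pass can delete fillers by name alone.
fn create_dummy_fillers(store_dir: &Path, count: u32, size: u64) -> std::io::Result<()> {
    for i in 0..count {
        let path = store_dir.join(format!("dummy_{i:06}.fill"));
        let mut f = OpenOptions::new().create(true).write(true).open(&path)?;
        // Write zeros in 1 MiB blocks so the space is genuinely allocated,
        // not just reserved as a sparse file.
        let block = vec![0u8; 1024 * 1024];
        let mut written = 0u64;
        while written < size {
            let n = std::cmp::min(block.len() as u64, size - written) as usize;
            f.write_all(&block[..n])?;
            written += n as u64;
        }
        f.sync_all()?;
    }
    Ok(())
}
```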
Well, I would hope the 250K nodes on one connection is not common, and yes, the disk-full situation will come upon them very fast. They would have to watch like a hawk, because once it gets close it could be lights out for those nodes within an hour. All shunned.
Anyway - you’re right, it’s a valid suggestion - I think it doesn’t help anybody except me and a couple of others… But I guess if MaidSafe thinks it’s worth implementing that “feature” they are free to do so.
Well, many moons ago they used to have a disk-space proof when nodes were joining sections. And I don’t think the need has really changed. Some of the reasons may have, but the basic requirement of nodes being honest about how much disk they have is still the same.
Now we cannot have absolute proof in every case because it’d become an arms race, but for the average user and script kiddie it would ensure “accidental” over provisioning doesn’t happen. They even have it in the launchpad where you cannot run more nodes than you have disk space for.
This suggestion is for those running the nodes themselves or through safenode-manager
It’s really simple and covers the majority of situations where a newbie says “let’s run 100 nodes” on their 250GB drive that also holds the OS, or takes an RPi with a 64GB SD card and runs 10 nodes.
Over-provisioned in the sense that you claim to have more resources than you actually have. The term, I suppose, can work both ways depending on what you are trying to convey.
I hope it was clear from what I wrote that I meant over-provisioning nodes, not resources.
I see your confusion: you are thinking I am saying over-provision resources. Nah, over-provision nodes.
Now I see where you identified the danger of a network collapse.
If we have 32GB nodes after launch and the network stays empty enough for this to be possible, I guess the issue is not over-commitment of storage.
Having made a mistake doesn’t justify repeating it in a different area… Launchpad limits to 50 nodes anyway… Whom are we protecting there? And do they need protection?
I think some largish node runner did open-source his node-management tooling that helps to run huge numbers of nodes (and disadvantages himself in comparison)… hmhmm…
And didn’t the same one just run hundreds if not thousands of nodes and donate many, many nanos to happybeing for his developments?
But I guess you’re right - nobody would publish such a mod
It is a suggestion to reduce the problem, and it only requires a simple check in the startup code to ensure people do not “accidentally” or deliberately start more nodes than their disk space can handle.
This covers probably 99+% of the cases where it could happen.
Anything beyond that would just become an arms race; this is just to help reduce the issue of over-zealous people, not malicious people, who have better avenues for being malicious than this one.
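To show how little is involved, here is a minimal sketch of that startup check (the 35GB node size, the 10% headroom and the function names are my assumptions, and the free-space query uses the fs2 crate's available_space; any equivalent would do):

```rust
use std::path::Path;

/// Rough startup check: refuse to launch more nodes than the free space
/// on the data path can hold. `max_node_size` would be whatever the
/// network's per-node limit is; 35GB is just the figure used in this thread.
fn max_nodes_for_path(data_dir: &Path, max_node_size: u64) -> std::io::Result<u64> {
    // fs2::available_space returns the bytes available to the current user
    // on the filesystem that contains `data_dir`.
    let free = fs2::available_space(data_dir)?;
    let usable = free - free / 10; // keep ~10% headroom for the OS and logs
    Ok(usable / max_node_size)
}

fn check_requested_nodes(data_dir: &Path, requested: u64) -> std::io::Result<()> {
    const MAX_NODE_SIZE: u64 = 35 * 1024 * 1024 * 1024; // assumption: 35GB per node
    let allowed = max_nodes_for_path(data_dir, MAX_NODE_SIZE)?;
    if requested > allowed {
        eprintln!(
            "Refusing to start {requested} nodes: free space on {} only supports {allowed}",
            data_dir.display()
        );
        std::process::exit(1);
    }
    Ok(())
}
```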
Launchpad also limits based on disk space.
The 50-node limit is there because Launchpad & safenode-manager become too unresponsive as you go above 50 nodes.
The disk-space limit is there because they wished to do exactly what my suggestion is, but safenode-manager and safenode do not have those disk-space checks in them.
Over provisioned the number of nodes for the amount of storage available = over committed the storage for the number of nodes = under provisioned the storage for the number of nodes.
But which one I’d use would depend on the context, i.e. whether I’m talking from the application or the storage point of view, and which came first.
e.g.
The storage was provisioned to be 350GB before the nodes were started, so if I then start 20 nodes, which would require 700GB, I’ve over-provisioned the number of nodes. If the nodes existed beforehand and then I provisioned the storage, I’d say I under-provisioned the storage. If 10 nodes were running on the 350GB storage and then I started another 10 that ultimately won’t fit, I’d say I’d over-committed the storage.
But anyway, I think something has to be done. I say that reluctantly because it doesn’t allow people to optimise costs and increase storage as they need to if they know what they are doing. Also, it smacks of too much control.
But a lot of people will either not realise the trouble they can get into or make mistakes or forget or cut it a bit fine or try to game the system.
I know the network is designed to cope with node churn, but we also know churn can still be bad for it. We should look to limit the occurrence of things that are bad for it, to keep the resilience for the things we can do nothing about, such as whole ISPs being affected or even countries being disconnected.
Imagine if the network is rocking along quite nicely at 50% capacity (or whatever the cost algorithm is designed to encourage) and 10% of nodes are affected by a big outage of some kind. That should be fine.
But imagine if 25% of nodes have been provisioned with only enough space for 55% of capacity, because they can get away with it since the network always tends to hover around 50% full. When the outage hits, 1 in 4 of the remaining nodes can’t take any more writes. Now the network has to redistribute the data held by that quarter of nodes as they get shunned, on top of the data from the 10% that went down. A cascading failure beckons…
Some of those users spot that their nodes aren’t working and take them offline to redo their setup. That makes things worse, because while full they were presumably still able to serve data for redistribution even if they couldn’t take any more. Their chunks now have to be redistributed to the remaining nodes from the other copies. The problem deepens… Data loss is not far away, even if it isn’t happening already.
Other people who are cleverer with numbers and know the system better should have a look at some scenarios.
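As a starting point, here is a very rough back-of-envelope of the scenario above (the node count, 35GB node size and percentages are just the figures from this thread, and it ignores replication factors, so treat it as illustrative only):

```rust
// Back-of-envelope for the cascade scenario: N nodes, network ~50% full,
// 10% of nodes go offline, and 25% of the survivors were provisioned with
// only 55% of the nominal per-node space.
fn main() {
    let nodes: f64 = 100_000.0;          // assumed network size
    let node_capacity_gb: f64 = 35.0;    // nominal per-node capacity (assumption)
    let fill: f64 = 0.50;                // network hovers around 50% full
    let outage: f64 = 0.10;              // 10% of nodes drop out
    let skimped: f64 = 0.25;             // 25% of survivors can only hold 55%
    let skimped_cap: f64 = 0.55;

    let total_data = nodes * node_capacity_gb * fill;
    let lost_nodes = nodes * outage;
    let survivors = nodes - lost_nodes;
    let data_to_move = lost_nodes * node_capacity_gb * fill;

    // Only the honestly provisioned survivors can absorb new writes once the
    // skimped ones hit their 55% ceiling.
    let absorbers = survivors * (1.0 - skimped);
    let extra_per_absorber = data_to_move / absorbers;
    let new_fill = fill + extra_per_absorber / node_capacity_gb;

    println!("data to redistribute: {:.0} GB of {:.0} GB total", data_to_move, total_data);
    println!("fill level of honest survivors rises to {:.1}%", new_fill * 100.0);

    // If that new fill level creeps past 55%, the skimped quarter start
    // failing writes too, and their share of the data has to move as well.
    let second_wave = survivors * skimped * node_capacity_gb * skimped_cap.min(new_fill);
    println!("second wave if the skimped nodes get shunned: {:.0} GB more", second_wave);
}
```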
So I kind of like the way the Launchpad queries the filesystem to see if there is enough space. However, that doesn’t rule out the use of thin provisioning on systems that provide it, i.e. creating filesystems with a greater size than is actually available, e.g. 10 x 35TB filesystems on a disk pool with only 50TB of storage. It works great until it doesn’t. It can be done, but you have to keep a very close eye on it and know what the ‘runway’ of demand and capacity expansion is.
I prefer the idea of having dummy chunks. I also hate it, though: of course there would have to be a way of stopping people from working out which ones need to be kept and deleting the rest, or else the dummies would have to be periodically checked like all other chunks, which increases the load on the network.
I really like the idea of the extra chunks being extra copies, to increase resilience. Then there is a point to using capacity that isn’t actively holding data that needs to be kept. Maybe even make that section of the ‘extra’ network capacity geographically aware, on an opt-in basis?
To be clear, these are my thoughts as well. But since the dev team went with the “average” expected size of nodes over the actual size of nodes, I weighed up the pros/cons and decided it was something worth suggesting, since it helps those who may get themselves and the network into situations they would rather not be in.
In the end everything is a control of some sort. The way the nodes communicate controls who our nodes can even talk to, for instance. Yeah, I know, nit-picking, but I’m trying to illustrate that sometimes what we see as control is not the whole story. Nodes have to be healthy and we’d prefer node operators to be honest LOL, and this helps get closer to both of those goals. There are ways around this if you know how, without code changes, but it means fiddling.
In some respects even the expected average is only valid assuming all records on your node average out to 1/2 the max record size. That may not hold when large files become a significant part of the network. So even using the expected average allows a lot of wiggle room.
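A tiny illustration of that wiggle room (the record count and max record size here are placeholders, not the real network parameters):

```rust
fn main() {
    // Placeholder parameters: a node allowed to hold `max_records` records,
    // each up to `max_record` bytes.
    let max_records: u64 = 2048;
    let max_record: u64 = 4 * 1024 * 1024; // 4MB, placeholder

    // "Expected" sizing assumes records average half the max size...
    let expected = max_records * max_record / 2;
    // ...but a node full of max-size records needs twice that.
    let worst_case = max_records * max_record;

    println!("expected: {} MB, worst case: {} MB",
             expected / (1024 * 1024), worst_case / (1024 * 1024));
}
```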
I don’t think that is needed. If someone is capable of knowing and doing that, one would hope they also know how to manage their disk space running out due to too many nodes, i.e. a script that checks free disk space and kills off some nodes in an orderly manner long before there is any issue.
Otherwise we get the arms race issue again
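For anyone who does want that kind of safety net, a rough sketch of such a watchdog might look like this (the 100GB floor, the poll interval and the stop_one_node helper are all hypothetical; stopping a node would really be a call out to safenode-manager or whatever runs your nodes):

```rust
use std::path::Path;
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical helper: stop one node in an orderly way. In practice this
/// would shell out to whatever manages your nodes.
fn stop_one_node() {
    println!("stopping one node to free space (placeholder)");
}

fn main() -> std::io::Result<()> {
    let data_dir = Path::new("/home/myuser/node"); // assumed node data path
    let min_free: u64 = 100 * 1024 * 1024 * 1024;  // arbitrary 100GB floor

    loop {
        // fs2::available_space reports free bytes on the filesystem holding data_dir.
        let free = fs2::available_space(data_dir)?;
        if free < min_free {
            stop_one_node();
        }
        sleep(Duration::from_secs(300)); // check every 5 minutes
    }
}
```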
The idea isn’t to prevent knowledgeable people who know how to keep things healthy, because in the end they’d defeat the protection in some way anyhow, but to help those out in the world spinning up nodes using safenode-manager. And if it’s incorporated there, then it also protects those using Launchpad. So in the end it protects 99.99% of the population who decide to run nodes, and the 0.01% should know how to handle the issues that can arise. I’d say most in the beta test (at least wave 1) are in that 0.01%.
I really am thinking of the over-zealous beta testers who “do not realise” the consequences, and looking to a future where hundreds of millions of people wanting to maximise their nodes on their spare resources will be caught out and, as you reasoned, potentially cause issues.
Once completely shunned, basically no one is talking to them, including clients, since there ends up being no routing to them.
It would have been nice if Launchpad could have used the spare space on the path, since mounts can have extra space. E.g. I mount 3 x 2TB drives at /home/myuser/node/drv1 (& drv2 & drv3).
That is a good idea. Takes time though, especially if bandwidth is an issue (at startup). Like mobile devices.
I would modify this a little (rough sketch after the list):
- at startup, dummies are used
- as the routing table is built, a list of chunks to slowly replace the dummies with is made
- after the churning has finished, the dummies are replaced slowly over time
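Roughly this kind of lifecycle, as a very loose sketch (the phase names and the idea of tracking a pending replacement queue are mine, not anything from the actual node code):

```rust
use std::collections::VecDeque;

/// Hypothetical lifecycle for the reserved space on a node.
enum FillerPhase {
    /// Fresh node: the allotted space is occupied by dummy filler files.
    Dummies,
    /// Routing table is being built: queue up real chunks we should hold.
    Queueing { pending: VecDeque<[u8; 32]> },
    /// Churn has settled: swap dummies for real chunks a few at a time.
    Replacing { pending: VecDeque<[u8; 32]> },
}

impl FillerPhase {
    /// Advance the phase; `churn_settled` would come from whatever signal
    /// the node uses to decide its neighbourhood has stabilised.
    fn step(self, churn_settled: bool) -> FillerPhase {
        match self {
            FillerPhase::Dummies => FillerPhase::Queueing { pending: VecDeque::new() },
            FillerPhase::Queueing { pending } if churn_settled => {
                FillerPhase::Replacing { pending }
            }
            other => other,
        }
    }
}
```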
Skimmed through, so don’t know if this is solved already, but how about this situation:
Nodes started and provisioned correctly, but then you fill your disk with other stuff and suddenly nodes don’t have the space they were supposed to have anymore.
Here is an internal slack message I wrote last week, hope it helps.
Ghost nodes, or gaming the system
I heard about this earlier and have given it thought. I wanted to put down in the public channel some thoughts on this.
So a node starts its life:
- gathers chunks to store (or gets shunned)
- gets paid for storing a chunk, at random
So the node does work before it’s paid; otherwise it’s killed off. If we take this as the case then we are paying nodes in arrears and not on store, if you like. They are paid to behave as good nodes. The payment on store gives us a mechanism to reward a node over time.
As the amount of data in a node grows, each payment represents a smaller part of its work, as it’s supplying more data to users. If we allow the node to grow massive, the work done will be massive in relation to payments. So say a 5GB node gives out 100 chunks and then is paid to store 1. If the node size increases to 10GB then maybe it gives out 200 chunks before getting paid, and so on.
The point I am making is we should not see payments to store as a real time payment. It’s just a breakpoint to get paid. The amount of work done to get there depends on the amount of data the node holds.
So the amount a node holds relates to the work done for payment. The larger the node the more work it has to do to get paid. Also this relates to how it can game the system.
If nodes are 35GB and the network keeps them approximately 50% full, then nodes have to take on about 17GB of data and start supplying it straight away. After a while they will store a chunk and get paid. Then they supply again for a while and then get paid, etc. This wheel keeps turning.
So the point I am making is that nodes are not able to just jump on, store stuff, get paid and leave; they are in fact required to work, and the payments should be infrequent enough to ensure payment is in arrears for good work done. I hope that makes more sense and helps frame the issue??
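To put very rough numbers on that scaling (the 4MB chunk size and the served-per-payment ratio are assumptions extrapolated from the 5GB/100-chunk example above, not network parameters):

```rust
fn main() {
    // Illustrative only: the message above suggests chunks served before a
    // payment scales with node size (5GB -> ~100, 10GB -> ~200), i.e. roughly
    // 20 chunks served per paid store for every GB held. The 4MB chunk size
    // is also an assumption.
    let serves_per_gb: f64 = 20.0;
    let chunk_mb: f64 = 4.0;

    for node_gb in [5.0_f64, 10.0, 35.0] {
        let serves = node_gb * serves_per_gb;
        let served_mb = serves * chunk_mb;
        println!(
            "{node_gb:>4} GB node: ~{serves:.0} chunks (~{served_mb:.0} MB) served per paid store"
        );
    }
}
```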