The current, likely large-scale under-provisioning of space is happening because the current rewards program incentivises people to do that, and more so since the pricing changes mean payments to nodes are higher even with empty nodes.
Assuming the recent dynamic pricing changes are temporary & won’t apply after launch, pricing of data uploads will depend on node fullness. Given this, there will be little incentive to add new nodes until nodes start filling up.
While nodes are low in fullness, data will be very cheap to upload. So it seems incentives will lead to nodes filling up fairly rapidly until there’s a balance between supply and demand.
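As a rough sketch of that incentive (toy numbers only; the real pricing curve is set by the protocol and will look different), something shaped like this is all it takes for uploads to be cheap while nodes are empty and expensive as they fill:

```python
# Toy fullness-based store pricing: cheap while nodes are empty, steep as they fill.
# Purely illustrative; the real curve is defined by the protocol.
def store_cost(fullness: float, base: float = 1.0) -> float:
    """Relative price per chunk for a node that is `fullness` (0.0 - 1.0) full."""
    assert 0.0 <= fullness < 1.0
    return base * fullness / (1.0 - fullness)

for f in (0.05, 0.50, 0.90):
    print(f"{f:.0%} full -> relative cost {store_cost(f):.2f}")
# prints roughly: 0.05, 1.00, 9.00
```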
So, I wouldn’t worry about what massive scale under-provisioning of storage would do to a live network, because it’s not going to happen. It’s just a quirk of the current incentive structure in the pre-launch network (assuming after launch pricing is again linked to node fullness, and it’s easy & desirable for people to upload data to the network).
Hmm - okay! Somehow I didn’t expect this… Maybe my world is skewed then… My work laptop has a 2 TB SSD (half of which is unused); PCs at home have smaller capacity, but the picture is somewhat similar…
…maybe we’re simply in a heterogeneous landscape again, with a variety of different actors interacting under different circumstances…
I read your links, and I think @neo is right in much of what he says.
I personally think that shunning nodes when they fail to deliver the chunks they are responsible for is too little, too late as a measure.
What I’d rather see is something along the lines of what @mav suggested years ago: filling the nodes to capacity with something that could be asked for at regular intervals, and if it is not returned, the nodes are shunned: Proof Of Storage prototype?
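For what it’s worth, this is the kind of challenge/response I have in mind, in very rough form (names, sizes and the filler scheme are all made up; a real design would need filler that is expensive to regenerate, otherwise a cheating node could just recompute it on demand instead of storing it):

```python
import hashlib
import os

BLOCK_SIZE = 1024 * 1024  # 1 MiB of filler per block (hypothetical)

def filler_block(node_seed: bytes, index: int) -> bytes:
    """Deterministic filler a node keeps on disk until real chunks displace it."""
    out = bytearray()
    counter = 0
    while len(out) < BLOCK_SIZE:
        out += hashlib.sha256(
            node_seed + index.to_bytes(8, "big") + counter.to_bytes(8, "big")
        ).digest()
        counter += 1
    return bytes(out[:BLOCK_SIZE])

def respond(stored_block: bytes, nonce: bytes) -> str:
    """The challenged node answers with a hash bound to a fresh nonce."""
    return hashlib.sha256(nonce + stored_block).hexdigest()

def verify(node_seed: bytes, index: int, nonce: bytes, answer: str) -> bool:
    """The verifier recomputes the expected answer; a wrong one means shunning."""
    expected = hashlib.sha256(nonce + filler_block(node_seed, index)).hexdigest()
    return answer == expected

seed = b"example-node-seed"
block = filler_block(seed, 42)      # held on the node's disk
nonce = os.urandom(16)              # chosen per challenge by the verifier
print(verify(seed, 42, nonce, respond(block, nonce)))  # True while the block is held
```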
I don’t get it. How, in any scenario, would it be more profitable to run fewer rather than more nodes in a given amount of HD space?
To be fair, my work includes photography and large-scale 3D modeling. And I should have had a new laptop a couple of years ago.
Maybe it was a mistake on my part to start this discussion by asking how the people here do things now. What I’m worried about is the future, when this community (hopefully) represents only a fraction of the people running nodes.
I was also thinking that the December collapse was due to disk space running out, but it was the CPU.
Though I do think that it works as an example of what can happen. And even though the network survived, I don’t think all the data did.
And just to add context - CPU is below 10% usage and each of them has a 1 Gbit up/down connection…
Those machines could keep most of the current network alive single-handedly…
(and I and others may have more of them where the picture is pretty comparable…)
Good question… I guess it won’t be more profitable for any single operator to run fewer rather than more nodes.
Operators might well be more profitable running 2 x 50%-full nodes vs 20 x 5%-full nodes if the pricing curve makes that the case, but they won’t be coordinating, and at any point the incentive for an individual operator will be to maximise nodes on their space, even if this keeps storage prices lower for longer.
But, income from many empty nodes will still be negligible due to the low chunk storage price, so operators won’t be incentivised to buy more capacity (e.g. new hard drives / VPSes) until nodes start filling and the return per TB of resources allocated starts to rise.
Once adding capacity actually costs operators money, and demand soaks up the very cheap capacity, nodes will start growing, and some operators who under-provisioned will need to kill nodes as this happens.
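To put made-up numbers on the 2 x 50% vs 20 x 5% comparison above (assuming the same total data lands on the operator either way, and that price per chunk depends only on fullness, both of which are simplifications):

```python
# Toy comparison: the same data spread over many near-empty nodes earns less in
# total, because the per-chunk price is lower when nodes are emptier.
def store_cost(fullness: float) -> float:
    return fullness / (1.0 - fullness)  # hypothetical fullness-based price per chunk

def earnings(num_nodes: int, chunks_per_node: int, capacity: int) -> float:
    fullness = chunks_per_node / capacity
    return num_nodes * chunks_per_node * store_cost(fullness)

CAPACITY = 1000  # chunks per node (hypothetical)

print(earnings(2, 500, CAPACITY))   # 2 nodes at 50% full  -> 1000.0
print(earnings(20, 50, CAPACITY))   # 20 nodes at 5% full  -> ~52.6
```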
However, any significant subsidy / emissions will exacerbate this by incentivising under-provisioning, though likely not for long.
I like this idea too. Kind of like ‘ballast’ that gets dropped as the network wants to go higher.
Though, is it really a problem for the network if, over time, operators reduce the number of nodes they offer but maintain or grow the actual capacity they offer?
You know I know that. I’m assuming demand for storage that makes those incentives irrelevant pretty quickly, but of course that depends on network usability / performance and app ecosystem.
In addition, they are also doing chunk checks, if I read Shu’s post correctly. So there will be some proactive checking. And I agree that Mav had a good suggestion that should be considered, maybe as a later update as the network matures.
There are a couple of reasons: CPU/memory requirements, and the quality of the internet connection behind potato routers.
Not everyone will be totally motivated by profit.
Some will be running nodes for a long time to start with to see what this is all about.
The less technical are more likely to follow guidelines on the number of nodes they can run, and they will also be concerned that running too many nodes might affect their normal work. The set-and-forget kind of people (most of the world, once adoption is worldwide) who will not want to micromanage nodes will simply (KISS) follow directions, and the launcher says you have this much free space available (i.e. the amount free below 80%) and can run a maximum of x nodes. I’d also expect the launcher to suggest leaving more free space than the 20% that should always be free for normal operations.
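Something like the sketch below is all the launcher would need to do (hypothetical numbers: the 35 GB per-node figure comes up later in this thread, and the 20% reserve is the rule of thumb above):

```python
# Hypothetical "suggested maximum nodes" calculation a launcher could show.
GB = 1024 ** 3
NODE_SIZE = 35 * GB          # assumed per-node size, as mentioned later in the thread
RESERVED_FRACTION = 0.20     # always leave 20% of the disk free

def suggested_max_nodes(disk_total: int, disk_used: int) -> int:
    usable = disk_total * (1 - RESERVED_FRACTION) - disk_used
    return max(0, int(usable // NODE_SIZE))

# e.g. a 2 TB drive with 1 TB already in use
print(suggested_max_nodes(2048 * GB, 1024 * GB))  # -> 17
```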
Currently we are in a situation where adoption is upside down and 80% of node runners are somewhat technically minded. By the time the native token is introduced, we hope that the more technically minded will be less than 20%, and that 500 or 1,000 times more people will be running nodes.
Under Mav’s suggestion, would churn still relocate chunks on a revolving basis, thereby maintaining and refreshing the earning power of the nodes? If not, there would be too great a chance of a “dried-up well” for the node runner. Also, wouldn’t node runners have to ensure that there is always extra drive space available to accommodate the swapping of data during any such relocation events?
Reserving storage also removes utility of the host device.
Telling someone that 60% of their laptop storage needs to be used up immediately is a harder sell than telling them it may be used up eventually. It would also take longer to bootstrap, as the disk would need preparing.
Indeed, allowing the host to use their storage for other things is surely in the spirit of just using spare capacity.
Most operating systems have alerts for when storage is running low too. Suggesting some ant nodes are stopped and their storage deleted seems reasonable to me.
Tbh, this all feels like premature worry. With genuine rewards on a live network, many of these issues will melt away. No point adding complexity and putting off users until we can see if it is a problem on a production network.
Well, it wouldn’t need to be the full 35 GB per node reserved. I think it could be set to, say, 5 GB, or maybe even to a percentage calculated on top of the chunks the node is responsible for. Like, “You need to store these chunks, and have 15% of extra space at any given time.”
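As a hypothetical sketch (the 5 GB and 15% figures are just the examples above, and combining them as “whichever is larger” is my own addition):

```python
GB = 1024 ** 3

def has_margin(stored_bytes: int, free_bytes: int,
               flat_reserve: int = 5 * GB, fraction: float = 0.15) -> bool:
    """Require free space of at least max(5 GB, 15% of the chunks currently stored)."""
    required = max(flat_reserve, int(stored_bytes * fraction))
    return free_bytes >= required

print(has_margin(stored_bytes=20 * GB, free_bytes=2 * GB))  # False (needs 5 GB free)
print(has_margin(stored_bytes=20 * GB, free_bytes=6 * GB))  # True
```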
I think some margin is needed. How much, I don’t know.
If they run out of space, they will get shunned. So, there is already a mechanism in place to punish/remove these bad nodes.
On a live network, we don’t have subsidised rewards (at least not the big ones we have now). It is these rewards that are skewing folks to run more nodes than would otherwise be economical.
Why would someone burn more CPU cycles and RAM, never mind the storage, when the reward doesn’t make it worthwhile?
The whole point of dynamic pricing is to keep the network at a desired capacity - full enough, with a bit of room to expand quickly if needed.
Edit: To underline, the point is to retain a stable network. If lots of folks are getting shunned to maintain that, the network is doing its job.
If folks are nimble enough and there is incentive enough to run a few more nodes than they have capacity for, then we still have a stable network. In fact, we just have more CPU and RAM chasing scraps. Not a bad place to be, network health wise.
And in that instant their data is replicated to other nodes. If those nodes don’t have margin, they become unresponsive, and their data is replicated to other nodes, etc.
I think those rewards only change how many nodes it is profitable to run per dollar paid out. But I don’t understand why it would have any effect on how many nodes it makes sense to run per GB of disk space?
Let’s say someone has X amount of disk space available. Isn’t it always more profitable to run Y nodes on it instead of 0.5 × Y? Or, using very concrete numbers: I can now run a maximum of 6 nodes. In what scenario would it be more profitable to run, say, 4 nodes instead of 6?
I think dynamic pricing is good for slow changes, but it cannot save the data when, say, a vastly under-provisioned 1% of nodes suddenly triggers a cascading effect that leads a somewhat less under-provisioned 10% of nodes to go offline.
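Just to make that worry concrete, here is a deliberately thin-margin toy model (all numbers are made up, and real replication happens within close groups rather than network-wide, so this only shows the shape of the risk, not a prediction):

```python
import random

def survivors(capacities, loads, initial_failures):
    """When a node dies its load is spread over the rest; anyone pushed past capacity dies too."""
    alive = {i: loads[i] for i in range(len(loads)) if i not in initial_failures}
    orphaned = sum(loads[i] for i in initial_failures)
    while orphaned > 0 and alive:
        share = orphaned / len(alive)
        orphaned = 0
        for i in list(alive):
            alive[i] += share
            if alive[i] > capacities[i]:
                orphaned += alive[i]   # this node's data now needs re-homing too
                del alive[i]
    return len(alive)

random.seed(0)
n = 1000
caps = [35.0] * n                                        # 35 GB capacity per node
thin = [random.uniform(33.0, 34.9) for _ in range(n)]    # almost no headroom
roomy = [random.uniform(20.0, 30.0) for _ in range(n)]   # plenty of headroom
outage = set(range(10))                                  # a 1% outage

print(survivors(caps, thin, outage))   # cascades: few or no survivors
print(survivors(caps, roomy, outage))  # absorbs it: 990 survivors
```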
But they also provide more CPU and RAM to the network prior to that event. Which is better/worse?
If someone has additional CPU and RAM to burn, which have no other uses, and the energy cost is low enough, then sure, they may attempt to max out those resources.
However, as above, is adding more CPU or RAM better or worse for the network, vs the cost of churn?
Wouldn’t that be a relatively steady state though?
Different machines will be running different degrees of CPU vs RAM vs storage. It is unlikely they will all hit the skids at the same time.
I suspect it is more likely that there would just be a constant churn of a percentage of data. It may fluctuate, but a cascade event doesn’t seem inevitable at all to me.
I think all the resources should have some margin baked in at the protocol level. Be it CPU, RAM, disk space, bandwidth… I don’t know if there are means to do that though.
That’s actually a really good point. The network is never stable, so actors going in and out, with different margins and for different reasons, should be expected as a kind of ongoing state rather than an “individual event”, as I have been thinking of it… Got to chew on this for a while…
Well, that can be the norm, but I don’t think it invalidates the possibility of “an event”, either. Say, sabotage of an undersea cable causing a sudden outage of a large portion of nodes.
But so, you are basically saying that it is fine to leave it to individuals how much margin they’ll have on their systems?
I think it may be tricky, maybe impossible, to do.
Given anyone can change the node code, they can rip out any checks. I can imagine forks of ant node being released with them culled.
Ofc, nodes can be spot checked, which it sounds like is being done now (to see if old chunks are still stored), but it may be hard to do more than that.
Well, I think we need to see how the live network performs. So far, even with heavily skewed rewards, the test network has been remarkably stable. This feels like a good real world test so far too.
However, if there is a problem, then I think we need a network side solution, rather than an ant node solution.
Churn only occurs when an event happens. That event is specifically a node leaving that holds chunks your node becomes responsible for, or a new node joining that is closer to one or more chunks your node holds.
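For anyone following along, responsibility here is just XOR closeness in the address space. A minimal sketch (toy 5-bit addresses and a hypothetical close-group size of 5; the real network uses 256-bit addresses):

```python
CLOSE_GROUP_SIZE = 5  # hypothetical replication count

def xor_distance(a: int, b: int) -> int:
    return a ^ b

def responsible_nodes(chunk_addr: int, node_addrs: list) -> list:
    """The CLOSE_GROUP_SIZE node addresses XOR-closest to the chunk address."""
    return sorted(node_addrs, key=lambda n: xor_distance(n, chunk_addr))[:CLOSE_GROUP_SIZE]

nodes = [0b10110, 0b00101, 0b11001, 0b01110, 0b10001, 0b00011]
chunk = 0b10100

before = responsible_nodes(chunk, nodes)
# A new node joins that is closer to the chunk: the furthest current holder
# drops out of the close group, so the chunk churns to the newcomer.
after = responsible_nodes(chunk, nodes + [0b10101])

print([bin(n) for n in before])
print([bin(n) for n in after])
```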
Mav’s idea has no effect on churning since it concerns your node only.
Good point and you are right, it’s not something that should be done.
I made a point in another topic that once the network moves out of the tiny stage, this issue will not be so great, since with more data we would expect nodes to be closer to their expected average storage, whatever that average turns out to be long term. It might be 10 GB because so many nodes are always out there, or 40 GB because not so many nodes are running. And of course the number of nodes people can run is also a factor of the average size, so as nodes leave, it can cause others to reduce their node count due to their storage limits.