Node size considerations

The white paper says node size is designed to be about 2GB. How is it enforced that the size really is 2GB and not smaller? If the node is responsible for, say, 1GB of data, how is it enforced that it reserves the extra 1GB of space for future use?

1 Like

The size is in terms of number of chunks stored.

Currently it is 2048 chunks with a max of 1/2 MB per chunk, giving a max of 1GB.
Have been told this is for the testnet and sizing will go up to give a max of 2GB.
The 2GB represents a max size, and actual usage can be smaller if chunks are not all max size.

It is enforced by the coding only allowing 2048 chunks (records)
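
To make that concrete, here is a minimal sketch (Python, with hypothetical names; the real node is written in Rust and its internals may differ) of how a hard record cap plus a max chunk size bounds disk usage:

```python
# Minimal sketch of a record-count cap (hypothetical names, not the real node code).
MAX_RECORDS = 2048            # hard cap on stored chunks (records)
MAX_CHUNK_SIZE = 512 * 1024   # current max chunk size: 1/2 MB

class RecordStore:
    def __init__(self):
        self.records = {}     # chunk address -> chunk bytes

    def put(self, address, chunk):
        if len(chunk) > MAX_CHUNK_SIZE:
            raise ValueError("chunk exceeds max chunk size")
        if len(self.records) >= MAX_RECORDS and address not in self.records:
            raise RuntimeError("store full: max records already held")
        self.records[address] = chunk

# Worst case the store holds 2048 * 0.5 MB = 1 GiB; raising the max chunk
# size to 1 MB with the same cap gives the 2GB figure quoted above.
print(MAX_RECORDS * MAX_CHUNK_SIZE / 2**30, "GiB worst case")  # -> 1.0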

It is further enforced by the fact that all the other nodes are limited to 2048. Thus if you modded the node s/w to have a 100,000-chunk max, the node would still only get a max of 2048 chunks, since the close-node algorithm would limit it to that. It might be possible that, by a quirk of timing, the node gets a couple of extra new chunks to store, but that can only occur on a full network and they will be churned away as fast as new nodes take up the slack.
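
A toy illustration of why modding your own cap changes nothing: the other nodes decide where a chunk goes by picking the node IDs closest to its address (XOR distance in Kademlia-style routing; the snippet below is purely illustrative, not the actual routing code), so a modded node still only receives chunks it is genuinely closest to.

```python
# Toy example: other nodes route a chunk to the node IDs closest to its
# address (XOR distance), regardless of what your own node claims it can hold.
def closest_nodes(chunk_addr, node_ids, k=5):
    return sorted(node_ids, key=lambda n: n ^ chunk_addr)[:k]

node_ids = [0b0001, 0b0110, 0b1010, 0b1100, 0b1111]
print(closest_nodes(0b1011, node_ids, k=2))  # -> [10, 15]
```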

If the node tries other tricks to store more, then it's more than likely to be seen as bad and shunned by the other nodes.

7 Likes

I understand it’s capped, no problem there. But I have also been told it’s fixed at 2GB, which I understand to mean that the node cannot take less than 2GB of HD space even if it is only responsible for 1GB of chunks. Am I right, and if so, how is that done?

AFAIK that is not correct. If the max chunks (records) do not add up to 2GB, then it won’t magically make up that space just to enforce 2GB.

But you have to ensure that the 2GB is available.

The way chunks are stored at the moment is that there are active and inactive chunks. Active chunks are the ones your node is among the closest 5 nodes to. Inactive chunks are the ones that were, but are not any longer.

I suspect they keep the inactive ones around since it doesn’t block new chunks: a new chunk simply causes one of the inactive chunks to be deleted. The advantage to keeping them is that if your node becomes one of the closest nodes to an inactive chunk again, it can just make that chunk active again.

If a reasonably close node leaves, then a number of your inactive chunks may become active, saving the bandwidth of downloading them again during the churn caused by that node leaving.

Basically, once your node has the max chunks in its store, the number of active + inactive chunks will remain constant at max chunks.
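
A rough sketch of that bookkeeping, assuming a hypothetical store structure (not the actual node code): churn only flips chunks between active and inactive, and once the store is full a new chunk evicts an inactive one, so the total stays pinned at the max.

```python
# Hypothetical bookkeeping for active/inactive chunks (not the actual node code).
MAX_RECORDS = 2048

class ChunkStore:
    def __init__(self):
        self.chunks = {}     # address -> data, active and inactive together
        self.active = set()  # addresses this node is currently among the closest 5 for

    def store_new(self, address, data):
        # Once full, a new chunk evicts one inactive chunk, so
        # active + inactive stays constant at MAX_RECORDS.
        if len(self.chunks) >= MAX_RECORDS and address not in self.chunks:
            inactive = [a for a in self.chunks if a not in self.active]
            if not inactive:
                raise RuntimeError("store full of active chunks")
            del self.chunks[inactive[0]]
        self.chunks[address] = data
        self.active.add(address)

    def on_churn(self, still_closest):
        # Churn only flips the flag; inactive chunks stay on disk and can be
        # reactivated later without re-downloading them.
        self.active = {a for a in self.chunks if a in still_closest}
```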

8 Likes

I expect that enforcement will be part of bad node detection, based on the node being able to deliver the chunks it is expected to hold.

I’ve seen signs that this is being worked on (eg nodes asking each other for a random chunk they should hold), but I think details are tbd.
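
Something along these lines, perhaps; this is only a guess at the shape of such a check, assuming self-validating chunk addresses (content hash = address), not the actual protocol:

```python
# Guess at the shape of a spot check: ask a neighbour for a random chunk it
# should hold and shun it if it can't produce matching data. Details are tbd.
import hashlib
import random

def spot_check(neighbour, expected_addresses):
    addr = random.choice(list(expected_addresses))
    data = neighbour.fetch(addr)  # hypothetical call to the neighbour
    if data is None or hashlib.sha256(data).hexdigest() != addr:
        return "shun"             # can't deliver a chunk it is expected to hold
    return "ok"
```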

7 Likes

Then it seems to me that the pricing curve does not create incentives to add more nodes when the price of data rises, because the rise in price per unit of data is not enough to offset the smaller number of nodes I can fit in my HD space. Or the incentive is there at both ends of the graph, but not in the middle.

Reasoning like this:

Now, if I’m correct, the more nodes I run, the greater my chances to earn.

Let’s say I have 20GB HD space to use for farming.

Number of nodes I can run at different fullness levels (r), assuming 2GB per node:

r = 1   => 10 nodes
r = 0.5 => 20 nodes
r = 0.1 => 100 nodes

According to the dynamic pricing equation presented in the white paper, the price at each fullness level is:

price = -20/(r-1) + 10

Then:

r = 1   => infinite
r = 0.5 => 50
r = 0.1 => 32.2

My earnings with my 20 GB HD space in different fullness levels:

 r      price          nodes    earnings
 1   = infinite  x  10 nodes  = infinite
 0.5 = 50        x  20 nodes  = 1000
 0.1 = 32.2      x 100 nodes  = 3220
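
For anyone who wants to reproduce those numbers, a quick back-of-the-envelope script (illustrative only; it assumes 2GB per node and that every node ends up r full):

```python
# Reproducing the table above with the quoted pricing curve.
def price(r):
    return float("inf") if r == 1 else -20 / (r - 1) + 10

disk_gb, node_gb = 20, 2
for r in (1, 0.5, 0.1):
    nodes = int(disk_gb / (node_gb * r))   # nodes that fit if each ends up r full
    print(r, round(price(r), 1), nodes, round(price(r) * nodes, 1))
# 1    inf   10   inf
# 0.5  50.0  20   1000.0
# 0.1  32.2  100  3222.2
```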
1 Like

If you do not provide the 2GB per node then you run the risk that all your nodes could be shunned when all your available space is used.

If, say, you had 100 nodes, then as soon as the network is around 10-20% full, all your nodes are shunned or crash due to no disk space being available.

At 20 nodes you’d last longer, and at a long shot they may survive for quite a long time.

Also your nodes have to accept chunks churned to them when they join the network. So if the network is 50% full and you add 20 nodes with 20GB of disk space, then you’ll earn zero. No space for new chunks.

Your analysis is flawed because of those issues.

  • r=1 x 10 nodes: your earnings would be zero because the nodes fill up with churned chunks, leaving no room for any new chunks, and this would be the same for anyone adding a node.
  • r=0.5 x 20 nodes: will earn zero because your nodes would fill up with churned chunks, leaving no room for any new chunks.
  • r=0.1 x 100 nodes: same.

See the pattern.

You must always have spare space (up to 2GB/node) when you add your nodes in order to earn anything at all.

If you want to actively “game” this way, then you’d be better off targeting the network fullness + 10%, running as many nodes as fit into your supplied disk space at that level, and killing off nodes to restore the 10% headroom as the margin above fullness shrinks. Maybe at each 5% fullness mark.

Thus at 40% full you would run, say, 20 nodes; then when fullness gets to 45% you kill off two nodes so you have room for 55%, and keep doing this at each 5% mark so that you always have 10% headroom.
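
As a sketch, that strategy amounts to something like this (illustrative only, assuming 2GB max per node and a 20GB disk):

```python
# Sketch of the headroom strategy (illustrative only).
DISK_GB, NODE_GB, HEADROOM = 20, 2, 0.10

def nodes_to_run(network_fullness):
    # Provision as if every node will reach network fullness + 10%.
    planned = min(network_fullness + HEADROOM, 1.0)
    return int(DISK_GB / (NODE_GB * planned))

for full in (0.40, 0.45, 0.50, 0.55):
    print(f"{full:.0%} full -> run {nodes_to_run(full)} nodes")
# 40% full -> run 20 nodes
# 45% full -> run 18 nodes
# 50% full -> run 16 nodes
# 55% full -> run 15 nodes
```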

But in a more practical sense, disk space will not be an issue for home users; CPU/memory and internet bandwidth will limit the number of nodes they can run. Same for a VPS, since the quantity of data to/from the internet will be the major cost factor.

5 Likes

Yeah, I wasn’t that detailed, but otherwise that’s exactly what I was thinking. In my opinion that kind of gaming seems so simple that it could be expected to become the norm.

Why would anyone use more than, say, 10-20% headroom to begin with?

If the pricing mechanism is designed with 2GB fixed node size in mind, then that size should be enforced. Otherwise this is an evident source of behaviour that may be bad for the network in the long run.

2 Likes

Most non-technical people will use a set-and-forget method of running nodes.

The active management system suggested has the drawback of causing extra churning for others, and if implemented network-wide it would see ripples of massive data movements across the world as the network filled up that extra percent, or whatever target points are used. Maybe it is checked with each extra chunk, and nodes are killed off across the world whenever that extra chunk of fullness tips the balance.

Actually it’s worse than just massive data movements at the same time. There will be occasions where the mere fact of everyone in the world killing off one or a few nodes at the same time causes a cascading effect: the network fills up over, say, a day, as each tipping point kills off more nodes and total network capacity reduces each time.

Basically your active management will only be effective if relatively few people do it.

4 Likes

Yes, and otherwise it is a disaster, and that’s why I am talking about it. It might start with a few and then spread. And that’s because the economic incentives work the wrong way around. But if the node size were fixed and enforced at 2GB, problem solved.

EDIT: The “leftover” space could be filled with extra replicas, thus making the network more resilient. So the replication count would not be 5, but at least 5, and often larger. I think there was discussion about this a few years back, when proof of resources was a thing.

3 Likes

Let’s put some figures to that scenario where every machine uses active management of nodes.

Let’s say we start looking when the network is 40% full and there are 20 million nodes.

  • 40% full, 20 million nodes ==> 16,384 TB stored
  • 45% full, 20 million nodes ==> 18,432 TB stored
  • pruning occurs at 2 nodes per 20 nodes
    • 18 million nodes with 18,432 TB ==> 1024 chunks per node (50% full)
    • new pruning triggered immediately after the ripples have completed; again need to kill 2 nodes per 20 to keep ~10% headroom
    • now 16 million nodes with 18,432 TB ==> 1152 chunks per node (56% full)

And this will continue until all nodes are full.
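
A rough simulation of that runaway, assuming 2048 chunks of 1 MB (2GB) per node, a fixed amount of stored data, and everyone pruning ~10% of their nodes at each step (illustrative numbers only):

```python
# Rough simulation of the cascade: fixed stored data, shrinking node count.
CHUNKS_PER_NODE = 2048                                   # 2 GB at 1 MB per chunk
nodes = 20_000_000
stored_tb = 0.45 * nodes * CHUNKS_PER_NODE / 1_000_000   # ~18,432 TB at 45% full

while True:
    fullness = stored_tb * 1_000_000 / (nodes * CHUNKS_PER_NODE)
    print(f"{nodes / 1e6:.1f}M nodes, {fullness:.0%} full")
    if fullness >= 0.90:   # nothing left to prune without going full
        break
    nodes = int(nodes * 0.9)
# 20.0M nodes 45% -> 18.0M 50% -> 16.2M 56% -> ... -> ~9.6M nodes, ~94% full
```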

3 Likes

There is no way to enforce 2GB, simply because it’s easy to mod the code and any “proofs” can be fooled with temp storage.

The 2048-chunk limit will effectively enforce it after a period of time, through the active and inactive chunks. Neighbour checking seems to be getting introduced as part of bad node detection, but even without it the limit is still effective enforcement. Inactive chunks will build up with each churn event close to a node, such as nodes being added.

In any case enforcing the 2GB isn’t reasonably feasible.

Also I do not think active management will take off, since the churning happening on the person’s own PC (or VPS) will cause issues with their bandwidth and/or quotas.

And as I say, the incentive isn’t there to do this active management since disk space is not the issue, and active management only solves a lack of disk space. In general people will have more spare disk space than their nodes need. CPU usage, memory usage, bandwidth or quotas will limit the number of nodes they can run before disk space will.

So active management is a solution to a problem that will only occur in the rarer cases where the person has filled their disk so much that they have only some GBs free for nodes, yet has a high-end computer as well as high-bandwidth/high-quota internet.

5 Likes

Just to add to neo’s point, this would only work at the very start of the network. Once there’s data on the network, you’re not getting paid for “catching up” to the fullness of the rest of the nodes. So even leaving yourself 10% headroom, you’re limiting your earning potential.

That’s aside from the fact that optimizing for 20GB of space is trivial. As you start scaling to 1TB of space at 512 nodes, space is absolutely not your concern. You have a beefy rig where CPU and memory management will far outweigh space concerns.

1 Like

If that’s the case, I’m happy.

So adding storage space for the network is not based on existing node operators adding more HD capacity, but on getting more people/homes/companies/“endpoints” running the capacity they can with their bandwidth/CPU/memory.

I still think it’s not good to have the economic incentives aligned with bad behaviour, but if the significance of that behaviour is limited in the way you describe, I’m fine with it. And it does incentivise good behaviour too - more “endpoints” joining.

1 Like

What effects will the 2GB per node have on someone running entry-level hardware like a Raspberry Pi 5 and a 1TB SSD?

Will a person running a Pi 5 be able to fill a normal-sized SSD like 1TB? Will a Pi 5 be able to run 250-500 nodes?

My Pi 5 finally came in. I don’t have an external drive to hook up, but I have a large NAS I can mount. Disk will be slower, but it should work for a rough CPU/memory limit test. Network usage will be, my guess, an extra 25% overhead after sync that way.

I’m also working on a Docker setup so these can be run straight on my Synology NAS. I think NAS is getting popular enough that the “market” for an easy Docker setup will be quite helpful to adoption.

Nice, I’m interested in the results when you get things going.

Also sounds nice. I have no experience with Docker setups but it seems very useful.

I doubt it, since people on PCs are saying their nodes can each be taking 1% of CPU, and for an RPi this would be higher. I’d say for set-and-forget operation 50 would be an absolute max, but I could see CPU/memory bottlenecks resulting in some or all nodes being seen as bad.

Just to add another point, active management will cause higher bandwidth & quota usage than a set-and-forget situation. When nodes are deleted, it causes churning on the neighbours of those shut-down nodes. This will happen to your rig when neighbours doing the same shut down some of their nodes. Also, when the network grows and fullness reduces, your rig starts up more nodes, causing more bandwidth & quota usage compared to the set-and-forget rig.

If you wanted to do your own inconclusive straw poll, ask your friends, neighbours and co-workers how much free disk space they have on their home computers. That’ll give you an idea of the space some or most people will have available to share with the network. You’d have to factor in the garbage they haven’t deleted yet as well, since anyone wanting to run more nodes would clean that up.

I’d say they need a minimum free for normal operations, then the rest is potentially available.

NAS speeds should be fine since there are no longer any speed races to access data. Traffic on your local network shouldn’t be an issue, I would think.

And if one has a NAS already, then why not put it to work. But I wouldn’t be buying a NAS with the justification of running Autonomi. :wink:

2 Likes

That is the feeling I also get after reading the comments: if it is a test-network thing then there is no problem. I hope people will, towards the final network, be able to utilise common hardware and not waste resources, which would be bad for many reasons.

Point 1 agreed. I was speaking more about getting metrics on what the Pi could handle when using the NAS as the storage device. It will cause more overhead, yet bring the number of nodes required to max it out slightly down.

Point 2 also agreed. I just know that they’re popular. I have one, and many of my friends have them, even some of my non-technical friends. If I can tell them to get the Docker management app for their NAS and just run it, that’s much more appealing to them than “hey, here’s the black box text commands to type in”. Just another portable and accessible way to get the software onto another type of device.

2 Likes