How fast, how large (deterministic sized nodes)

I don’t see that happening for a very long time. The nodes will each use a different port on the same IP address, so we are good for up to 35,000 or so :slight_smile:

5 Likes

How much less work is it for the network if 5GB of data on a single machine running 20 nodes goes offline, as opposed to 5GB on a single machine running 1 node going offline?

Is that really better, why?

5 Likes

Due to data distribution, it’s the same work in total for the network but spread across data address ranges (many adults) instead of being concentrated on a few (up to 4) adults.

8 Likes

Thank you, my head is foggy :laughing: finally got the covid for xmas :face_with_symbols_over_mouth: bit slow at the mo.

8 Likes

ah man get well soon. Rest well

7 Likes

Over the hump, it’s all good

8 Likes

OK, that makes sense, and I suspected as much anyhow.

How about, rather than increasing the node storage size over time, we try some lateral thinking and go lateral.

What I mean here is that we solved space monitoring by having all nodes offer equal-sized storage, so why not let nodes segregate their storage? That is, when expansion happens, rather than increasing node storage, you add a unit of base storage.

If a node cannot expand this way then it either:

  • relocates some storage (see below for how this is determined), or
  • just deletes some storage.

New nodes come in with just the base storage size.

The elders do the same as the adults and thus can keep track of how much storage is used, because their own usage mirrors the adults’.

In practice this becomes:

  • have a set base size for nodes (maybe 10GB, i.e. 10,000 chunks), just like we have the 1MB (max) size for chunks
  • new nodes come in with the base size for their storage
  • when expansion is signalled, a node adds one base-size unit to its storage if it is able
    • if the node cannot increase its size then it relocates (or destroys) the highest or lowest section of its storage in XOR addresses. The amount is determined by how many units of base storage it currently holds (x): it relocates/destroys 1/(x+1) of what is currently stored, which keeps its portion correct (see the sketch after this list)
    • this allows smaller nodes to remain operational and keep contributing overall
  • the elders do the same process and thus know how much storage is being used
  • the major changes to storage as we have it now would be that
    • the amount of storage is a multiple of the base size
    • the stored chunks are segregated into pools by their XOR address; the node’s XOR space is divided across however many pools the node currently holds
    • the actual storing is not really changed; only reporting changes, with the percentage stored reported as (total stored / number of pools) / base size
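A minimal sketch of that accounting, assuming a hypothetical 10GiB base unit; the names and structure here are purely illustrative and not from any actual node code:

```rust
// Illustrative only: pool-based accounting with a hypothetical 10 GiB base unit.
const GIB: u64 = 1024 * 1024 * 1024;
const BASE_SIZE_BYTES: u64 = 10 * GIB;

struct NodeStore {
    pools: u64,        // how many base-size units this node currently holds
    stored_bytes: u64, // total chunk bytes stored across all pools
}

impl NodeStore {
    /// Reported fullness: (total stored / number of pools) / base size,
    /// so every node is judged against the same base-size yardstick.
    fn fullness_pct(&self) -> f64 {
        (self.stored_bytes as f64 / self.pools as f64) / BASE_SIZE_BYTES as f64 * 100.0
    }

    /// On an expansion signal: add one base unit if possible, otherwise
    /// relocate/destroy 1/(x+1) of the stored data (x = current pool count),
    /// which shrinks the node's share of the XOR range it covers.
    fn on_expansion_signal(&mut self, can_expand: bool) {
        if can_expand {
            self.pools += 1;
        } else {
            self.stored_bytes -= self.stored_bytes / (self.pools + 1);
        }
    }
}

fn main() {
    // Two identical nodes holding 12 GiB in 2 pools; one can expand, one cannot.
    let mut expands = NodeStore { pools: 2, stored_bytes: 12 * GIB };
    let mut sheds = NodeStore { pools: 2, stored_bytes: 12 * GIB };
    expands.on_expansion_signal(true);
    sheds.on_expansion_signal(false);
    // Both now report 40% full, so the elders' fullness accounting stays consistent.
    println!("expanded: {:.1}%  shed: {:.1}%", expands.fullness_pct(), sheds.fullness_pct());
}
```

The point of the 1/(x+1) rule is visible in the example: the node that expands and the node that sheds data end up reporting the same fullness figure.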

My concern is how the XOR space a smaller node is responsible for changes when it cannot expand.

This allows people to run multiple nodes. People running multiple nodes on the one machine will have to consider CPU & memory usage and may have to limit the number of nodes accordingly. The above idea allows people to maximise their machine’s resources by running as many nodes as the machine can handle and then using up all their available storage over time (after multiple expansions).

Some areas that could be played with are allowing nodes to have as many base units in their storage as they wish, or a number according to age, or …

2 Likes

I do not understand this part. Do you mean instead of adding nodes we ask them to increase their size?

At the moment, this need for space is our signal to require new nodes and that leads to splits and network growing.

This is the balance, I suppose: bigger nodes or more nodes. I take the point that how many nodes you can run depends on the ability of the machine in CPU etc.

I take it the signal is the space required signal we use to add nodes, or did you have another angle on this part?

2 Likes

the minimum they must provide. So we might easily gauge network size.


Assuming we have a “fixed minimum” storage size, we can easily determine the space available across the network. A user’s space could be larger than that (it likely would be).

So if we increase the minimum required space a node offers every time, how much space might we want to make that requirement?

No magic required here, just a little goodwill/imagination to think about how a network might grow. Perhaps there are issues. That’s part of why we’d discuss it.


Yeh interesting point.

I’m not sure: with tiered nodes, correctly identifying how much space is actually available is non-trivial. We could sample over an address space of known/expected data (which has been punted before and may help us here).

But I was more wondering about growth of the network capacity over time. Does tying that in some fashion to prefix length make sense? (How might this pan out over XYZ splits? Would that be sufficient for expected use at that time? Should we tie it to some other metric?)

3 Likes

So what I’m wondering in the OP is more that we increase node counts to increase storage. If you can always run another node (though it may not join), this is the same thing, but without having to guess/gauge/test expansions per node?

Deleting irrelevant stored data is also always an option. The “expected size available” across the storage space remains the same, though.

I think this is part of the question for the OP. How fast might that happen with the 10% value? Would there be a split at some point that makes certain device classes unusable, as you say? What does tweaking the % do to this? Or the starting value?

3 Likes

It was in response to this question in the opening post about increasing node storage size over time (every split)

Ah OK I misunderstood the direction of the question.

Again, there are limits in CPU & memory as to how many nodes an SBC or simpler computer can handle, e.g. an older laptop or an SBC like an RPi. I could have an 8TB drive on my RPi but only run 10 nodes (educated guess), and if 50GB is the node storage size then only 500GB is possible.

3 Likes

Well, the answer is what you said above: the question was not about increasing the minimum node storage size, but about adding more nodes.

2 Likes

Yeh, adding more nodes is one option. But then, does increasing storage size help us? What difference would that make for these older devices?

2 Likes

True, Rob, and a good point; however, to reach as many people as possible, I am not sure that would be a normal setup.

I wonder if we can consider that a setup that an IT expert may have and that expert can also change it easily.

Consumers with very old computers may have limited disk space and also slow disks.

It’s all about this balance. I am not sure we can cope with all types of advanced setups, but I feel confident we can reach as many old and slow machines as possible, even to run just a single node.

good convo

6 Likes

I did address this in my suggestion. Instead of simply increasing the size, make the storage in units: effectively split the XOR space the node is storing for over more “pools”, i.e. each “pool” holds chunks for that section of XOR space. This way the filling is always related back to the standard space (50GB perhaps), and there is no need for smaller, older machines to miss out, because they only have the number of pools they can handle, effectively reducing the XOR space they are responsible for.

So when we want each node to actually have a larger storage space, it adds another “pool” if it can.

1 Like

Ahhh, right. I think I see what you mean now. Sorry that didn’t click for me above :+1:

So more like one ‘node program’ is many ‘nodes’ (in many places…)?

Not changing the size/storage setup in any given section (in theory). Is that right?

3 Likes

For further clarification…

In the OP I was wondering more along the lines of “how much space is on average available on a machine, and how might we reflect this in node size in the network over time…” (assuming that nodes can restart to upgrade the HD, for example…).

And how might we deal with that deterministically in the network? Prefix length is one clear stat we have.

But maybe the idea of “just add more nodes” is one answer…

And so then the question is: how many nodes would we need to store XYZ data size in total on the network (at a given time… so we’d have to assume it increases over time…)? And how realistic might that node count be?

1 Like

This is describing what I would consider an archive node: additional safety for the network, but not counted as base storage. The base storage is then always the number of base nodes required to cover the whole space.

i.e. these expanded nodes do not affect node counts or base space, but they provide additional safety (archive). Otherwise, we have the churn issue if we try to replace an archive node during churn (huge traffic in a small part of the section).

Is that what you are seeing?

2 Likes

My assumption of your question was along the lines that at each split the nodes would be told to increase their total storage by x% (and assuming multiple nodes per machine). But you clarified that you wanted to work out a figure for the node storage size to use, and just have multiple nodes on the one machine to increase the overall storage.

Of course the split is for one section, but multiple nodes on the one machine will see the nodes spread across many sections.

My idea was directed at increasing the storage per node, and that’s not what you asked. It would increase the storage per node by adding a “pool” of storage and considering the XOR space the node is responsible for as spread across the multiple pools, thus keeping to the original storage size when testing for fullness while storing a multiple of the original storage size. Anyhow, it’s a tad more complexity than just adding nodes.

As to the size to set nodes to (your actual question :wink: ):
If we take memory sticks as the type of thing that’s doable for not-so-powerful machines, then 256GB is a potentially affordable SD card size now (there are bigger), but I would not expect people to rush out and get a 256GB SD card as it’s still a hit to the wallet. Memory sticks usually follow SD sizing.

Drives in home computers have been around 500GB for quite a while and are now typically 1 or 2TB. How much spare space? Well, a home computer with Steam games & word processing is usually using 200GB to 400+GB. Thus 100GB is all you can safely rely on for a 500GB drive, maybe 500GB free for a 1TB drive, and 1TB free for a 2TB drive.

Now, I would think we would be looking at any machine running at least 2 nodes, for the benefits multiple nodes give. We also must look at the bandwidth required for relocating a node as it ages, realising that many areas (like much of the USA) have lowish bandwidth/quotas, which will push the optimal node size down.

Bandwidth to upload the node’s storage when needed (assume 1/2 the stated internet bandwidth is usable and a “full” node has 1/2 of its storage used); see the sketch after this list:

  • old ADSL ~20/2Mbps (125KBps for uploading) means 8 seconds per MB, 8,000 seconds per GB
    • any decent-sized node is going to take a long time to relocate
  • minimum NBN in Australia 25/12.5Mbps (780KBps for uploading) means 1,282 seconds per GB
    • allowing 4 hours for relocating a “full” node (i.e. 1/2 used) means 22.5GB as the node size
  • mid NBN is 50/20Mbps, full NBN is 100/40Mbps and high speed is 400/40Mbps, so by the same method roughly 36GB and 72GB respectively (the 400/40 tier has the same 40Mbps upload, so the same 72GB)
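A quick back-of-envelope check of those figures; illustrative Rust only, using just the assumptions stated above (half the stated upload rate usable, half the node occupied, a 4-hour relocation budget):

```rust
// Given a stated upload rate and a relocation-time budget, what node size fits?
// Assumes only half the stated upload rate is usable and a "full" node has half
// its space occupied, as per the figures above.
fn max_node_size_gb(stated_upload_mbps: f64, budget_hours: f64) -> f64 {
    let usable_bytes_per_sec = stated_upload_mbps * 1e6 / 8.0 / 2.0;
    let movable_gb = usable_bytes_per_sec * budget_hours * 3600.0 / 1e9;
    movable_gb * 2.0 // only half the node is occupied, so the node can be twice that
}

fn main() {
    for (label, up_mbps) in [
        ("ADSL 20/2", 2.0),
        ("NBN 25/12.5", 12.5),
        ("NBN 50/20", 20.0),
        ("NBN 100/40 or 400/40", 40.0),
    ] {
        // prints ~3.6, ~22.5, ~36 and ~72 GB respectively for a 4-hour budget
        println!("{label}: ~{:.1} GB node", max_node_size_gb(up_mbps, 4.0));
    }
}
```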

Taking drive/memory-stick sizes and bandwidth together (without doing graphs, which would be more accurate):

  • considering only disk drives, it would seem that 128GB would be a maximum
  • considering only bandwidth, around 25GB would be a maximum, if we allow on the order of 4 hours to relocate a node

Thus I would consider bandwidth to be the limiting factor, and I would like to see some allowance for slower-than-normal speeds (e.g. during peak hours).

Ballpark, back-of-envelope estimation puts about 20GB as the suitable node size. 20GB is 20,000 chunks minimum. 500GB of free space would have 25 nodes running to use it all.

This will require a lot of nodes to store the kind of data we expect to be storing, and that may be another long-term factor we should be considering. E.g. the 20GB size will see nodes storing up to 10GB, so 100 nodes per TB, 100K nodes per PB and so on (4x that with replication). With these figures I cannot see how a section could be less than 1,000 storage nodes; with replication being 4 times, perhaps that should be 4,000 nodes per section.

[EDIT]
Estimates put around 64ZB (a ZB being 10^21 bytes) stored in the world today, doubling every 2 years.
1PB (10^15) requires 400K nodes.
1EB (10^18) requires 400 million nodes.
1ZB (10^21) requires 400,000 million nodes.
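The arithmetic behind those node counts, for anyone who wants to poke at the assumptions; illustrative Rust, using the 10GB-used-per-20GB-node and 4x replication figures assumed above:

```rust
// Nodes needed to hold a given volume, assuming each 20 GB node actually holds
// ~10 GB of chunks and every chunk is replicated 4 times.
const USED_PER_NODE_BYTES: f64 = 10e9;
const REPLICATION: f64 = 4.0;

fn nodes_required(total_stored_bytes: f64) -> f64 {
    total_stored_bytes * REPLICATION / USED_PER_NODE_BYTES
}

fn main() {
    for (label, bytes) in [("1 PB", 1e15), ("1 EB", 1e18), ("1 ZB", 1e21)] {
        // prints 4e5, 4e8 and 4e11 respectively
        println!("{label}: {:e} nodes", nodes_required(bytes));
    }
}
```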

Maybe if we look at this again in 3 to 5 years, internet speeds will be higher for more people and 50GB might be best.

Not quite, and on reflection it would seem to add more complexity (even if it’s small) to the code for not so great a benefit.

3 Likes

Following on from @tfa’s post:

Looking at these figures, 20GB nodes will require a lot of nodes and sections to store even a fraction of this data. As an aside, hopefully deduplication will reduce this figure by a lot; how many times is a single cat video copied across the internet, in 1000s of Facebook accounts, on Twitter, etc.?

If the average computer runs 10 nodes (using multiple ports), then 1PB would require 4x10^4 computers, i.e. public IP addresses, and if NAT can be solved then multiple computers behind one router would reduce the number of public IP addresses needed.
1EB requires 4x10^7 computers.
1ZB requires 4x10^10 computers.

Without NAT, this means 1EB is technically possible on IPv4 but practically impossible, because you expect many more clients than nodes, thus exhausting the IPv4 addresses. NAT may make it feasible, but it is still unlikely (see the sketch below).
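To put a number on the IPv4 point; illustrative Rust, assuming the 10 nodes per machine used above and ignoring reserved IPv4 ranges and the clients entirely:

```rust
// Machines needed for the node counts above, compared with the total IPv4 space.
const NODES_PER_MACHINE: f64 = 10.0;
const IPV4_ADDRESSES: f64 = 4_294_967_296.0; // 2^32, before subtracting reserved ranges

fn main() {
    for (label, nodes) in [("1 PB", 4e5), ("1 EB", 4e8), ("1 ZB", 4e11)] {
        let machines = nodes / NODES_PER_MACHINE;
        // 1 EB already needs ~1% of all IPv4 addresses just for nodes;
        // 1 ZB needs roughly 9x more addresses than IPv4 has at all.
        println!(
            "{label}: {:e} machines = {:.3}% of IPv4 space (clients not counted)",
            machines,
            machines / IPV4_ADDRESSES * 100.0
        );
    }
}
```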

Thus, to get to the large storage that SAFE is aiming for, we will need IPv6.

Not really the topic here, but as an associated issue it needs to be noted.

7 Likes