How fast how large (deterministic sized nodes)

I have a dumb question about gauging the storage level:
How hard is it to have a counter of chunks sent for storage to an XOR address?
Or even a counter of chunks stored by the section (and divide that by the number of nodes to get the average chunks per node)?
I get that chunk sizes vary, but with a 1MB max, that would be a good enough minimum storage indicator.

2 Likes

Issues are

  • Who would keep the list
  • Who would agree and how to sync that

Then there is a more fundamental issue of centralising such lists of data. It’s a bit of a barrier IMO if we end up trying to synchronise all that data (even only the names/counts), and it leads to a lot of edge cases, consensus and more.

6 Likes

Thought experiment:

  • each elder keeps a count of how many chunks have been stored
  • the section count is based on an aggregate with any large outliers (Byzantine liar) ignored in the count. This limits the degree of distortion of the count by one or two lying Elders
  • periodically, on section split?, a section count is generated and each Elder resets its count to zero

So at any time the section total is initial total plus the aggregate additional number of chunks based on the individual Elders’ counters.
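
To make the thought experiment a little more concrete, here’s a rough Rust sketch of one way the aggregation could look. The 2x outlier cut-off and the averaging are my own illustrative choices, not anything specified above:

```rust
// Rough sketch of the aggregation above (illustrative only, not sn_node code).
// Each elder reports its own chunk counter; the section total is the previously
// agreed total plus an aggregate of the reported counts, with large outliers
// (potential Byzantine liars) dropped before averaging.
fn section_total(initial_total: u64, elder_counts: &[u64]) -> u64 {
    if elder_counts.is_empty() {
        return initial_total;
    }

    // Use the median as a robust reference for what a "typical" elder reports.
    let mut sorted = elder_counts.to_vec();
    sorted.sort_unstable();
    let median = sorted[sorted.len() / 2];

    // Drop counts that deviate wildly from the median (here: more than 2x),
    // limiting how much one or two lying elders can distort the total.
    let accepted: Vec<u64> = elder_counts
        .iter()
        .copied()
        .filter(|&c| c <= median.saturating_mul(2) && c.saturating_mul(2) >= median)
        .collect();

    // Honest elders should all have seen roughly the same chunks, so the mean
    // of the accepted counters approximates the real number stored since reset.
    let delta = accepted.iter().sum::<u64>() / accepted.len() as u64;
    initial_total + delta
}
```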

7 Likes

Cool :+1: Please note I am going to discuss this here as we do in house, so never personal, never a need to win, always a need to share the thought process with what we know so far.

Then do we trust his count? (this is the nuance that causes grief IMO)

This is a way to try and combat that lack of trust.

This is a consensus step for sure, trying to agree a number. It’s a hard consensus issue, too, as it’s not a binary yes/no, or even something simpler like a “do you agree” vote.

These are the areas where dragons lie.

If we wished to do a count there are ways; I dislike them all.

Here’s a mechanism that sounds like it would work (I bet there are a million problems coding it :wink: )

  • Each elder holds a set of data names
  • each data name has a section sig, so it’s valid
  • on some trigger each elder sends his set to every other elder
  • as these are sets of valid values we can easily merge them

Assuming no changes in network data at this point, they will all tend towards the same count (the number in the set).
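
As a rough illustration of that merge step (the types and the signature check below are stand-ins I’ve invented, not the real sn_node types):

```rust
use std::collections::BTreeSet;

type XorName = [u8; 32];

#[derive(Clone, PartialEq, Eq, PartialOrd, Ord)]
struct SignedName {
    name: XorName,
    section_sig: Vec<u8>,
}

fn is_valid(entry: &SignedName) -> bool {
    // Placeholder: a real check would verify `section_sig` against the section key.
    !entry.section_sig.is_empty()
}

// Because each entry is independently verifiable, merging is just a set union:
// order and duplicates don't matter, so this behaves like a grow-only CRDT set.
// The count is simply `mine.len()` after merging.
fn merge(mine: &mut BTreeSet<SignedName>, theirs: &BTreeSet<SignedName>) {
    for entry in theirs {
        if is_valid(entry) {
            mine.insert(entry.clone());
        }
    }
}
```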

There will be changes in network data, though, and so we have a race condition. So how do we get around that?

  • agree on some stop point where we consider the data count
    • Then how do we agree on the stop point?
    • We could use the change in the section key: the last section key before the trigger is the last one under which we consider data valid

And so on. Agreeing on a fixed value is probably impossible in a decentralised network (or any network with a rapidly changing state). (BTW, in the case above we can probably agree on a historic value, as you can see.) For centralised services it seems better, but in reality they still give you a figure from thin air and, milliseconds later, a different one (so your pal sees something different).
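
A hedged sketch of that stop-point idea, assuming (my invention, purely for illustration) that each stored name records the epoch of the section key that signed it:

```rust
// Hypothetical record: a data name plus the epoch of the section key it was
// signed under. Epochs increase each time the section key changes.
struct NameRecord {
    name: [u8; 32],
    signed_under_epoch: u64,
}

// Count only names signed under keys up to the agreed cut-off epoch. Anything
// signed after the stop point is ignored, so every elder evaluating the same
// historic epoch converges on the same value.
fn count_at_stop_point(names: &[NameRecord], stop_epoch: u64) -> usize {
    names
        .iter()
        .filter(|n| n.signed_under_epoch <= stop_epoch)
        .count()
}
```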

So on downwards we go :slight_smile:

When I am thinking about this stuff, I always take the position that there is no correct answer, and no answer that everyone sees at the same time. We can totally order stuff to make it seem so, but in my mind that is insane (although it is the state of the art).

Instead, I think about what’s good enough, and how much error we can have. Just as 2 folk never see the same rainbow, 2 humans never have the same knowledge / data set at the same time. So what about information progress? How do we build collective intelligence and keep working? This is where I think eventual consistency is close but not good enough. Consistency is a fallacy in many ways, and systems like total order that consider themselves immediately consistent are living a fiction. i.e. consider data that changes every second, but a consensus algorithm that takes 2 seconds to terminate: you have just agreed on a lie you can prove to be the truth mathematically, but in reality it’s a lie, as the data is no longer what the system is telling you.

Sorry, that’s deep, but it’s Xmas :smiley: :smiley:

So my feeling is: what’s good enough, what answer is close to reality, close to my reality, meaning I know it’s not your reality, but close enough that we can agree on one.

Then I look at these things. BTW, set merge is a CRDT thing and it’s very powerful for aggregating knowledge; maybe not for stating “this is the state of the whole network”, or even of all elders at any time, but it is a valid state. Others will have a similar, but not exactly the same, state.

In fact, CRDTs, consensus and all that rely on this thing:
If every message is eventually delivered and there are no further changes to state, then all honest nodes will terminate with the same value.

I say, that’s very interesting: so we stop the network and all is well :slight_smile: OK, perhaps I don’t.

I hope this shows more of the world I have in my head and I hope it helps. I am not trying to put any idea down, in fact the opposite. I love ideas, I love non-personal / non-ego debate and I feel we all need that in our lives. Anyway, I do hope this helps show where even the most simple-seeming measurement is a real problem.

10 Likes

I’d say all very on topic. A very important consideration for the node counts, and so storage size.

So perhaps 10% (if we were to tie it to prefix len) is too tame; maybe 50% or double makes sense.

Very good to get some numbers to help flesh out the thinking though, thanks @neo!

6 Likes

After reading this again @happybeing, I suppose I could have summarised it by saying:

We can get very close, or even exact, by looking at historic values. So when trying to agree on something, don’t try to agree on a current value. Try a historical value, one that’s so old that it’s unlikely to produce any more data.

Maybe that helps with more ideas?

6 Likes

No ego, just an experiment :+1:

I’m not sure your points rule the idea out? The detail varies but the ideas seem essentially the same or similar.

So to clarify, unstated assumptions:

  • we don’t trust all elders
  • each elder collects the counts of all elders, calculates an aggregate, excludes any counts which are outliers > ±20%? and presents a result which is its aggregate (adjusted to compensate for any excluded counts). Maybe it also reports the set of elders/counts included/excluded if that’s useful in checking for dysfunction.
  • the consensus result requires 5 of 7 Elders (BFT) to have presented the same set of acceptable elder/count sets and the same aggregate (where the agreed set will also include at least five of the seven).
  • the result doesn’t have to be 100% accurate wrt chunks stored, just good enough as you say. Good enough to decide if a threshold has been reached.
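
To make those assumptions concrete, here’s a rough sketch of how a single elder might compute the aggregate and check the split threshold. The ±20% band, the 5-of-7 requirement and the “good enough” threshold come from the bullets above; the use of the median and the mean is my own illustrative choice:

```rust
// One elder's view: take the 7 reported counts, drop outliers beyond ±20% of
// the median, require at least 5 remaining (BFT-style), then average.
fn aggregate_count(counts: &[u64; 7]) -> Option<u64> {
    let mut sorted = *counts;
    sorted.sort_unstable();
    let median = sorted[3] as f64;

    // Keep only counts within ±20% of the median.
    let accepted: Vec<u64> = counts
        .iter()
        .copied()
        .filter(|&c| (c as f64) >= median * 0.8 && (c as f64) <= median * 1.2)
        .collect();

    // At least 5 of the 7 elders must be in agreement, otherwise no result.
    if accepted.len() < 5 {
        return None;
    }

    Some(accepted.iter().sum::<u64>() / accepted.len() as u64)
}

// The result only has to be good enough to answer the real question:
// has the threshold (e.g. "time to split") been reached?
fn threshold_reached(aggregate: u64, split_threshold: u64) -> bool {
    aggregate >= split_threshold
}
```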

I think one point you make is that each elder may end up with a different count value for a particular elder? This makes it unlikely they will agree on the same aggregate value, and perhaps even on the set of elders to include in the calculation.

That may not matter because:

  • we’re actually trying to agree on whether a threshold has been reached (eg is it time to split?) so really all we need is for each elder to calculate an aggregate and decide whether to vote yes or no to that question.

If they’re just under that threshold because one elder is a bit behind, the split will just take place a little later. No biggie.

  • for monitoring and diagnostics these counts might be useful even if each elder has a slightly different picture at any point in time IDK.

Anyway, maybe it’s an option to use in estimating storage or section size. Just a thought.

6 Likes

Do you mean as long as each of them is above a certain value?

True, and this gets around another issue with this kind of voting.

That is, a vote without a termination can be held and replayed later to cause havoc. i.e. we vote for something being true or false; later, that vote can be replayed if it did not get consensus at the time.

In this case, though, it won’t matter, as you say: when an elder sees the limit reached then, in a network that never removes data, it’s unlikely that limit will reduce again (unless more nodes are added, reducing the load on each node).

More food for thought

4 Likes

For some folks like myself: I was planning on having a setup that allows live storage space expansion without restarting the sn_node pid. The mount point provided to the sn_node pid is a clustered file system across N hosts in this case, allowing one to easily scale the underlying storage mediums horizontally and vertically across multiple hosts.

Granted, this is not a typical setup for most folks, but my initial goal was to minimize downtime for the sn_node pid. I was valuing uptime as more critical than the raw storage space provided, depending on how the rewards system (separate topic) is built out over the lifetime of a single sn_node pid or multiple sn_node pids. I’d rather have some redundancy in the local clustered file system that allows easy storage space expansion while maintaining uptime than deal with a single node / single storage medium combination (personal preference). Of course, being able to support multiple sn_node pids is also doable by creating subfolders per N sn_node pids against the same mount point.

I agree that bandwidth will likely be the limiting factor here, based on the raw estimates that were provided above.

Ideally, as a future node operator, I wouldn’t want to run dozens or hundreds of storage node pids on multiple hosts, or the same host, against the same storage backend. Having to run multiple sn_node pids against the same storage medium increases the complexity of operating the environment for any node operator (separate topic), but I think the techies out there will do what is needed to be as efficient as possible given their limited bandwidth and storage at home.

I am definitely curious where the discussion on this thread leads in terms of what that means for an optimal setup at home to maximize node operator rewards/efficiency.

8 Likes

Thank you @happybeing, that was exactly my thought

I can’t help thinking that keeping track of the storage level is necessary…

On a side (but related) note, how does the network agree on the fact that a node is full, and is therefore not accepting additional chunks because of that rather than for malicious purposes? Or can a node simply not stop accepting new chunks (which would be very limiting IMO)?

3 Likes

This is where constant-size nodes work, with Elders also storing. Keeping them small, and with even data distribution, when the elders are full, so is everyone else. When a supermajority of elders agrees they are X% full, they can add new nodes, which reduces the strain on the network. This allows the network to grow when it needs nodes.

So we don’t need to know the actual size of the data stored. If we want to know that, we can multiply the number of nodes by the amount of data each node stores. i.e. with 5GB storage per node and 50% full, with 200 nodes in the section, the data size == 200 * 2.5GB in total. And on it goes.

When the section splits and new nodes are added to the new section this cycle continues.

If we want to know the whole network data size we can take num_sections * the data we hold in our section.

So it all works out well if we are happy with close-enough values, and I think it makes sense that we are.
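
Writing that arithmetic out (the 1,000-section figure below is an arbitrary number of my own, just to show the multiplication):

```rust
// Per-section estimate: nodes * per-node capacity * fullness.
fn section_data_gb(nodes: u64, node_capacity_gb: f64, fullness: f64) -> f64 {
    nodes as f64 * node_capacity_gb * fullness
}

// Whole-network estimate: number of sections * what we hold in our section.
fn network_data_gb(num_sections: u64, our_section_gb: f64) -> f64 {
    num_sections as f64 * our_section_gb
}

fn main() {
    // The example above: 5GB nodes, 50% full, 200 nodes => 200 * 2.5GB = 500GB.
    let section = section_data_gb(200, 5.0, 0.5);
    // Assuming 1,000 sections purely for illustration.
    let network = network_data_gb(1_000, section);
    println!("section ≈ {section} GB, network ≈ {network} GB");
}
```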

10 Likes

Thanks David for reiterating this, I was rather thinking of the non-fixed-size scenario.
I get the level of simplification this brings and can’t wait to test it in a churn scenario. :slightly_smiling_face:
I suppose, in this fixed-size scenario, after a node join and/or section split, the data needs to be redistributed quite quickly and evenly, as others pointed out. Otherwise the elder judgement could be off for a while…

6 Likes

I don’t have a good understanding of the setup for nodes and elders etc but just wondering…
perhaps elders could be the ones to hold more spare capacity as an option for flexibility - taking the strain for their subordinates without overloading themselves, but there as a parallel action available alongside the expected next step of splitting… reducing any risk of brittle runs of new data stressing what is normal. That would also help as a metric of network strain in places, if the elders were taking the measure of it. Perhaps that only as an option for never-event handling.

I think this would probably still be the case. Everything here is hypothetical so far, and I think uptime will still be pretty key for rewards (though as you say, kind of another topic).

I don’t think it matters? What’s the malice there? Refusing chunks or not being able to store them? For the network the outcome is the same…

(It’s really difficult to show mal-intent for almost anything. For some signed votes etc. we can… but detecting actual malice vs being late / just not doing what we wanted/expected is very hard… and in the end, it may not matter)

3 Likes

I’m pretty sure there’s an obvious reason (or reasons) why this won’t work, and so why it hasn’t been suggested, but on the off chance …

Instead of a fixed node size, which leads us to wonder how we would increase node size in the future without relying on magic numbers …

Why not:

  • allow new nodes to report their initial size.
  • elders take this info and compute the ratio, e.g. the new node is 1GB and the total section size is 50GB, so 1/50
  • each node has a ratio number and then data can be sent semi-randomly, but according to the ratio.

So all nodes still fill up equally time-wise, and fullness can still trigger splits, but there are no magic numbers and no restrictions on node size.
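
A minimal sketch of that ratio-based placement, assuming nodes simply report their capacity in GB; the hard-coded pick value stands in for a proper RNG or a hash of the chunk name:

```rust
// Pick a node index with probability proportional to its reported capacity,
// so a 1GB node in a 50GB section receives roughly 1/50 of new chunks.
fn pick_node(capacities_gb: &[u64], random: u64) -> usize {
    let total: u64 = capacities_gb.iter().sum();
    assert!(total > 0, "at least one node must report non-zero capacity");

    let mut target = random % total;
    for (i, &cap) in capacities_gb.iter().enumerate() {
        if target < cap {
            return i;
        }
        target -= cap;
    }
    capacities_gb.len() - 1 // unreachable when capacities are non-zero
}

fn main() {
    // Four nodes totalling 50GB; all tend to reach "full" at about the same time.
    let capacities = [1, 10, 14, 25];
    let chosen = pick_node(&capacities, 0x1234_5678);
    println!("chunk goes to node {chosen}");
}
```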

Guessing there is a communication problem with this idea or it’s a question of implementation (who does it).

Anyway, interesting discussion.

Hope all are having a nice break.

1 Like

I wonder if the answer will be binding together a number of negative feedback loops that allow the network to optimize itself. So, metrics that sense the current trajectory and anticipate stress some time in the future unless that trend stops.

If flags tempt a split, that can be prepped and new data resisted to mitigate it, preferring that some other area of the network with more flexibility adopts the new data for now. So, perhaps: how many copies exist at a network level; how much stress is being experienced at a local level - if the node is 64% full, when will it be 80% full? Pull out all the stops if it’s over 80%, otherwise target 64% as an ideal.

There’s also perhaps work in splitting; so, a need to resist too much extra churning work just to see a dull normal that satisfies some OCD. If there are enough copies of data, evenly spread, that will be work enough without tasking the network with unnecessary extra targets.

Is there an ABC guide to the basic way the network is handling data atm? How many copies are expected etc - or are we still in flux on a lot of it?

Imo the min storage requirement should be linked to node age rather than network age. Nodes that are very reliable can be trusted with more data over time without causing havoc due to churn. Just because the network is very old, it doesn’t mean you want to trust young nodes with a lot of data.

Adding in a relationship to network age or section prefix size can help naturally adjust to technology change over long periods of time, but should be a secondary consideration imo. For example, you could potentially make the coefficients A,B,C above simple functions of network age.

6 Likes

I like this idea. It is kinda linking age to disk space, but that may not be bad. Elders do need quite a good CPU etc. but as a start, I like this proposal. cc @joshuef @oetyng @bochaco

This likely has legs.

4 Likes

After some more brainstorming, good magic numbers for A, B, and C may become readily apparent given the power-law rule for disk-size manufacturing increases per decade.

Another simple relationship, which may be more appealing, could yield ridiculously large values rather quickly:

RequiredNodeSizeInGB = A^(B*NodalAge) + C
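
For a feel of the shape of that curve, here’s a tiny sketch; the values of A, B and C below are hypothetical numbers I’ve picked to show how quickly it grows, not anything proposed here:

```rust
// RequiredNodeSizeInGB = A^(B * NodalAge) + C
fn required_node_size_gb(node_age: u32, a: f64, b: f64, c: f64) -> f64 {
    a.powf(b * node_age as f64) + c
}

fn main() {
    // Hypothetical coefficients: A = 2.0, B = 0.5, C = 4.0.
    // Ages 0, 4, 8, 16 give 5, 8, 20 and 260 GB respectively.
    for age in [0u32, 4, 8, 16] {
        println!("age {age}: {:.1} GB", required_node_size_gb(age, 2.0, 0.5, 4.0));
    }
}
```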

1 Like

Every contribution is valuable?.. Over time, those with little contribution could be considered less valuable relative to others. Those able to offer a bigger contribution could be preferred - or grouped, so the older network ages. Why have a minimum, with all the arbitrary formulae it tempts? If nodes of a certain size gravitated to each other - within a broad order of magnitude - then the network could overlay itself, new on top of old: new associating more often with new; the older nodes, and perhaps the weaker elders, will age and those with better capacity rise.

This reads like a maximum bound rather than a minimum, and it makes sense to ramp up trust over time.

2 Likes