Proposal for network growth based on node size rather than node fullness

And so we destroy the simplicity of the protocol by creating a system that is far more complex and subject to more flaws, both in basic design and in operation.

No, going to a fixed chunk size makes the protocol simpler and easier. The smaller the chunk, the simpler it is for app devs to account for; larger chunks reduce network accounting. Where’s the sweet spot for chunk size? We’ll know soon enough.

We’ll see where the devs go from here, but I still see far more problems than solutions.

Re: node joining, the sybil defense reason to make joining contingent upon the network’s need for space makes sense, but it could create some issues. Namely, it could keep the network very small until such unknown time (maybe years?) that storage on the network becomes popular. There are pros for this, but one con is that it does not allow for a huge scale (and attendant bugs) to really be investigated well ahead of such scale being needed. And that’s a serious risk.

So, in addition to the current rules (allow a join when storage is <50% or when a member leaves), why not also allow consistent network growth on a steady schedule, under which x randomly selected nodes are added from a large pool of waiting infants?

The sybil risk wouldn’t be any worse than allowing join only when capacity is needed, and the network can grow consistently rather than in spurts.
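To make the scheduled-growth idea a bit more concrete, here is a minimal Rust sketch of picking x infants from a waiting pool at a fixed interval. Everything in it is an assumption for illustration: `scheduled_join_due`, `select_infants`, the `interval` parameter and the `pick_index` callback are hypothetical names, not part of the actual node code. The randomness is passed in as a callback because how the network would agree on an unpredictable pick is a separate question this sketch does not try to answer.

```rust
/// Hypothetical check: has enough time passed since the last scheduled admission?
/// `interval` is an assumed tuning knob, not a value taken from the real network.
fn scheduled_join_due(now: u64, last_admission: u64, interval: u64) -> bool {
    now.saturating_sub(last_admission) >= interval
}

/// Removes and returns up to `x` infants from the waiting pool.
///
/// `pick_index(n)` is expected to return an index in `0..n`. In a real network
/// it would have to come from a source that waiting nodes cannot predict or
/// influence; here it is simply supplied by the caller.
fn select_infants<T>(
    pool: &mut Vec<T>,
    x: usize,
    mut pick_index: impl FnMut(usize) -> usize,
) -> Vec<T> {
    let mut chosen = Vec::with_capacity(x.min(pool.len()));
    while chosen.len() < x && !pool.is_empty() {
        // Reduce modulo the pool size so a misbehaving picker cannot index out of bounds.
        let i = pick_index(pool.len()) % pool.len();
        // `swap_remove` is O(1); the order of the remaining pool does not matter here.
        chosen.push(pool.swap_remove(i));
    }
    chosen
}
```

A section would call `scheduled_join_due` on each tick and, when it returns true, admit whatever `select_infants` returns, alongside the existing space-driven joins.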

Yes, this and some other angles will need to be tried out in the early days.

Welcome back @oetyng :slight_smile:

Seems the racing issue you raised here is related to the issues raised in this OP/thread.

I guess you saw the other msg @Bogard, that I’m not back quite yet, but thanks anyway :slight_smile:
It took me some time to get to reading through the OP, and I wanted to do that before I answered.

A lot has changed since then.

The network previously grew like this:

And currently, it grows like this:

// Handle a node's report of its storage level.
SystemMsg::NodeCmd(NodeCmd::RecordStorageLevel { node_id, level, .. }) => {
    let changed = self.set_storage_level(&node_id, level).await;
    // React only when the level changed and the node has just reached the full mark.
    if changed && level.value() == MIN_LEVEL_WHEN_FULL {
        // ..then we accept a new node in place of the full node
        *self.joins_allowed.write().await = true;
    }
    Ok(vec![])
}

Which is a great improvement.

In plain text: The big difference is that now every full node means a new node can join, where before a whole bunch could go full without a new node being allowed in.
I think that old logic can be seen as a placeholder, waiting until someone had time to think a bit more about it :slight_smile: And it’s often like that, a gray scale from placeholder logic all the way over to rock solid stuff.
But in this area, more thinking is needed still. I know already of a few things I would like to clear up there.

Now, the potential issue I found there is related to the new logic: basically, it hasn’t been driven through fully to the rock solid end of the gray scale. It isn’t necessarily something that would cause a problem, but it can be enough of an issue that, when reading the code, it’s hard to tell if it would. (The probably obvious note: the chances that something works as you’d like when it’s not clear how it works are slim :slight_smile: )

So, we’ll have to see what that one needs. My guess is that the likely minimum is a decision on what the main controlling variable should be, and having it expressed a bit more clearly.

Seems like node count and data storage should be considered independently. For example, let’s say the section loses two nodes A and B and each node had a 1TB vault size. The network would then begin accepting new nodes C, D … until more than two nodes AND more than 2TB of storage was offered in sum.

The accounting becomes rather trivial if fixed vault sizes are used.
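A rough Rust sketch of that independent accounting, under stated assumptions: `ReplacementLedger` and its methods are hypothetical names invented for this example, and the rule is taken literally from the post above (keep accepting until both the node count and the offered storage exceed what was lost).

```rust
/// Hypothetical per-section bookkeeping of what was lost and what has joined since.
#[derive(Debug, Default, Clone, Copy)]
struct ReplacementLedger {
    lost_nodes: u32,
    lost_bytes: u64,
    joined_nodes: u32,
    joined_bytes: u64,
}

impl ReplacementLedger {
    /// A member left, taking `vault_bytes` of storage with it.
    fn record_loss(&mut self, vault_bytes: u64) {
        self.lost_nodes += 1;
        self.lost_bytes += vault_bytes;
    }

    /// A new node joined, offering `vault_bytes` of storage.
    fn record_join(&mut self, vault_bytes: u64) {
        self.joined_nodes += 1;
        self.joined_bytes += vault_bytes;
    }

    /// Joins stay open until BOTH node count and storage exceed what was lost.
    /// With nothing lost, this ledger gives no reason to open joins.
    fn joins_allowed(&self) -> bool {
        if self.lost_nodes == 0 && self.lost_bytes == 0 {
            return false;
        }
        let nodes_covered = self.joined_nodes > self.lost_nodes;
        let bytes_covered = self.joined_bytes > self.lost_bytes;
        !(nodes_covered && bytes_covered)
    }
}
```

With fixed vault sizes, `lost_bytes` and `joined_bytes` are just the node counts multiplied by the vault size, so the storage check collapses into the node-count check and one counter is enough.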

So much of this! This is so very important! We need a network that can adapt to lag without booting people off. Yes, if you’re in a low-bandwidth area, just add more nodes to compensate. In fact, if the network paid people in rural (low-bandwidth) areas more, they’d be more than happy to set up more nodes, and that would have the side benefit of creating more business for ISPs and more pressure to improve internet service. Also, there are places with unreliable infrastructure, where there might be spurts of speed followed by massive lag, and places with power outages. So yes, this kind of velocity mechanism is so very important.

Also, when there are small to medium outages, hundreds of nodes could go offline for hours or more.

Then what is the definition of full?

  • when the node cannot accept any more chunks no matter what (i.e. zero space in node storage) OR
  • when the node reaches a (somehow) determined percentage/size filled, leaving enough free space to help with outages and/or nodes going offline by accepting relocated chunks.

If absolutely full, then outages, especially large ones, can cause issues if there is a high %age of nodes approaching absolutely full. Remember: no new nodes till they reach full.

If the second, then there would be no real issues for small to medium outages, since there is space available for relocated chunks.
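The two definitions can be put side by side in a short Rust sketch. This is only an illustration: `FullPolicy`, `is_full` and `headroom` are hypothetical names, and the threshold percentage is a placeholder for whatever value the network would actually settle on.

```rust
/// Two candidate meanings of "full" (hypothetical names).
#[derive(Debug, Clone, Copy)]
enum FullPolicy {
    /// Full only when no space remains at all.
    Absolute,
    /// Full once usage reaches `percent` of capacity, keeping the rest
    /// as headroom for relocated chunks.
    Threshold { percent: u8 },
}

/// Returns true if a node with `used` out of `capacity` bytes counts as full.
fn is_full(used: u64, capacity: u64, policy: FullPolicy) -> bool {
    match policy {
        FullPolicy::Absolute => used >= capacity,
        FullPolicy::Threshold { percent } => {
            // Compare in u128 so the multiplication cannot overflow.
            (used as u128) * 100 >= (capacity as u128) * (percent as u128)
        }
    }
}

/// Space still available for relocated chunks when the node is reported full.
fn headroom(used: u64, capacity: u64) -> u64 {
    capacity.saturating_sub(used)
}
```

For example, a node with 80 of 100 units used counts as full under `Threshold { percent: 80 }` and still has 20 units of headroom, while under `Absolute` it is not yet full and would not trigger a join.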

I do like the idea of new nodes being accepted when nodes reach a certain %age of the absolutely full state. One major reason is that if the distribution of chunks is random, we can easily (with high probability) see nodes filling up at similar rates. Once the first node becomes full, a very high %age of the rest of the nodes will be close to full. That means that even if some devices are waiting to become nodes, there may be a shortage until enough other people try again to have their device join as a node. This may become an issue especially in the early network, not so much in a more mature network.

Just bear in mind that the chunks are backed up multiple times.

I would propose that the network accept as many nodes as possible, to make many backups of the data on the network, so that if a city has a power outage there is no chance that the chunks are lost, even for a few minutes.

I believe there is also caching, which is the temporary saving of chunks that are fetched with a “get” and pass through other nodes. So nodes hold their normally stored chunks plus the cached chunks, and in a city-wide power outage the chunks that were only in that city may survive and be replicated from the caches that other nodes hold. I would like this to be confirmed by a MaidSafe dev.

The balance is we don’t want to allow mass joins as bad actors will then just cause mass churn.

In flux but will exist.

Oh I see, didn’t catch that till now! ty

I saw it–though only after posting :innocent:

Makes sense.

Also makes sense. Thanks for detailing the thinking. Just wanted to bubble up some of the ideas in the OP and thread as this area gets further looking into.

Once things are up and running, how much churn do we anticipate?

If a node has a hard drive full of chunks, how often will they be moved off the device and different chunks written to it due to churn?

Asking as I was wondering what the odds are of burning out an SSD.

Very close to zero. To wear out an SSD, you need to rewrite the full drive several times a day, every day. My guess is you may be fine even using an SD card or USB flash drive as node storage.

The network is not an omnipotent, all-knowing AI. It sounds nice, but how do you determine who you want to pay more? Especially without creating an opportunity to game the system and gain an unfair advantage.
