Acceptable Replication

Maidsafe “improving” on libp2p to republish on any churn is probably not a good path in the short term given the history here.

How about making two short term compromises instead?

  1. Memory: nodes will need somewhat more memory (e.g., 4 GB, 8 GB, or even 16 GB). Rather than spending more months (before you know it, years) trying to launch with a low memory requirement, it’s better to get something out with a higher memory requirement, then use the resources from that proof point to lower the requirement
  2. Churn: design rewards to minimize node churn as much as possible (e.g., reverse stake). As with the memory point, after getting a proof point with a low tolerance for churn, use the resources from that to improve on the churn requirement

By making these two sensible adjustments in the short term, resources can be freed to work on more pressing items like 3) rewards, 4) consensus adaptation, and 5) Sybil defense by key generation. 4 and 5 are critical for launch, and it isn’t clear what progress has been made on that front. Better to go with good-enough versions of 1 and 2 above and redirect resources toward getting a workable implementation of 3, 4, and 5 in place. Launch. Then improve from there.

4 Likes

To be clear though, it’s not necessarily a question of hiking our memory requirements; it’s getting the chosen interval working, and the knock-on massive messaging throughout the network that has to be dealt with, by all nodes too. That bogs things down a lot, hikes CPU, and has other knock-ons with regard to client functionality and to nodes joining and identifying their capabilities (it really maxes out a node’s connectivity, even with relatively “small” nodes of a few hundred MB).

I appreciate what you’re saying, to KISS etc., but we’ve an approach that’s looking very promising, one that doesn’t use their basic periodic replication (and libp2p is set up to allow such things to be built, which is nice).

This approach is more targeted and easier for us to tweak. But it’s using existing libp2p events (the same ones they use for the periodic republication); it just doesn’t involve reading/sending everything in the node. We can also readily see/measure replication now (which we can’t do easily in the libp2p implementation), which will be a boon in testing and comnets.

That’s the plan :+1:

15 Likes

I appreciate the detailed response. Great that you have a promising approach. But there’s also this issue (Swtich to `pull` model to further reduce communication traffic during replication · Issue #294 · maidsafe/safe_network · GitHub), which suggests going further and getting more complex, to the point of re-implementing libp2p::record entirely. Thus my suggestion above, i.e., move on from 1 as soon as the current version is merged (with higher memory/CPU requirements if necessary), then use the resources and focus to complete early implementations of 3, 4, and 5 (using 3 to address 2, further helping with 1).

1 Like

The neat thing we have is that libp2p-backed periodic replication is there, so we have that as a backstop to turn on whenever. Also, with a bit of staggering, memory/messaging explosions are much reduced. So we have that in the bag.

But what we do need before moving on to higher-level issues is to be sure of our data layer. With low churn, libp2p is okay (with the above tweaks). But what is our churn rate, and how does it compare? We may well end up with a hybrid approach here of targeted/periodic. But we’ll be testing that out first.

The thing with the libp2p implementation is that it may not be enough. If we have to republish data every 5s and suffer wild memory/messaging loads for the network to be “stable”… well, it might not be usable, and certainly not usable by the vast majority of folk on home computers. So we’re seeing what we can do to improve things here, verifying that approach, and moving all data into the RecordStore implementation and working off of that, which needs to happen in either approach.
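For reference, here is a minimal sketch of where that periodic behaviour lives in rust-libp2p’s Kademlia configuration, and how the interval can be relaxed or switched off so a more targeted scheme can be layered on top. The exact type and method names (`KademliaConfig`, `set_replication_interval`, etc.) depend on the libp2p version in use, and the values below are purely illustrative, not the project’s actual settings:

```rust
use std::time::Duration;
use libp2p::kad::KademliaConfig;

// Sketch only: values are illustrative, not safe_network's configuration.
fn kad_config() -> KademliaConfig {
    let mut cfg = KademliaConfig::default();
    // Interval at which every node re-sends its stored records to its
    // closest peers. A very short interval (e.g. a few seconds) is what
    // drives the memory/messaging spikes described above; `None` disables
    // the periodic behaviour entirely.
    cfg.set_replication_interval(Some(Duration::from_secs(60)));
    // Periodic re-publication by the original publisher; also optional.
    cfg.set_publication_interval(None);
    // Records should not expire in a permanent-storage network.
    cfg.set_record_ttl(None);
    cfg
}
```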

This is not a massive undertaking, and we have the option you suggest in the bag. But it’s worthwhile to see if we can improve things here, and then be doubly sure of data consistency as we move to payments etc. (which are not necessarily being blocked by the data-republication work).


edit: sorry, I missed this in the response there.

(As for the issue you link, I’m not sure we’ll need to go so deep. We’re not aiming for memory-based perfection here, but something usable. So far it’s looking like the current approach will yield something in that ballpark.)


Off that topic, but you mention

But I’m not sure what you mean here. Can you elaborate? :bowing_man:

10 Likes

Perhaps the vast majority of home users could simply act as a cache, e.g. as with BitTorrent, where anyone who pulls some data then shares some data for a period of time. Safe definitely needs some sort of caching mechanism for popular data anyway.
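As a rough illustration of that BitTorrent-style idea, a node could keep a small, bounded cache of recently fetched chunks and serve them for a while without taking on permanent storage. The sketch below is hypothetical: the types, names, and eviction policy are placeholders, not anything in safe_network.

```rust
use std::collections::{HashMap, VecDeque};

type ChunkAddr = [u8; 32]; // e.g. the chunk's content hash

// Hypothetical bounded cache: evicts the least recently used chunk first.
struct ChunkCache {
    max_entries: usize,
    order: VecDeque<ChunkAddr>, // least recently used at the front
    chunks: HashMap<ChunkAddr, Vec<u8>>,
}

impl ChunkCache {
    fn new(max_entries: usize) -> Self {
        Self { max_entries, order: VecDeque::new(), chunks: HashMap::new() }
    }

    /// Keep a chunk we just downloaded so we can serve it to others.
    fn insert(&mut self, addr: ChunkAddr, data: Vec<u8>) {
        if self.chunks.insert(addr, data).is_none() {
            self.order.push_back(addr);
        }
        while self.chunks.len() > self.max_entries {
            match self.order.pop_front() {
                Some(evicted) => { self.chunks.remove(&evicted); }
                None => break,
            }
        }
    }

    /// Serve a cached chunk and mark it as recently used.
    fn get(&mut self, addr: &ChunkAddr) -> Option<&[u8]> {
        if let Some(pos) = self.order.iter().position(|a| a == addr) {
            let recent = self.order.remove(pos).expect("position is valid");
            self.order.push_back(recent);
        }
        self.chunks.get(addr).map(|v| v.as_slice())
    }
}
```

How long entries linger, and whether serving from cache is rewarded at all, would be separate design questions.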

As long as we have a strong incentive for people to take on the larger task of storing permanent data, then we will be highly decentralized already - not the perfect ‘everyone is a storage node’ … but perfection in the end is the enemy of the good … and maybe someday that problem can/will be solved with better hardware networks.

1 Like

Of course. But the nuance is setting a criterion for “good enough” on the memory front so that more critical areas for network launch can be better addressed earlier. Particularly because the memory “issue” can be addressed by simply defaulting to higher-memory machines for now.

We should define what “home computer” means here. For the past several years, it seems to have meant 1 GB or 2 GB of RAM for Maidsafe. But the average home computer actually has about 16 GB (and laptops routinely have 32 or 64 GB). So it’d be sensible to raise the memory requirement to 4 GB or 8 GB for the time being if needed, so that more attention can be paid to more pressing areas that the network can’t operate without. The same consideration applies to CPU and bandwidth.

Put differently, I’d rather have visibility into unknown components than perfect a component for which there’s already a practical solution, i.e.:

  • Memory, etc. → go with the current good-enough version and, if necessary, require a higher-spec machine for now
  • Payment → there could be potential unknowns so get started asap so that any unforeseen issue can be addressed earlier
  • Sybil resistance → there could be potential unknowns so get started asap so that any unforeseen issue can be addressed earlier
  • Consensus → there could be potential unknowns so get started asap so that any unforeseen issue can be addressed earlier

Re: “close group consensus” and adapting Algorand or Avalanche consensus to that end.

6 Likes

Again, I don’t disagree. We’re not really blocked at the moment by the replication flow’s memory requirements. We have another candidate from @qi_ma on the replication front (related to the issue you linked), but the main question, I think, is how to measure good replication (why would we go with the proposed implementation over our current one?).

With what we have in place, memory is not a concern, even when increasing the distance within which we replicate on churn (at least thus far). So I’m pretty happy with that. We’re finalising getting DBCs into the RecordStore implementation (and with that some further consensus questions are being answered), and then Registers will follow. That should complete our replication story. And thus far with internal testing it’s looking good and healthy memory-wise, such that even a 10x increase is well within the acceptable bounds you lay out. (Though if we can avoid reaching such heights with low-hanging fruit, all the better.)


The next step, to be more sure of all the replication (and before we can verify “consensus” we need these data flows to be rock solid), is creating/improving benchmarking around replication. Acceptance criteria.

I’d be curious if you have any input on what you think would work there, @bogard?

I’ve asked ChatGPT, and from previous kad literature (where the “2k nodes == stability” numbers come from), churn of the network over a day was ~5%. Now, this was with folk altruistically running dedicated nodes… probably a higher barrier to entry than we want for general use. So I’d expect our churn rate to be higher… how high, I’m not wholly sure.

And given a churn rate, how fast should we expect replication to occur, vs how many messages… (vs max chunk size…) vs memory and CPU on nodes…
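To make those questions concrete, here is a back-of-envelope calculation turning a daily churn rate into a per-node replication load. Every number is either taken from this thread (the ~2k-node, ~5%/day figures) or invented purely for illustration (the 2 GB held per node); none of it is a measurement.

```rust
fn main() {
    // Assumptions for illustration only.
    let nodes: f64 = 2_000.0;      // network size ("2k nodes == stability")
    let daily_churn: f64 = 0.05;   // ~5% of nodes leave per day (kad literature)
    let node_data_gb: f64 = 2.0;   // hypothetical data held per node

    let departures_per_day = nodes * daily_churn;                    // 100
    let rereplicated_gb_per_day = departures_per_day * node_data_gb; // 200 GB network-wide
    // On average the outbound replication work is spread across all nodes:
    let per_node_gb_per_day = rereplicated_gb_per_day / nodes;       // ~0.1 GB

    println!("departures/day: {departures_per_day}");
    println!("re-replicated GB/day (network): {rereplicated_gb_per_day}");
    println!("re-replicated GB/day (per node): {per_node_gb_per_day}");
}
```

If home-user churn turns out to be several times that 5% figure, the per-node load scales linearly with it, which is one way to frame acceptance criteria for message counts, chunk size, memory and CPU.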

(cc @neo I feel like you’d have some good input on these sort of questions).

(cc @moderators I feel like perhaps this is getting off-topic, could we perhaps move some of this to an “acceptable data replication thread?” or similar?)

5 Likes

When churn occurs now, are other nodes given chunk addresses to copy from the remaining nodes?

If so, then either give them the list of nodes holding each chunk, or have them request the chunks like a client would. The issue is that if two nodes are replicating a particular chunk, it’s best they retrieve it from different nodes, to avoid a bottleneck at the node holding the chunk.
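A sketch of that load-spreading idea: each replicating node rotates through the known holders of a chunk, offset by its own id, so two nodes replicating the same chunk tend to pull it from different sources. All names and types here are hypothetical, not the actual implementation.

```rust
type ChunkAddr = u64; // placeholder identifiers
type PeerId = u64;

/// Pick a source for each missing chunk by rotating through its holders,
/// offset by our own id so different fetchers prefer different holders.
fn plan_fetches(
    our_id: PeerId,
    missing: &[(ChunkAddr, Vec<PeerId>)], // chunk -> nodes known to hold it
) -> Vec<(ChunkAddr, PeerId)> {
    missing
        .iter()
        .enumerate()
        .filter(|(_, (_, holders))| !holders.is_empty())
        .map(|(i, (addr, holders))| {
            let pick = (our_id as usize + i) % holders.len();
            (*addr, holders[pick])
        })
        .collect()
}

fn main() {
    let missing = vec![(1, vec![10, 11, 12]), (2, vec![10, 11])];
    // Two different fetchers prefer different holders for the same chunk:
    println!("{:?}", plan_fetches(100, &missing)); // [(1, 11), (2, 11)]
    println!("{:?}", plan_fetches(101, &missing)); // [(1, 12), (2, 10)]
}
```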

Also, when a node receives a chunk for storage, it checks that the chunk is valid against its hash, and in theory it can look after errors itself.

For memory, obviously aiming for minimum usage is best, but not at the expense of performance. It’s a trade-off, and we have to remember that if you are running multiple nodes on a PC, memory could end up being the limiting factor.

But @Bogard is right that new PCs and laptops have much more memory, and that has to be weighed up against the increased memory requirements of the other programs the user is running on the PC at times. For instance, if the user games a lot, they may not be able to run more than a couple of nodes before CPU and/or memory becomes an issue.

Of course we cannot cater for every situation, but being wise about memory and CPU usage is very important for protocols. And nowadays 200 MB is a small amount of memory: no longer 15% of available memory, but more like 3% or 1.5%.

9 Likes

Right now, when churn occurs, we detect that our close_peers have changed and send out a message to the new node if we’re within the replication range (so the closest 5 nodes, e.g.).
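To illustrate that flow (not the actual safe_network code; the names, the range constant, and the “closer than us” rule below are simplified assumptions), a targeted, event-driven reaction to a close-group change might look like this:

```rust
// Sketch only: hypothetical names, not the safe_network implementation.
const REPLICATION_RANGE: usize = 5; // e.g. the closest 5 nodes

type Key = [u8; 32];
type PeerId = [u8; 32];

fn xor_distance(a: &[u8; 32], b: &[u8; 32]) -> [u8; 32] {
    let mut d = [0u8; 32];
    for i in 0..32 {
        d[i] = a[i] ^ b[i];
    }
    d
}

/// Called when routing-table churn adds `new_peer` to our close group.
/// Returns the record keys we should offer to replicate to it.
fn keys_to_replicate(
    our_id: &PeerId,
    new_peer: &PeerId,
    close_peers: &[PeerId], // our current close group, including new_peer
    stored_keys: &[Key],
) -> Vec<Key> {
    // Only act if the new peer is within our replication range.
    let mut sorted = close_peers.to_vec();
    sorted.sort_by_key(|p| xor_distance(our_id, p));
    let in_range = sorted.iter().take(REPLICATION_RANGE).any(|p| p == new_peer);
    if !in_range {
        return Vec::new();
    }
    // Offer only the keys the new peer is at least as close to as we are,
    // rather than pushing the whole store as periodic replication does.
    stored_keys
        .iter()
        .filter(|k| xor_distance(new_peer, k) <= xor_distance(our_id, k))
        .copied()
        .collect()
}
```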

A pull model is another proposal, and one we’d benefit from having some benchmarks/criteria for assessing (and in general going forwards, I think).

Yup, one of the proposed benefits is tighter, more correct and controlled messaging.


With regard to memory targets, we’ll also need to consider target node size. With lower memory we can have more nodes, and so any one churn event is theoretically smaller. Anyone wanting to take advantage of spare resources could just start more nodes…

5 Likes

A small point, but if you’re using the term “average home computer” not to refer to the average home computer but to a computer bought new within the last three or so years, then this is perhaps a reasonable average. I don’t see good grounds for assuming that the majority of home computers actively in use were bought in the last ~3 years, though.

It might seem like that to people who live in the mid to upper echelons of wealthy countries, or who spend a lot of time around tech advertising, or who themselves have lots of new hardware. The statement might be true for example if we take it to mean “average home computer of people on this forum”.

Anyway, I’m not weighing in on your general point. It could still be a good temporary decision to be more demanding of memory in tests, if necessary. I just wanted to point out that 1GB or 2GB of RAM seems like a reasonable guess to me of how much spare RAM “most” computer users could spare, if we take “most” very literally and imagine all the computer users outside of the richer pockets of Western society.

7 Likes

I agree, but could we reasonably assume that early users of the network will have above-average machines?

As the network evolves and grows its user base, there is time to tackle these issues.

5 Likes

If you make that assumption, it becomes self-fulfilling.

One of the fundamentals is that Safe is for everyone. For testnets this is not as important but we don’t know when people will arrive or from where. So it is still potentially important.

9 Likes

Is it really, though? Consider the case where one or many operators manage 10,000 to 100,000 small nodes from 1 public IP and then lose connectivity, compared to those operators that manage 1 to 10 large nodes…

I agree that the barriers to entry for running a node should be fairly low, but I’m not convinced that running more nodes than the number of processor cores on the host IP is worthwhile.

Granted, you may have convinced me I’m wrong, since we may very likely have many-thousand-core processors in 20 years…

Or maybe I’m right and one needs to look more closely at the number of nodes per IP address as the only constraint…

4 Likes

Well, that situation is more or less the same as running one large node, churn-wise. But if smaller operators can run nodes, then everything will be more spread out across machines across the globe. We can’t stop one computer going offline in either circumstance, but we can work to enable more operators on older hardware, making such concentration of data per machine less normal.


I think CPU cores/performance etc. is a separate question. There may be some benefits to having larger nodes from a data/CPU-performance point of view… I’m not sure it outweighs the benefits of more accessible node operation, though?

4 Likes

Also, the scenario of that many cores is not as realistic as the move to physically smaller but more powerful PCs. The move is also towards more powerful GPUs, since the PC market is moving towards more graphical environments.

The per-core transistor count is approaching a limit, and for a while now the move has been to more cores to gain more compute power. Once again, there are physical limitations on the number of cores, in terms of both size/heat and bus management.

It is more likely that there will be a move to distributed computing within one PC and within the household.

This again raises the idea that many nodes on one connection can see a large number of nodes disappear at once.

On the surface this sounds bad, but when you consider that it’s not a singular thing, and that node operators worldwide will all be increasing their node counts, it’s not much different whether the average is 2 or 3 nodes per operator, 2,000 to 3,000, or even 20,000 to 30,000 nodes per operator.

That is, when an operator’s connection is lost (internet and/or power), there is a corresponding number of remaining nodes to store the data on, be it a few nodes per operator or thousands of nodes per operator. The general internet connection has to support the traffic too, and in the general case it’s all proportionate.

3 Likes

Yes, and lots of small nodes with a total of, say, 50 GB is the same data churn as a large node with 50 GB, BUT the small nodes spread the churn across many groups, and each of those groups only needs to relocate a tiny amount of data. Therefore the churn event is handled much more quickly and smoothly.

In addition, the larger the network the more stable it is and the more secure it is etc.

15 Likes

Yeah, that was the point I was trying to make. How one defines a ‘large’ network is important (node count, exabytes stored, unique IPs, etc.). ‘Small’ vs ‘large’ nodes is a moving target based on the latest tech. Minimizing node size is de facto maximizing compute usage to ensure smooth operations. This is a clever trick at network scale, but it also means maximizing energy consumption. Trade-offs. It seems like unique IP count (and/or aggregate bandwidth) are more important indicators, yet they are often not explicitly discussed…

For example, demanding a minimum bandwidth per node…

4 Likes

I was going to say this before, then a thought injected itself into that… I didn’t have time to walk it through at that point, but I wanted to put it here for logic errors/clarity to be found, so I circled back. I’m just starting it here; we’ll see how far I get:

Comparison of effects of churn

A. 1 machine running 1 node of 1 TB
B. 1 machine running 1000 nodes of 1 GB

In case A, as that machine goes offline, a new machine/node takes its place. 1 TB of data is streamed over from the n-1 others in the close group. The optimal case is that they partition the 1 TB range into n-1 parts, so that each node streams unique chunks to the new node, maximising parallelisation and minimising wasted bandwidth.

In case B, there are, optimally, 1000 distinct close groups in which a new node comes in to take the place of a lost one.
The pattern of replication is the same as above; it’s just 1 GB out per close node now, instead of 1 TB as in case A.

As a simplification, we can say that the new 1 TB machine joining got one 1 GB node in all the same close groups as the machine that left.
As an analogous simplification, we can say that the remaining nodes of these close groups are all run by the same n-1 machines.

In reality, that will not be the case, but the compound effect will perhaps be the same (or…??), as there are multiple machines leaving at any given time.

So, by that, any given machine is still replicating the same amount of data per churn, both in case A and case B. It’s just that in case B, it is done in multiple close groups at the same time.
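Putting numbers on that simplification (taking n-1 = 4 remaining close-group machines and 1 TB ≈ 1024 GB, both purely illustrative):

```rust
fn main() {
    let n_minus_1: f64 = 4.0; // remaining close-group members per group

    // Case A: one 1 TB node leaves; its single close group re-streams 1 TB.
    let case_a_per_machine_gb = 1024.0 / n_minus_1; // 256 GB

    // Case B: 1000 nodes of 1 GB leave with the same machine; each of the
    // 1000 affected close groups re-streams 1 GB, split over (by the
    // simplification) the same n-1 machines.
    let case_b_per_machine_gb = 1000.0 * 1.0 / n_minus_1; // 250 GB

    println!("case A: {case_a_per_machine_gb} GB per remaining machine");
    println!("case B: {case_b_per_machine_gb} GB per remaining machine");
}
```

Roughly the same per-machine load, just spread over many close groups at once, which is the equivalence argued above; the open questions are the ones that follow (machine counts and differing churn rates).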


Now, that is with the simplification that there are the same number of machines in both cases.
But in reality, case B would be a network with a higher number of machines, as there would also be many machines of lower capacity.

The question, though, is whether that doesn’t also just even out in the end.
Each machine would see a replication load proportional to the number of nodes it runs. So, do the 1 TB machines see the same replication load in both case A and case B?

I think there is something about the machine count that affects this, in both directions. The larger machines are probably more stable (simplification!), so they churn less; the smaller machines, by that, churn more (simplification!).
Could we then say that there is roughly a constant amount of data churning at any given time in a (sufficiently large) network, regardless of whether it consists of a few very large machines, a mix of very large and small, or many very small ones?

I don’t quite have the time right now to delve deeper and wrap my head around those questions, but instead of postponing the idea I’ll toss it out here now for dissection.
I’m not looking to argue for any particular outcome, just trying to formulate possible conditions.

(Oh, also just to note: there is probably an overhead, whether significant or not, for each node, in that there is some duplication of bootstrapping messages and the like. That may or may not be relevant in the larger picture.)

11 Likes

The common denominator here is bandwidth. For a fixed pipe size, both machines will fill at roughly the same rate. So even though case B spreads chunks across more sections, that doesn’t necessarily mean the small chunks will get stored any quicker. Option B also has far more aggregate communication overhead.

Instead of looking at RAM or processor cores or disk space, demanding a minimum bandwidth per node, or fixing/regulating the bandwidth per node, can level the playing field. A 1 MB chunk per second per node delivery rate seems like a nice lower bound, imo.
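As a quick illustration of what that floor implies for the two machine profiles from the comparison above (pure arithmetic on numbers already in the thread):

```rust
fn main() {
    let floor_mb_per_s: f64 = 1.0; // proposed minimum per node

    for (label, node_count) in [("case A: 1 node", 1.0_f64), ("case B: 1000 nodes", 1000.0)] {
        let aggregate_mb_per_s = node_count * floor_mb_per_s;
        let aggregate_mbit_per_s = aggregate_mb_per_s * 8.0;
        println!("{label}: {aggregate_mb_per_s} MB/s (~{aggregate_mbit_per_s} Mbit/s)");
    }
    // ~8 Mbit/s for the single large node vs ~8 Gbit/s for the 1000-node
    // machine: a per-node floor scales the pipe requirement with node count.
}
```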

9 Likes