Network efficiency and performance improvement algorithm proposal

dxtr · June 2, 2021, 8:25am

I would like to ask more knoledgeable members of this community for some explanations regarding questions I still find valid after reading Primer, lots of SAFE related high-level articles on Internet and various topics here.
I have read about XOR distance and addressing of nodes and data.
Also I am familiar with Sections and the assumption that there is no notion of time in the SAFE network.

Recently there was a post that current Sections are based on 7+7 Elders and Adult nodes and when a section grows larger it is split into two (maybe more?) together with data chunks - those now are divided between the new sections.
But again these new sections have a limit of 7+7.

So right now there is at most 14 nodes in SAFE network serving particular data chunk.
Let us assume that the network is mature and we have nodes spread all over the planet. Roughly we can assume that there will be two nodes on each continent, and then maybe 4 in Asia (I assume we will have to wait some time for Penguins to connect :).
Under this assumption and current purely mathematical node distribution scheme each data chunk will have to travel on average 1000km to reach a node that desires it (maybe it is 500km or maybe 2000km - this is not the point I am trying to make here).
It seems like a lot of traffic to me.

On the other hand we have data chunks. And those af first sight seem all equal, but they are not. Here is why. There will be users that store their family photos and access it only during family gathering like X-mas and birthdays. There are also data chunks that will be massively popular, like Youtube’s Gangnam Style. So in family photo case it is fine to have only 14 hosts serving each of those photo collection chunks. But when there is an avalanche of requests for the video, there is high risk of a random set of 14 nodes not being able to withstand even a short request peak making the network feel slow and offputting.

What I think might be worth considering is letting those sections adjust their size autonomically depending on some usage threshold.
So if nodes in a section get overwhelmed by the number of requests they get such a section could be gradually expanded (to a limit I suppose) so that it can easily serve all those requests.
Nodes in that section could send Need_Some_Help_Here message, or maybe DataManager could trigger expansion.
Once this request burst is over such a section can be shrunk gradually, up to a point in which current requests can be served with not to much sweat from section nodes.

And what is also worth noticing while shrinking section could remove those nodes that delivered the least amount of data chunks, and keep those which delivered most of those. Some sort of natural selection of nodes.
This would also mean that the new core section nodes are more performant, and possibly closer to requestors, making the network more efficient.

Can you comment on that?

happybeing · June 2, 2021, 8:40am

It’s an interesting idea to adjust section size dynamically. It’s not clear whether the reason you are thinking about this is an issue or not.

One of the ways the network tries to deal with high demand chunks is by caching data at intermediary nodes, but I don’t think this is implemented yet.

Also, be aware that increasing the size of a section will increase the number of messages between nodes. I’m not sure the current section size will be final, the tests will help understand if there’s a sweet spot.

I think the idea of self adjusting is interesting and might well have application, and it’s a nice point about natural selection.

dirvine · June 2, 2021, 8:42am

This is a minimum value and not a limit. I suspect that has you down the wrong path.

dxtr · June 2, 2021, 8:49am

It well may be so, I will try to dig into the source to have some better understanding of SAFE’s inner workings after having completed a Rust handbook.

Thank you for reply!

dirvine · June 2, 2021, 9:05am

No worries, just shout, lots of help around as it’s a big topic and moves quickly in last year.

neo · June 2, 2021, 9:36am

The limit is a lower limit. A section will grow to at least 28 before splitting, usually more to ensure two sections can be formed with the lower limit of 7+7 for both new sections.

This was set low for the test nets so that section numbers will increase quickly but still have a robust network.

My understanding is that at the moment when a section is filling up it will be accepting new nodes quicker than it has plenty of free space. Thus the fuller section will split quicker than emptier one.

dxtr · June 2, 2021, 10:28am

OK, so here is how I understand this:
For simplicity let me assume that a section is a barrel that can hold water. Water in this case would be datachunks. In this analogy nodes could be wood pieces that make the barrel.
If there is a barrel with water that reached some high-water mark, then such a barrel accepts new wood faster in order to divide itself into two, so that no water is spilled, and half of it goes into new barrel.

I do not understand how this mechanism solves the issue of high demand, i.e. when water level is fine, but there are dozens or more people trying to scoop some water simultaneously, but the barrel can serve only half a dozen at a time. And without that water immediately they will become expired, so they can not stand patiently in queue. (By this I mean keeping QoS high for time sensitive services.)

I posted this in Beginners, because I know this is most likely just me being the ignorant problem here

dirvine · June 2, 2021, 11:12am

Yes

It’s related to hosting data but not totally. I think you mean many clients Getting or Mutating data (mutate means adding data in Safe)? As sections split then there are more access points so less clients per access point. Lot’s of client activity means more data which means more splits and so on.

Not at all, answering questions ensures our knowledge is current. I try to when they are put kindly and respectfully as you have. So all good

dxtr · June 2, 2021, 11:39am

Ah, thank You @dirvine , I think now I get it.

I’ve been thinking that each section contains datachunks that are stored only in that particular section, and nowhere else, but now I can see this is not the case - there can be multiple sections each holding a particular datachunk, so when there is lots of Get & Mutate data requests those requests can be split between multiple sections, and even new sections can be forged in order to cope with current demand.

In other words when there is a section split then there are (or at least can be) some chunks being stored in both those new sections simultaneously.

neo · June 2, 2021, 11:51am

@dxtr
Although you did bring up the point of particular set of chunks being popular. Now these could be from any number of files that they are a part of, which doesn’t matter, what matters is the chunks are popular.

The network will have a caching scheme in place where popular chunks are cached in other nodes on the return path for the chunk to the various clients requesting the chunks. So very popular chunks will have a mitigating mechanism already. Not sure how the return path is done for the test nets, but previous it was O(log(n)) nodes in the return path for a client but maybe less now.

This cache will reduce the load on the section(s) and the set of nodes holding any of the chunks.

But this is not going to cover moderately popular chunks and statistics says that some sets of chunks will be much more popular than other sets. The set of chunks representing the chunks held by the 4 (or 8) nodes.

Wondering if there would ever be a need for sections to monitor load on any set of nodes and mitigate it by recruiting new nodes to add to a set of nodes loaded down. So instead of the minimum of 4 (or 8) nodes for those chunks there is now 5 (or 9) to reduce the load a little.

The test nets will be able to inform us if there will be an issue in this are

VaCrunch · June 2, 2021, 12:00pm

Some questions that occur to me regarding the caching of popular chunks:

So a farmer’s hope is that he is hosting a node that happens to be a cache? Does the cache mechanism cause any difficult decisions in the reward algorithm? Are they treated differently, as far as rewards go? Will there be a way for a farmer to know whether or not he is hosting a cache? Will the reward that goes to a cache node be the same as the reward for the original hosting node (where the requesting data first resided)?

I realize we might not have the answer to some of these questions right now. Just ruminating about the overall scheme.

dxtr · June 2, 2021, 12:06pm

Having one “free electron” node could be used as a way for replacement of poor performing nodes from a section.
Next node from XOR distance queue could be invited to verify whether it is a better fit than any one of existing nodes in current section set, based on # of chunks served.

neo · June 2, 2021, 12:37pm

Well as part of a Node’s responsibilities is to pass on chunks as they hop to the section where it is stored. And also when its retrieved and being passed through to the client. The node could be part of the section the client joined to or an in between one.

The cache simply means that the node sends the chunk from its cache rather than to be passing on the chunk if it was not cached. It did it once in order to cache it and would do the passing on every time its used if no cache. Eventually the chunk is held by nodes everywhere if its that popular.

So really it is no extra work by returning from cache or passing it on. Actually less work.

Now no node has any advantage since all nodes will be maintaining this cache.

In the end the cache is just a part of the work the node does because it is a node. No rewards needed.

The rewards from GETing data is to cover the whole operation of the node.

urrtag · June 2, 2021, 2:44pm

Ive ever thought that the cache is really stupid/simple, you just configure the amount of cache your node is going to have. There are no rewards for caching (or delivering cached data), but you reduce your bandwidth/traffic needs. Every cache hit reduces the needed traffic by half, as you now need not to first download the data from an other node and then send it to the next node in the chain, but send it right away.

Does this imply that the space that is used by the cache is a predetermined fraction of the total storage? And the caching is deterministic (= other nodes know which chunks are in your cache & also know if you’re going to add a passing by chunk to your store)?

VaCrunch · June 2, 2021, 2:58pm

So, if a large portion of Gets is cached, does this mean a large amount of Gets will go unrewarded?

Traktion · June 2, 2021, 4:44pm

Just on the caching, from what I gather there are decentralised networks which rely very heavily on the sort of caching discussed above.

While I do not have a deep understanding of IPFS, I gather that popular data would frequently come from the cache, rather than where the data is ‘pinned’. The host pinning the data acts more as a seed to fill caches on other nodes, rather than serving all requests.

I believe the likes of Freenet behave similarly, but it doesn’t have a concept of pinning or data ownership at all, with all data being essentially cached and stale/old data just drops off the network.

Safe Network with caching enabled should get similar benefits to performance and load balancing, but data will never be discarded. As requests are routed towards where the data is owned/persisted (based on XOR name), then the impact of caching should fan out well. If nodes ‘close’ to where the data is persisted cache the data, as I understand it, they would immediately take substantial load away from the nodes who owns the data. Caching of the data by nodes further away from the owning nodes will also improve the overall speed of the network.

neo · June 3, 2021, 12:36am

If not preset, it will reach max size once that many chunks have passed through to fill it, the algo to remove chunks to make space for new ones has not been chosen yet until testing has been done.

So yes at first the cache maybe empty, but this is the case for all new nodes and the cache fills from there. In essence no advantage for any node in this, the difference in time for the cache to first fill is really immaterial as to “advantages”.

Also the nodes that supply the chunks from cache will keep the chunk longer than nodes that do not.

So in a particular path back to the client. generally, the nodes will cache the chunk. Now if that client keeps requesting the same chunk then the node closest to that client will keep supplying it and the others will see the chunk fall out of the cache (eventually) since it is not used.

Closer to reality a popular chunk will be requested through many different paths and many nodes closest to various clients requesting it. So the nodes holding the chunk in cache and supplying it will be much greater in number.

Yes.

But for that to be happening means that the majority of clients suddenly want to be accessing the one set of chunks.

Maybe when the Mars landing is streamed live then those chunks could become the majority of chunks requested. But the nodes storing them could not supply the chunks anyhow so caching saves the network from being brought down.

For a network grown past a baby network I doubt that one set of chunks will ever become the majority of chunks requested. If this is seriously thought of happening then there would be cases of sites on the current Internet regularly brought down by the majority of internet traffic going to that server (farm). I do not know of any case where the majority of internet traffic has gone to the one server (farm)

mav · June 3, 2021, 1:19am

@dxtr great questions. It sounds to me like you’ve observed the network can scale according to stress in storage, but there’s no way to scale according to stress in bandwidth.

Your observation sounds correct to me. Currently the network only measures how many nodes are full and further scaling is based on that metric. I would be glad to see more measures of stress included in the scaling and management of node membership, such as bandwidth and computational speed. Hopefully in future developments we will see this happen.

I do not understand how this can be, considering how Secure Message Delivery (SMD) and Reliable Message Delivery (RMD) works. I can’t see how a response from cache can reduce the workload for other nodes. There are a few reasons for my doubt:

If only elders are involved in hops, how can adults know that there has been a request for a chunk they are caching? The hop message never touches adults on the route.

In RMD, hop messages are sent by multiple nodes to multiple nodes. How can this parallel chain of messaging be stopped by a node responding from their cache? There’s no way to ‘catch up’ to the relayed messages and cancel them. I don’t understand how caching can reduce the workload, since it would require some way to cancel the hops that may have already gone ahead of the node delivering from cache.

Nearest-neighbor routing has a different route for the client-to-chunk request compared with the route for chunk-to-client response. Similarly, a cache response halfway along the route will have a different route than the route from the chunk section. How can we prevent nodes from doing work when there’s no easy way to intercept the response route?

These are definitely not beginners questions, so sorry if I’m derailing the thread!

neo · June 3, 2021, 1:49am

@mav When caching is implemented then a mechanism for this to occur has to be implemented, doesn’t it

urrtag · June 3, 2021, 10:38am

My actual question was (or should have been): is caching enforced by other nodes? If so then there needs to be a guarantied cache reserve (say a fraction of the nodes total storage) & the behavior needs to be verifiable by other nodes, thus deterministic.

By not enforcing there’s only one incentive for the operator to enable caching: halved traffic for cache hits. This implies:

that the caching algo can be tuned by the operator, thus reaching better cache efficiency by allowing experimentation and the ability to adapt to changes than an enforced static algo
- running multiple nodes in the same environment could make caching more efficient by sharing it across all the nodes.
the op might disable caching if hes not willing to pay the higher cost in storage for a lower cost in bandwidth (causing more stress on other nodes)

Topic		Replies	Views
Proposal for network growth based on node size rather than node fullness Features	95	2825	January 4, 2022
Fast Ephemeral Routing RFCs	11	1508	July 19, 2020
Questions testnets may be able to answer Development	28	1569	February 10, 2021
Rewardable cache RFCs	10	915	May 11, 2020
Gaming farming rewards - cost for GETs Features safecoin	130	3843	July 1, 2019

Network efficiency and performance improvement algorithm proposal

Related topics