+++ Beta Network is Now Live! +++

I am busy over the weekend and for a few days after. I haven't even registered on GitHub.

1 Like

Would you be upset if I pasted them in?

1 Like

Too late - already merged :frowning:

1 Like

The relevant range is far wider than the close group range, which means those irrelevant chunks will no longer be useful again, unless a disaster shuts down a large percentage of nodes.
And shrinking of the relevant range is also not a common scenario.

Hence, the concern is not that significant.

PS: I also replied to the PR comment.

1 Like

From what was explained to me by a couple of devs, a node has both active and inactive records. This is shown in the logs and in /metrics.

The active chunks are used for store cost calculations and are what the node is responsible for. Each record in the active set has 4 other nodes in its close group.

I understand the reasoning: the faulty node thought it was one of the 5 close group nodes for too many chunks that it really wasn't responsible for, and the PR will fix that.

But the unintended consequence of deleting all the records that the node is not responsible for is that there will be zero inactive nodes. The bit I was told is that these inactive records can be activated again if another node nearby leaves and the node becomes responsible for any of them, thus saving a lot of network traffic during churn when a node leaves. This PR, according to its description, wipes that feature out and those savings are now lost.

If you only removed the records nowhere near that node, not even within the neighbourhood, then that is a good thing. But if a record is only a couple of steps away from being in the group of chunks the node is responsible for, then it would be good to keep it in the inactive list (if there is room). Otherwise deleting them all means we lose a valuable feature.

1 Like

That's the irrelevant records that got cleaned up.

Not exactly. As the relevant range is much wider than the close group range, there is a high chance that an active record will have more copies scattered across nodes.

Not exactly. The fault was from too many nodes in the relevant range, not the close range. There is a huge difference between the two.

You mean inactive chunks?

With a healthy network, this will rarely happen.
It will happen only when a certain percentage of nodes drop out at the same time.

In a word, the relevant (i.e. active) records only affect the store_cost calculation.
Most of such records are many steps away from the close group records that the node is really responsible for.
Hence, a small churn won't have an effect, even with all irrelevant (i.e. inactive) records removed.

1 Like

The PR will delete all the inactive records, so only the 5 closest nodes will hold any given record.

yes, just a typo

Every time a new node enters, it will gain records that are close to it. When it, or another node very close to it, leaves, some records will have to be copied from another node. Before the PR, with a healthy network there was always a good chance that many of these records were just sitting in the inactive group of the node gaining the records.

But with the PR, all the records being churned will have to be copied to nodes, because those nodes removed all their inactive records. Basically there is no such thing as inactive, because every cycle (hourly?) they are deleted.

Nope.
Way more than 5 nodes will hold the irrelevant records.

Nope, the records that would need to be shifted are mostly relevant records.

To give an example: say a node has 5 close group nodes in bucket 240 (K_240) and 10 chunks in that same bucket, then another 5 nodes in bucket 241 (K_241) with 10 chunks in that bucket, and so on.
The relevant range will be, say, 3 buckets (much wider than the close range), which means buckets further than K_242 (i.e. from K_243 onwards) will be considered irrelevant.
When a pruning as in the PR happens, 30 chunks will remain.
When another node in K_240 - K_242 drops out, this node will simply become close_group to some of those 30 chunks (i.e. no copying needed, which saves traffic).
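A rough sketch of that count in code, with hypothetical names and layout (not the actual PR logic):

```rust
// Hypothetical illustration of the worked example above, not the real PR code.
// Chunks are grouped by the k-bucket they fall into from this node's
// perspective; anything beyond the relevant range boundary gets pruned.
fn main() {
    // bucket index -> number of chunks the node currently holds in that bucket
    let chunks_per_bucket: Vec<(u32, usize)> =
        vec![(240, 10), (241, 10), (242, 10), (243, 10), (244, 10)];

    // Relevant range covers 3 buckets (K_240..=K_242); K_243 onwards is irrelevant.
    let relevant_boundary = 242;

    let remaining: usize = chunks_per_bucket
        .iter()
        .filter(|(bucket, _)| *bucket <= relevant_boundary)
        .map(|(_, count)| *count)
        .sum();

    // Prints 30: only the chunks in K_240..=K_242 survive the prune.
    println!("chunks remaining after prune: {remaining}");
}
```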

Maybe you do not see my major concern here @qi_ma

Yes, I missed that this only happens after 60% full.

But in any case, will the node at that time remove all records that it is not responsible for (i.e. records for which it is not one of the closest 5 nodes)? Or are you saying that it will only remove the records not in the closest group of buckets?

I must be missing something

Please read my post above in detail.

The relevant range is much wider than the close_group range (5 nodes).
It normally extends buckets away.
Expanding the relevant range, i.e. including more buckets, rarely happens with a healthy network.

2 Likes

I did read it, but being uneducated about what a bucket is, I decided to clarify by asking that one question. I gather that the second option in my question is the more correct one.

Thank you for your time @qi_ma

1 Like

Ah, sorry. A bucket means those addresses bearing the same number of common leading bits.
Given the address is 32 bytes (i.e. the number of common leading bits ranges from 0 to 255), K_240 means they share 15 common leading bits (255 - 240 = 15).
A higher bucket number means the nodes/chunks within it are further away from the node (from its perspective).

Hope this helps.
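To make the arithmetic concrete, here's a small standalone sketch (illustrative only, not the codebase implementation) that derives the bucket index from the number of common leading bits of two 256-bit addresses:

```rust
// The bucket index is 255 minus the number of common leading bits between two
// 256-bit addresses, i.e. the bit position of the highest set bit of their
// xor distance.
fn bucket_index(a: &[u8; 32], b: &[u8; 32]) -> Option<u32> {
    let mut common_leading_bits = 0u32;
    for (x, y) in a.iter().zip(b.iter()) {
        let diff = x ^ y;
        if diff == 0 {
            common_leading_bits += 8;
        } else {
            // common leading bits == leading zero bits of the xor distance
            common_leading_bits += diff.leading_zeros();
            return Some(255 - common_leading_bits);
        }
    }
    None // identical addresses: distance is zero, so no bucket
}

fn main() {
    let mut a = [0u8; 32];
    let b = [0u8; 32];
    // Make the addresses first differ at bit 15 (counting from the most
    // significant bit), i.e. 15 common leading bits -> bucket K_240.
    a[1] = 0b0000_0001;
    assert_eq!(bucket_index(&a, &b), Some(240));
    println!("bucket = {:?}", bucket_index(&a, &b));
}
```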

5 Likes

This is fascinating stuff.
Anybody got time to do a diagram/animation?
Then I might be able to convince myself I’m starting to understand it.

1 Like

Not sure quite what you mean, but this is a helpful explanation of Kademlia. Watch this video! :) - #1414 by JPL

2 Likes

Basically, when the node is 60% or more full it will remove inactive records that are too far away from its xor address (there's a rough sketch of this check at the end of this post).

It determines how far is too far by using the buckets. For this small beta network the buckets tell us that distance is quite big. In a big network the buckets will represent a smaller distance, but the same number of nodes nearby.

Remember back when sections were named by the binary prefix they would have, and when splitting, an extra bit was added to the name so the two splits were the original name plus the extra bit 0 or 1. Buckets are like that.

Another way to look at buckets is that the nodes in a bucket represent a portion of the network, and these nodes have xor addresses starting with the same bits.

Thus the ones in your bucket are your close neighbourhood. The buckets one or two removed are your wider neighbourhood.

NOW to the guts of it. Only the records not in your neighbourhoods will be totally removed, since there is no way those records can be of any benefit at all.

Although one point I want to re-clarify @qi_ma is that the store cost algorithm only uses the active records, which I understand to be the records the node is responsible for, which I understand to be the records for which the node is one of the closest 5 nodes. (If not, then the store cost should only use the count of records that the node is one of the closest 5 to.)
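A minimal sketch of that 60%-full check, with hypothetical names and values (not the actual node implementation):

```rust
// Records outside the node's "neighbourhood" (its relevant bucket range) are
// only removed once the store passes the fullness threshold.
struct Record {
    bucket: u32, // k-bucket index of the record relative to this node
}

fn prune_if_needed(records: &mut Vec<Record>, capacity: usize, relevant_boundary: u32) {
    let fullness = records.len() as f64 / capacity as f64;
    if fullness < 0.6 {
        return; // below 60% full: keep everything, including inactive records
    }
    // Above 60% full: drop only records beyond the relevant range, i.e. those
    // nowhere near this node's neighbourhood.
    records.retain(|r| r.bucket <= relevant_boundary);
}

fn main() {
    let mut records: Vec<Record> = (238..=250).map(|bucket| Record { bucket }).collect();
    prune_if_needed(&mut records, 20, 242); // 13 records in a store of 20 -> 65% full
    println!("{} records kept", records.len()); // prints 5 (buckets 238..=242 remain)
}
```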

8 Likes

Well, the entire buckets_list (which can be considered a routing table) is from one node's perspective, i.e. different nodes have different buckets.
No node can be in the same bucket as the buckets' owner (you).
We can just say: those buckets close to the owner are its close neighbourhood.

Well, active records are those in the node's close neighbourhood (i.e. the closest few buckets) that it is responsible for, which is normally much more than the closest 5 nodes.

If the store_cost range were reduced down to the closest 5 nodes, the quoted price would jump high and low much more steeply, as churning nodes would easily affect the range (and there would be some extra computation burden as well).
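For intuition only, a toy example (the formula here is made up, not the real quoting algorithm) of why basing the count on the wider relevant range smooths the quote compared with counting only what the closest 5 nodes hold:

```rust
// Purely illustrative: the quote grows with how full the relevant-record store
// is. With the wider relevant range, one node joining or leaving shifts the
// count by only a small fraction; a 5-node-only range would hold far fewer
// records, so the same churn would swing the input (and price) much harder.
fn hypothetical_store_cost(relevant_records: usize, max_records: usize) -> u64 {
    let fill = relevant_records as f64 / max_records as f64;
    // Arbitrary convex curve: cheap while mostly empty, expensive as it fills.
    (10.0 * (1.0 + fill * fill * 100.0)) as u64
}

fn main() {
    // Wider relevant range: losing one neighbour changes the count only slightly.
    println!("{}", hypothetical_store_cost(300, 2048)); // 31
    println!("{}", hypothetical_store_cost(310, 2048)); // 32
}
```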

7 Likes


That explains a lot, and I need to update my understanding too, as I was going on much earlier discussions that left me thinking active records were just the records where the node was one of the 5 nodes "holding the chunk".

This of course raises a lot more questions, like: is the 5 closest really a thing when many more nodes are considered responsible for a record? Active and inactive have changed meaning in my understanding. Or can you now delete an active record, if it is considered outside of the 5 closest, when you need to store a record for which the node is in the closest 5?

Anyhow, those questions above are for another day.

@qi_ma One burning question I have (I've spent days looking for the answer) that I haven't found in the code is how to get the xor address from the peerID, since that's what is used when calculating the distance to your node. I can use network::address to get the pair "peerID - xor address", but I cannot find the algorithm to get the xor address from the peerID in the base58 form of 12… (or the binary 256 bits of the 38-byte binary decode of it). Can you enlighten me or at least point me to the place in the code?

6 Likes

libp2p::PeerId is provided by libp2p, and it has a to_bytes() function.
Meanwhile, there is libp2p::KBucketKey, which can be constructed from bytes and provides a distance function to calculate the xor distance to another KBucketKey.
(You can check the enum NetworkAddress in our code base to see how we use this wrapping enum so that PeerId and the various address types can all calculate distances to each other on the basis of KBucketKey.)

If you dig further into the libp2p code base, you shall see that a KBucketKey is a hash of the input bytes, which means it doesn't care about the format of the input type.
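A short sketch of using those pieces (assuming the libp2p crate with its kad feature enabled; exact re-export paths and feature flags may differ between versions):

```rust
// Sketch only: build KBucketKeys from PeerIds and take their xor distance.
// Assumes libp2p with the `kad` feature; PeerId::random() may additionally
// require the `rand` feature.
use libp2p::kad::{KBucketDistance, KBucketKey};
use libp2p::PeerId;

fn main() {
    let a = PeerId::random();
    let b = PeerId::random();

    // From<PeerId> hashes the peer id bytes into the 256-bit key space.
    let key_a: KBucketKey<PeerId> = KBucketKey::from(a);
    let key_b: KBucketKey<PeerId> = KBucketKey::from(b);

    // Xor distance between the two keys; ilog2 of the distance is the index
    // of the bucket the other key would fall into (None only if identical).
    let distance: KBucketDistance = key_a.distance(&key_b);
    println!("bucket index: {:?}", distance.ilog2());
}
```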

6 Likes

So are you saying that KBucketKey is the xor address? From what I could see from the code and logs, it is the xor address.

Is "to_bytes()" just making a vector of bytes from the 256-bit binary number?

Part of my problem is understanding what the "terms" actually are. Like, is KBucketKey the xor address of the peerID? Or is the xor address derived from the KBucketKey?

So for a peerID, the KBucketKey is a hash of the 256-bit peerID number? And as above, is this hash then the xor address of the peer? What would be the hash function used?

Is there some documentation somewhere explaining these terms, or is it all word of mouth?

EDIT:
OK, after a little more investigation and trial and error, I seem to have the method used to get the KBucketKey. My problem before was that I saw the base58 decode of the peerID as 38 bytes that needed to have the first 6 standard bytes removed to give a 256-bit number, but in fact the KBucketKey value results from hashing the full 38 bytes with the sha256 hash function.

Thus for this network address

NetworkAddress::PeerId(12D3KooWDWHSuDSepv68DHtDrccKK5CZCjtswkBwAKb34W4cwoXC - 5a6d379de1464116ff56f74911c2446dfc3ada3df374f8fe32e95ef2e9c417ee)

the 2nd number (the KBucketKey, I believe) can be found by

base58-decoding the base58 form of the PeerID (giving 38 bytes) and passing that through a sha256 hash, which gives the 256-bit KBucketKey

So my question now is: is that 2nd 256-bit number from the NetworkAddress::PeerId the xor address used in the network layout?
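Here's a sketch of that derivation (assuming the bs58 and sha2 crates), using the PeerId quoted above; per the EDIT, the resulting digest should be the second value shown in NetworkAddress::PeerId, i.e. the KBucketKey used for xor distances:

```rust
// Base58-decode the textual PeerId to its full multihash bytes (38 bytes for a
// 12D3KooW... id), then sha256 those bytes to get the 256-bit KBucketKey.
use sha2::{Digest, Sha256};

fn main() {
    let peer_id_str = "12D3KooWDWHSuDSepv68DHtDrccKK5CZCjtswkBwAKb34W4cwoXC";

    // Full multihash bytes, including the 6-byte prefix; nothing is stripped.
    let multihash_bytes = bs58::decode(peer_id_str)
        .into_vec()
        .expect("valid base58 peer id");
    println!("decoded length: {} bytes", multihash_bytes.len()); // 38

    // KBucketKey preimage -> 256-bit key via sha256.
    let kbucket_key = Sha256::digest(&multihash_bytes);
    let hex: String = kbucket_key.iter().map(|b| format!("{b:02x}")).collect();
    println!("sha256(peer id bytes) = {hex}");
}
```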

1 Like