Latest Release March 20, 2025

Idk @Toivo, not sure anyone had/has all the answers yet.

I think there was recently talk of nodes turning on and off pretty frequently/daily.

So my guess is this was not foreseen.

6 Likes

Maybe it is helping by showing up issues that need to be sorted in order to scale, issues that would have gone undetected at a smaller size?

Or, maybe not :slight_smile:

5 Likes

Maybe not, but this sounded to me like it was pretty evident to David:

And I get that it may have gotten lost and gone unconsidered given his absence. But then my criticism is this: there should be more people with a deep enough understanding of the core of the network: Kademlia, LibP2P, etc.

Now that so much seems to depend on one person, that is a real weakness for the project.

1 Like

I think at around 20 million nodes, or even fewer, David said something about it possibly being the most nodes ever in a distributed network of any kind?

If I remember that correctly, at 50 million we should expect unknowns.

Then if we consider that our networks imploded at 3k not so long ago, this is a fantastic achievement; we just need to zoom out a little.

11 Likes

True. But the way I understand what David said is that even under ideal conditions the older nodes would have an advantage.

3 Likes

Churning is supposed to update the routing tables and bootstrap file so that closer nodes kick out older nodes that are further away. So no, it’s not quite working the right way. Also, shunning needs to kick out addresses that no longer have a node behind the IP:Port/PeerId.
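A rough sketch of the behaviour being described, using toy types of my own (not Autonomi’s actual code; note that textbook Kademlia traditionally prefers long-lived entries, so this follows the post’s description rather than the textbook):

```rust
#[derive(Clone, Copy, Debug)]
struct Peer { id: u64, responsive: bool }

const BUCKET_SIZE: usize = 4;

fn dist(a: u64, b: u64) -> u64 { a ^ b }

// The "shunning" step: drop entries with no live node behind them.
fn shun_dead(bucket: &mut Vec<Peer>) {
    bucket.retain(|p| p.responsive);
}

// On discovering `candidate`, insert it if the bucket has room, or if it is
// closer to us than the furthest current entry (the "churn" step).
fn on_peer_discovered(our_id: u64, bucket: &mut Vec<Peer>, candidate: Peer) {
    shun_dead(bucket);
    if bucket.len() < BUCKET_SIZE {
        bucket.push(candidate);
        return;
    }
    // Bucket is full: a closer candidate displaces the furthest entry.
    bucket.sort_by_key(|p| dist(our_id, p.id));
    if dist(our_id, candidate.id) < dist(our_id, bucket.last().unwrap().id) {
        *bucket.last_mut().unwrap() = candidate;
    }
}

fn main() {
    let our_id = 0;
    let mut bucket = vec![
        Peer { id: 90, responsive: true },
        Peer { id: 80, responsive: false }, // nothing live at its IP:Port/PeerId
        Peer { id: 70, responsive: true },
        Peer { id: 60, responsive: true },
    ];
    on_peer_discovered(our_id, &mut bucket, Peer { id: 10, responsive: true });
    println!("{bucket:?}"); // the dead 80 is shunned, the closer 10 admitted
}
```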

5 Likes

Until we have a network that delivers useful functionality, IMO its size is not a measure of achievement.

For a year now we’ve been expecting that functionality would improve and apps would follow. But here we are with an enormous but still useless network.

We will need size and functionality.

7 Likes

If we consider that it is still holding data that was previously uploaded, that is impressive. When the network was smaller, it was pretty responsive too.

For me, it feels like we’ve had a taste of different elements, but not all at the same time. As it is pushing the boundaries of what has been done before (the impossible network), that is surely to be expected.

Given large uploads are a composition of smaller uploads, I suspect that is solvable too, one way or another.

The emissions setup feels like a misstep that we’re trying to shake off. Hindsight is 20/20.

We’ve got this far though, and I expect the team will solve the problems occurring at scale. I’m more than happy to cut them some slack on how much my new or old nodes may be earning, personally.

9 Likes

Same here… partially because I’m currently raking it in :laughing:

Seriously though, I hope the emissions get properly distributed, and that the issues with nodes not seeing each other and not detecting that nodes have left the network get sorted soon, so that the network’s performance improves.

4 Likes

With respect @Toivo, I think that is unfair on the rest of the team, some of whom have been involved for many years.
Not everyone is a polymath or poly-genius like David, but I think most of the long-termers would by now have a fair grasp of what’s what.

BUT - we are sailing in uncharted waters here. Even with the Great Helmsman back doing much of the navigation, that does not mean it’s straightforward or that progress will be linear.
Has any team ever needed to dig so deep into Kademlia?

6 Likes

I’m not saying anyone should be what they are not. I just feel that ever since the networking component was left out (you remember Crust?), that aspect has not gotten the attention it perhaps needs. It seems that LibP2P, Kademlia, and what have you take a lot of knowledge to get working properly.

In my opinion a 50M network size is on the low side of what is to be expected. The unexpected thing was that we got to it so quickly. That’s why I think these things should have been worked out at a theoretical level in advance.

Another reason why I complain is the bus factor.

The bus factor (aka lottery factor, truck factor or circus factor) is a measurement of the risk resulting from information and capabilities not being shared among team members, derived from the phrase “in case they get hit by a bus”.

David was nearly “hit by a bus”, and while he will always be irreplaceable, there should be more folks in the team with overlapping capabilities. As far as I can see, every initiative to look into Kademlia etc. has come from him alone. Though I may be wrong here.

And in general it seems to me that many of the solutions rely very much on the results of empirical tests, and not so much on theoretical pondering. While empirical reality of course dictates what works and what doesn’t, the theoretical side could give ideas that would take ages to reach through tests alone.

Anyway, after all this, I may have read too much into what David said. Maybe node age is not going to be much of a factor once the network starts to hold data:

But I would be interested to know if there is some theoretical insight into how this plays out as the number of nodes approaches infinity. Surely it must take some time before a new node is seen on an equal basis with old ones? How does this time behave in a network of 10 billion nodes? Can we then say that node age does not matter, or should we say it does?

3 Likes

I think there is confusion here. Node age is not what we used to have. Now it purely refers to the age of a node, in that (a rough sketch follows the list):

  • How long it’s been on the network
  • How long it has kept its address (not restarted, but upgraded in place etc.)
  • How long it has behaved.
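A tiny sketch of my reading of those three dimensions, with hypothetical field names (these are not Autonomi’s actual types):

```rust
use std::time::Instant;

// Hypothetical names, just pinning down the three properties listed above.
struct NodeAge {
    joined_at: Instant,          // how long it's been on the network
    same_address_since: Instant, // how long it has kept its address
    behaving_since: Instant,     // how long it has behaved
}

fn main() {
    let now = Instant::now();
    let age = NodeAge { joined_at: now, same_address_since: now, behaving_since: now };
    println!(
        "on network {:?}, same address {:?}, behaving {:?}",
        age.joined_at.elapsed(),
        age.same_address_since.elapsed(),
        age.behaving_since.elapsed()
    );
}
```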

In these pure p2p networks, which is what we are trying to build (no halfway house, but pure p2p), there is a lot of stuff we cannot see, even experts in the field. I did a complexity theory presentation this week on ants (of course) and their relation to the network and to natural systems (no Rolexes and no IF statements). It’s a deep, deep area where you essentially have:

  • Very few functions
  • All functions are event driven (not periodic but stochastic)
  • No conditional logic to enforce more efficient rules

When you dive into that, you also have a neural network. Think of the Tesla self-drive thing: they removed the 300,000 lines of C++ code (conditional logic) and replaced it with a 100% neural network (stochastic, small-function based). Also think of a neural net as a network of perceptrons where the inputs plus a bias define an output, so kinda like an analogue-type system in a digital world.
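To make “inputs with bias define an output” concrete, here is a minimal perceptron, hand-weighted as an OR gate (purely illustrative, nothing Autonomi-specific):

```rust
// Weighted inputs plus a bias feed a threshold; that alone is the output.
fn perceptron(inputs: &[f64], weights: &[f64], bias: f64) -> bool {
    let sum: f64 = inputs.iter().zip(weights).map(|(x, w)| x * w).sum::<f64>() + bias;
    sum > 0.0
}

fn main() {
    // Hand-picked weights implementing OR: fires if either input is on.
    let weights = [1.0, 1.0];
    let bias = -0.5;
    for inputs in [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]] {
        println!("{:?} -> {}", inputs, perceptron(&inputs, &weights, bias));
    }
}
```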

When you take all of this together, it means we get more natural. So economy aside, as that right now uses a blockchain, but I don’t care, I honestly don’t. The network infrastructure and data types are what we need to get correct first, before anything, and these have been simplified way more in the last few months and will be again significantly. I think we are close with data types to where we need to be, but we have some clean-up and removal of conditional logic and Rolex-watch stuff in networking.

Libp2p has done a good job with a kad implementation, but it pre-splits the buckets, and that pulls you towards a more deterministic and calculated approach to handling the routing tables. This is a dangerous place to be. Again, the thing nature tells us is not to optimise but to rely on error; that error is really, really important to have (like the ant that walks into the night, or gets drowned in a raindrop etc.).
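A toy illustration of what pre-split buckets means (simplified to 64-bit IDs; real keys are 256-bit): the bucket for a peer is fixed up front by the highest differing bit of the XOR distance, so the table layout is fully deterministic, with no on-demand splitting:

```rust
// Bucket i holds peers whose XOR distance from us has its highest set bit at i.
fn bucket_index(our_id: u64, peer_id: u64) -> usize {
    let d = our_id ^ peer_id;
    debug_assert!(d != 0, "a peer never shares our exact ID");
    63 - d.leading_zeros() as usize
}

fn main() {
    let our_id: u64 = 0b1010 << 60;
    for peer in [our_id ^ 1, our_id ^ (1 << 20), our_id ^ (1 << 62)] {
        println!("peer {peer:016x} -> bucket {}", bucket_index(our_id, peer));
    }
}
```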

So deeper it goes :wink:

Then there’s looking at the network and saying:

  • reboot every so often for more money
  • don’t reboot for more money

All of these statements I do ignore, as it’s like saying “stop the ant from drowning, give it flippers and a snorkel” etc. It’s a bit mental.

So there are some statements we can make, and these can be conflicting (a toy simulation of the second follows the list):

  • A traditional KAD network will favour older nodes in the routing table
  • Older nodes will be seen in more queries statistically
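A quick toy simulation of that second point (uniform random discovery with a tiny inline PRNG; obviously not how real routing tables fill, but the statistics carry over): nodes join one at a time, each newcomer records five random existing nodes, and earlier joiners simply sit through more draws:

```rust
fn main() {
    const JOINS: usize = 10_000;
    const LINKS_PER_JOIN: usize = 5;
    let mut mentions = vec![0u32; JOINS];
    let mut rng: u64 = 0xDEAD_BEEF; // tiny xorshift, just to avoid a crate
    let mut next = |bound: usize| -> usize {
        rng ^= rng << 13;
        rng ^= rng >> 7;
        rng ^= rng << 17;
        (rng % bound as u64) as usize
    };
    for t in 1..JOINS {
        // Node `t` joins and records some nodes that were already present.
        for _ in 0..LINKS_PER_JOIN.min(t) {
            mentions[next(t)] += 1;
        }
    }
    let oldest: u32 = mentions[..JOINS / 10].iter().sum();
    let newest: u32 = mentions[JOINS - JOINS / 10..].iter().sum();
    println!("oldest 10% of nodes hold {oldest} table mentions");
    println!("newest 10% of nodes hold {newest} table mentions");
}
```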

However, old nodes can poison a routing table (we currently have this issue, we believe, but again: few functions, less conditional logic, it’s not a silly shove-an-“if”-statement-in type of solution). So we need to consider dead node notification and resolution, but in a way that is not detrimental to other parts.

Then it goes deeper. So eventually, and hopefully soon, we have a KAD implementation that is super, super robust, incredibly efficient, and handles huge networks. (Interestingly, if you look at a 6 million, 30 million and 60 million node network, the hop count is remarkably similar; scale is inbuilt here for much, much larger than millions of nodes. We lose sight of that too often.)
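A quick back-of-envelope check on that hop-count remark: in the classic one-bit-per-hop analysis the bound grows with log2(n), which barely moves between those network sizes, and full k-buckets resolve several bits per hop, so real lookups are shorter still:

```rust
fn main() {
    for n in [6e6, 30e6, 60e6] {
        // log2(n) is the classic worst-case hop bound for an n-node Kademlia.
        println!("{:>12} nodes -> log2(n) = {:.1} hop bound", n, n.log2());
    }
    // Prints ~22.5, ~24.8 and ~25.8: ten times the nodes, about 3 more hops.
}
```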

So knee-jerk reactions to “fix” X will cause issues in “Y”, and so on; that can be very, very dangerous. I posed this question at the presentation:

Look at an ant colony, and I will give you a single IF statement: you decide what rule you will add to improve it!! I can almost guarantee we would kill the colony with just that one single rule or regulation we impose via that single if statement. Nature took over 4 billion years to work out the basic functions, and they have worked for 150 million years; we are unlikely to improve on them in a 30-minute meeting with the best folks on the planet working out where to put the IF statement.

So yes, older nodes will be more well known, obviously; like anything we can think of, it’s not new knowledge. But it’s a subtle fact that can be misused, not a weapon to wield from a narrow viewpoint, like “can I earn more if I do XX”.

The network should become so simple it’s eye-watering, and the ability to work across all aspects of human behaviour should emerge; I can see that happening. But gaming it should be horrifically difficult, as the secondary effects of tweaking or adding conditional logic should be catastrophic for the attacker and his or her nodes.

In essence, the bottom line is that the network will work better for everyone who just runs nodes, and the longer you run them the more they will earn, in terms of both cumulative earnings and current rate of earnings, due to the rate of being on a route.

I would say the in-house team are loving all this; we are getting to focus on these deeper issues now and make deliberate, algorithmic alterations via algorithmic team focus groups, and this will give us a much better network. So it’s no longer a single dev with some conditional-logic change who gets some code to pass some test and pushes to production; we are now at the point where we have the proof point of a huge network that forces focused, deeper dives into the base algorithms.

We are understanding the complexity of the ant colony because we can see one here, and now we will make improvements to ensure a robust, effective network. The team are motivated to make such improvements, but the thinking is very, very deep and not instant.

17 Likes

Is this the fork and fix that you guys alluded to?

4 Likes

Hey thanks for taking time to write all that. I think and hope it serves the curiosity of others too, not just me.

I don’t have a beef with any single way the network is designed. It’s just that the information directed at the general public should reflect what is actually going on. So even though “node age” is not a parameter in the network the way it used to be, the age of a node is something to consider when running nodes.

Also, I think most people here have been thinking that data would be spread more or less evenly across all the nodes, but now I think that the older nodes would be fuller. Am I getting that right or wrong?

6 Likes

Don’t forget how the team is constantly striving to fix this or that and meet deadlines.

They used to have a day each week to do their own thing. IMO that was valuable, not an impediment. Here’s why.

I don’t have any external deadlines but still find myself pushing to get to the next thing. I have to force myself to have at least one day each week where I don’t do code or anything associated with it.

Later in that day I find my mind sneaks a few minutes thinking about something to do with my project, but which isn’t necessarily what I am ‘supposed’ to be doing for the project.

Time and again that’s when I realise something valuable. Frequently a new way of doing something, or an opportunity that changes things radically for the better etc. Most recently it was reflecting on something David had said which annoyed me.

Commenting on Josh’s app, David suggested that problems reported with the network were due to badly written apps, unlike Josh’s, which (on his system at least) was working at the time.

That wasn’t justified and didn’t fit the evidence, so I felt hurt and annoyed and it kept bugging me.

Only on my day off did I wonder: what if there’s some truth in that? Maybe my code could be improved, but how?

The result was a simple but difficult-to-implement idea: keep calling each Autonomi API until it succeeds. I wondered about using Rust in a way I’d never considered before and didn’t know was possible, but which maybe, just maybe, would work. And if so, it would make the simple but difficult-to-implement idea quick and easy to try out.

I’m amazed it went from question to working solution so quickly. Only possible because I was able to step back, let my mind relax and ponder aimlessly.

Aside for Rustaceans: the clever bit was using a generic typed function with a closure, with generic types for both the parameters and the return value. How could that possibly work?! Well, to my surprise it does, and the Rust compiler showed me how. That enabled me to quickly wrap different Autonomi API calls in a function that calls a closure. The function can then call the API until it succeeds and return the result. Doing this without such a solution would be a horrible, error-prone mess.
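Something like this is my guess at the shape being described (a synchronous sketch with a stand-in flaky call, not the author’s actual code; the real Autonomi APIs are async, but the same pattern works with a closure returning a future):

```rust
// Generic over the success type T, the error type E, and the closure F:
// keep calling until Ok, with a retry cap so it cannot loop forever.
fn call_until_ok<T, E, F>(max_attempts: u32, mut call: F) -> Result<T, E>
where
    F: FnMut() -> Result<T, E>,
{
    let mut last_err = None;
    for _ in 0..max_attempts {
        match call() {
            Ok(value) => return Ok(value),
            Err(e) => last_err = Some(e),
        }
    }
    Err(last_err.expect("max_attempts should be at least 1"))
}

fn main() {
    // Hypothetical flaky operation standing in for an Autonomi API call.
    let mut attempts = 0;
    let result = call_until_ok(5, || {
        attempts += 1;
        if attempts < 3 { Err("network error") } else { Ok("chunk stored") }
    });
    println!("{result:?} after {attempts} attempts");
}
```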

Unfortunately it didn’t solve the problem, because the Autonomi API calls are failing far too frequently (even from a VPS) and there’s nothing I can do about that. But should they start to succeed even a bit, this will make my app much more reliable - and others can reuse my code to do the same.

This ability to step back and ponder aimlessly is hard to achieve in a business because there are stated goals, internal and external time pressures, feelings of responsibility to each other, a need to contribute every week etc, but it can be done.

Without this, things get missed. Opportunities and creativity suffer, and minds become closed, as they have towards the community, which has been pointing out problems and solutions that are often brushed aside until they can’t be ignored. Even things that are, to us, no-brainers or easy to check, as we’ve seen recently.

It was apparent this was happening a year ago and concerns about that have been brushed off too.

9 Likes

It also affects the way the “closest 5” works. I tried to ping one of my nodes with a file designed to target that node. It took multiple quote attempts to finally get a quote from it. This means uploaded files will typically go to the older nodes rather than to the closest 5. That means more churning for the chunk to eventually reach the closest 5 in many cases. Or maybe churning won’t happen; I cannot be sure.

As I was told, this is an area they are working on very actively at the moment, and it is very likely to change over time. So no criticism here, just observations that are probably only true for this short period in time.

8 Likes

Excellent questions, even better answers, thank you to the community

All I can add is

I got attos, I got attos!!! /me does HappyDance

not emissions but actual attos, a chunk payment at last - that’s the first I have seen so far.

6 Likes

To be clear, the team had a day off and it was a holiday; it was not a do-your-own-thing day, it was a 4-day week and a 3-day weekend. We are trialling this again, but it was not as effective as it could have been in many ways and OK in other ways. So not a simple one at all.

That should not be the case; the age of a node should not affect how much data it holds. That is spread more evenly, but …

There are 2 sources of pain right now:

  1. We are not convinced the close group is being effectively returned
  2. We are not convinced the dead nodes are not polluting and on occasion killing searches (there are alpha and beta params, set to 3 and 1 respectively; these are the Kademlia params).

These 2 issues seem to be real, but we are working hard to identify and fix them; there are some great things happening to prove this quickly. So given these may be issues, we do need to fix them, and that could cause the issue you report here @neo, but as a bug as opposed to a design issue with older nodes.
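For the curious, a toy model of those two parameters (alpha = queries kept in flight per round, beta = closest responders required before terminating). Everything here is simplified and assumed: 64-bit IDs, and a stub find_node in which every peer answers, which is exactly what a dead node would not do:

```rust
use std::collections::{BTreeMap, HashSet};

const ALPHA: usize = 3; // closest candidates queried per round
const BETA: usize = 1;  // closest responders required before we stop

fn dist(a: u64, b: u64) -> u64 { a ^ b }

// Stub FIND_NODE: in this toy, `peer` knows the whole network and returns
// the 4 peers closest to `target`.
fn find_node(peer: u64, target: u64, network: &[u64]) -> Vec<u64> {
    let mut known: Vec<u64> = network.iter().copied().filter(|&p| p != peer).collect();
    known.sort_by_key(|&p| dist(p, target));
    known.truncate(4);
    known
}

fn lookup(target: u64, seeds: &[u64], network: &[u64]) -> Vec<u64> {
    // Candidates ordered by XOR distance to the target.
    let mut shortlist: BTreeMap<u64, u64> =
        seeds.iter().map(|&p| (dist(p, target), p)).collect();
    let mut queried: HashSet<u64> = HashSet::new();
    loop {
        // Take up to ALPHA of the closest candidates not yet queried.
        let batch: Vec<u64> = shortlist.values().copied()
            .filter(|p| !queried.contains(p)).take(ALPHA).collect();
        if batch.is_empty() { break; }
        for peer in batch {
            queried.insert(peer);
            for found in find_node(peer, target, network) {
                shortlist.insert(dist(found, target), found);
            }
        }
        // Terminate once the BETA closest candidates have all responded. A
        // dead node sitting in a closest slot would never respond and would
        // stall this test: the pollution problem described above.
        if shortlist.values().take(BETA).all(|p| queried.contains(p)) { break; }
    }
    shortlist.values().copied().take(5).collect()
}

fn main() {
    let network: Vec<u64> = (1..=500u64)
        .map(|i| i.wrapping_mul(0x9E37_79B9_7F4A_7C15))
        .collect();
    let target: u64 = 0xABCD_EF01_2345_6789;
    println!("closest 5: {:x?}", lookup(target, &network[..3], &network));
}
```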

Soon, soon that will be where the biggest rewards come from

What I posted was a very, very high-level overview of the thinking. Even the dedicated team need a ton of presentations and meetings to describe this. It’s beyond us to put it in easy-to-digest messages, but we do try, and Bux and the team there try very hard to stay very high level and give direction. So messaging is hard as hell; I could not do it. But we need to get direction and keep it. That comes from payments and rewards, and there is a team working unbelievably hard on that while the engineers are in the background trying to give them something to build on.

So we have some folk taking up the slack for others to fix stuff, allowing the team taking the slack to move forward; it’s all happening in real time, and we are trying to explain what we can when we can. The weekly updates do try to shine a light on each week’s investigations etc.

It’s no simple task to explain this ant colony and all its parts :smiley:

11 Likes

But how thoon?

when attos? :slight_smile:

Really want to hear what’s going to be said about emissions tonight…
I can only start to imagine your frustration here, David.
And I’m probably not helping.

Just tell us what to poke at to see if we can help.

2 Likes

It’s just around the next corner, now eat yer ice cream :smiley:

6 Likes