I think you would have been fine @aatonnomicc. The deadlock is in the node code, so worst case you could have helped lock it, but it's good news to find it. As @joshuef says though, we do have a better approach that can rid us of a load of code related to adults reporting data usage.
Okay, bringing this down as we've lost a couple of elders to unresponsiveness/deadlocks.
Thanks everyone for all the effort here. Some great community resources coming along!!
Awesome work! I like this step-by-step approach, keep pounding it until nothing breaks and then take the next step
Making a new feature instead of fixing the old one.
Nothing changes.
At some point such an approach will stop working.
I don’t think the idea here is a new feature - it’s just a realization that there is a simpler & easier way to accomplish the same requirement of the network and removes a potential bottleneck.
If I’m understanding the solution being proposed here at all - which I maybe don’t.
I have to say that I'm with @Vort on this one; what about when churn comes into play? Would that be too much work for elders then? Do we risk solving one problem and creating a bigger one down the line?
The issue here as I see it is that the adults are going to be somewhere on the order of 60 in the section, and it's unreasonable that each elder will be storing similar amounts of data. Assuming a replication factor of 4, those 60 adults hold 60/4 = 15 adult-loads of unique data, so that'd require each elder to either store 15 times the data any adult does, or, if it's spread across the 7 elders equally, just over twice the data.
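A quick back-of-the-envelope in Rust, assuming a replication factor of 4 (my reading of the 60/4 figure, not a confirmed network constant):

```rust
fn main() {
    let adults = 60.0_f64;     // adults per section (order-of-magnitude guess)
    let elders = 7.0_f64;      // elders per section
    let replication = 4.0_f64; // copies kept of each chunk (assumed)

    // Unique data held by the section, measured in "adult-loads".
    let unique = adults / replication; // 15.0

    println!("one elder mirroring everything: {}x an adult", unique);       // 15x
    println!("spread across all elders: {:.1}x an adult", unique / elders); // ~2.1x
}
```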
Or is this storing different data?
Maybe just have the adults report their level of storage whenever a status message (whatever they are called) is sent, including a field giving an indication of how full the adult is.
Why have back-and-forth reporting (and all that entails) if you don't need to have it? If elders are storing an equal part of the data, they know how full a section is, + they can enforce a given node size (by deciding when to allow joins). It's a bunch simpler + neater IMO.
But yes, this is assuming churn works, sure. But if churn doesn’t work, it doesn’t matter if you have working elders as we’ve seen…
Anyway, we shall test it and see. It’s not a wildly big change to try out. And reducing complexity has been working wonders for us recently.
Testnets surely indicate change will be needed? You won't find many prototypes in production with no changes. That is a false premise you have there.
Some confusion here. We are looking at having much smaller nodes, so folk can run many at once if they want to store more. That allows more parallelization. It also helps churn a lot.
So imagine this: a computer goes off, and it has either one 50GB drive or ten 5GB drives. In the former, a single part of the address space is pummeled with relocation data. With the latter, that is the same amount of data in total, but distributed (or decentralised) across 10 times the number of nodes.
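A rough sketch of why that matters for relocation load; the neighbour count here is an assumption for illustration, not the actual replication fan-out:

```rust
fn main() {
    let lost_gb = 50.0_f64;   // total data on the machine that went offline
    let neighbours = 4.0_f64; // nodes picking up each lost node's chunks (assumed)

    // One big node: all 50GB lands on one neighbourhood of the address space.
    println!("1 x 50GB drive:  {:.2}GB per neighbour", lost_gb / neighbours);

    // Ten small nodes: the same 50GB is split across ten neighbourhoods.
    println!("10 x 5GB drives: {:.2}GB per neighbour", lost_gb / 10.0 / neighbours);
}
```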
Also, as nodes will be kept small:
- Elders can store data and know the size limit we want, so they can themselves realise when the network is full, since they don't have to store loads.
- Data is distributed in a more uniform way (more addresses means more uniformity), and that addresses a point made earlier about data being spread evenly across nodes.
- We can have folks run much smaller nodes, maybe even on mobile phones (work is still ongoing in the node age area to help here)
So this change is, I feel, a strong and positive way to further decentralise the network and reach even more people.
With small nodes, we are also looking at section sizes of 200 or so (much more resistant to Sybil attacks then). Again it helps us.
The issue though is the suggestion would see the 7 elders having to store the equivalent of the adults. 60 adults at 50GB, or 300 at 10GB, is still 3000GB; at a replication factor of 4 that's (3000/4) 750GB each elder now has to store. Isn't that going to be a problem now?
Not in my thinking. Elders are single nodes, so they only hold 5GB, say. An adult operator may run many nodes and hold 100GB in total, but each node is capped at 5GB.
So think of nodes as tiny, elders are tiny, but folk can run many nodes if they wish to store more data, not just use a bigger drive.
That's me describing ideas, not being precise, again.
The approach of changing the structure of the program instead of solving the specific problem is not changing. And I think it is bad. It is better to solve the problem, obtain stable and correctly working code, and then change the structure in the next iteration.
For testing it is good.
For a release version of the program it is horrible.
A better solution, in my opinion, is to make the subdivision at a different level.
So each node would contain several smaller entities, depending on how many resources the user decides to share.
Then, if they are just like an adult storing data, how will they be able to judge what each adult is doing, since data might not be truly random across the XOR addresses, just close to it? Then we see one group of adults with 20% or 50% more data being stored compared to the elders.
Now statistically this imbalance will be worse for some sections, and across 100 sections it is fairly certain that some adults will be storing 2-3 times what the elders in the corresponding section are.
SHA3 hashes / pub keys are random (no such thing as truly random, etc. etc.) and will distribute evenly across the address range. So data will be pretty even per node, and we will see that in testnets, more so in larger testnets. It can take up to 2000 nodes before you see it properly (early Gnutella research shows Kademlia behaving like that).
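A small sketch of that evenness, assuming the `sha3` crate: bucket 100,000 SHA3-256 hashes by their top 4 bits and each of the 16 slices of the address space should land close to 6,250:

```rust
use sha3::{Digest, Sha3_256};

fn main() {
    let mut buckets = [0u32; 16]; // 16 slices of the address space

    for i in 0u32..100_000 {
        let hash = Sha3_256::digest(i.to_be_bytes());
        buckets[(hash[0] >> 4) as usize] += 1; // top 4 bits pick the slice
    }

    // Counts should all sit near 100_000 / 16 = 6250.
    for (slice, count) in buckets.iter().enumerate() {
        println!("slice {slice:2}: {count}");
    }
}
```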
Otherwise, we ask the Adult that says he is full to tell us, and we have to trust him (a lot of code, a lot of consensus, and then trusting a single node).
The proposed change encompasses several concepts. It can be helpful to break it down and look at them one by one:
1) Elders store chunks too
Probably a good idea. Can help with response speed and puts to use otherwise unspent storage resource.
2) The network/sections no longer keep track of storage levels
A very good idea. Removes a lot of work from the network and a lot of code complexity that may not have worked anyway.
3) Reduce storage per node to be “small”
This is complicated and there’s a very good chance it might not achieve the intended effects. Let’s break this one down a little further:
a) What's small? Is it 1TB or 5GB?
b) It could introduce a host of issues, including around bandwidth inefficiencies (depending on what “small” is)
c) It’s another way to try to accomplish what was initially intended with the obverse of #2 above. That goal was to regulate joining the network by conditioning it on when storage space was needed. Just as that turned out to be too complex, making storage small may turn out the same too.
d) Smaller nodes may allow for larger sections (e.g., 200)? That can be done regardless of the node size.
Why not go further with #2 above and have the network also be unaware of node storage size? If a node can't store and provide the data that was assigned to it, then it gets booted. This would mean that there may be nodes on the network with 5GB or 10TB of storage, and anywhere in between or more. Any section size can still be used (60, 200, etc).
But it leaves the question of how to regulate node joins, which can be similarly addressed by letting any node onto the network as long as it does some useful PoW first (bringing back something from the old design; but it has to be something that elders can easily establish consensus on without additional complexity).
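Purely illustrative of "cheap for elders to verify and agree on": a hashcash-style join challenge. This is not the network's actual resource-proof mechanism; the names, encoding, and difficulty are all made up:

```rust
// Illustrative only: a hashcash-style join challenge elders could verify
// with one hash each, making consensus on the result trivial.
use sha3::{Digest, Sha3_256};

/// Joiner searches for a nonce whose hash has `difficulty` leading zero bits.
fn solve(challenge: &[u8], difficulty: u32) -> u64 {
    (0u64..)
        .find(|nonce| leading_zero_bits(challenge, *nonce) >= difficulty)
        .unwrap()
}

/// Elders verify with a single hash; no back-and-forth needed.
fn verify(challenge: &[u8], nonce: u64, difficulty: u32) -> bool {
    leading_zero_bits(challenge, nonce) >= difficulty
}

fn leading_zero_bits(challenge: &[u8], nonce: u64) -> u32 {
    let mut hasher = Sha3_256::new();
    hasher.update(challenge);
    hasher.update(nonce.to_be_bytes());
    let mut bits = 0;
    for byte in hasher.finalize() {
        bits += byte.leading_zeros();
        if byte != 0 { break; }
    }
    bits
}

fn main() {
    let challenge = b"section-prefix-plus-joiner-key"; // placeholder input
    let nonce = solve(challenge, 16);
    assert!(verify(challenge, nonce, 16));
    println!("nonce {nonce} accepted");
}
```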
Sorry but that's a separate issue. If smaller nodes simplify things, sure.
But in relation to Elders:
I only go by what was said in the past, when the decision was taken to have Elders not store data. Although, like others, I didn't necessarily agree, I thought it made sense if the load was too much for Elders.
If the optimisations that have been put in place in the meantime allow storage to be brought back to Elders, happy days. But frankly I believe it's premature; churn has taken down elders in previous testnets way too easily even without that, and without splits.
I was thinking the same: having a small fixed size sounded like we were close to magic number country, and we know @dirvine does not like magic numbers.
This is correct. All nodes being the same size allows us to closely judge network capacity, and when at 50% full we add a node, etc. This is easier as we only rely on elders agreeing they are 50% full. Previously we needed a reporting mechanism for adults, whom we trust less as they are younger.
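A minimal sketch of that elder-side rule, assuming every node is capped at the same size so an elder's own fullness is a fair proxy for the section's; all names and thresholds here are illustrative, not the real implementation:

```rust
const NODE_CAP_BYTES: u64 = 5 * 1024 * 1024 * 1024; // 5GB cap per node (assumed)
const JOIN_THRESHOLD: f64 = 0.5;                    // allow joins at 50% full

struct Elder {
    stored_bytes: u64, // what this elder holds locally
}

impl Elder {
    /// Each elder votes from local knowledge alone; no adult reporting needed.
    fn votes_to_allow_join(&self) -> bool {
        self.stored_bytes as f64 / NODE_CAP_BYTES as f64 >= JOIN_THRESHOLD
    }
}

/// The section admits a new node once a supermajority of elders agree.
fn section_allows_join(elders: &[Elder]) -> bool {
    let votes = elders.iter().filter(|e| e.votes_to_allow_join()).count();
    votes * 3 > elders.len() * 2 // more than 2/3 of elders
}

fn main() {
    // Seven elders holding 30%..90% of the cap: five vote yes, join allowed.
    let elders: Vec<Elder> = (0u64..7)
        .map(|i| Elder { stored_bytes: NODE_CAP_BYTES / 10 * (3 + i) })
        .collect();
    println!("allow join: {}", section_allows_join(&elders));
}
```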
This is a great question and goes quite deep. I don't think we know yet, but what we really need is to have the network tell us and dynamically change that over time, or go for some guess and alter the magic number over time, like the bitcoin block size, but worse.
The problem here is having some nodes with terabytes of data. Say, in the best case, it takes hours or days to replicate all that. We have a big window for data loss if churn is pretty quick.
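Another back-of-the-envelope; the 100 Mbit/s uplink is an assumption for illustration:

```rust
fn main() {
    // Time to re-replicate a departing node's data over an assumed uplink.
    let uplink_mbps = 100.0_f64;
    let bytes_per_sec = uplink_mbps * 1e6 / 8.0;

    for gb in [5.0_f64, 1000.0] {
        let secs = gb * 1e9 / bytes_per_sec;
        // ~0.1 hours for a 5GB node vs ~22 hours for a 1TB node.
        println!("{gb:6.0}GB node: {:.1} hours to replicate", secs / 3600.0);
    }
}
```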
So a balance for sure.
Yes, we have come a long way though with BRB and AT2 etc., and we now offload a lot of work to clients. Here we are saying that what was previously one node may now be 10 nodes (or more), but an elder is just a single node. If you see what I mean? So Adults are now tiny, you can have loads of them running, the work per adult is proportionately tiny, and we are only asking elders to do that tiny part of storage.
There is another notion of having elders give clients the adults' IP:port so they can retrieve data themselves, thereby making elders do even less work. It's a long discussion, but it comes down to elders already knowing, so bad guys will know. We need onion routing or similar to secure all the IPs of all nodes and clients. There is also the issue of bad adults being tracked by elders etc. So a lot to discuss; that will be a good one for this forum to tear apart, but it simplifies even more yet again.
Yes, but right now elders are in the path for store/get and, for a while, replication. The failures with churn have been for many reasons, mostly adults actually not being connectable. We have a fix for that though.
I still hate them