Node Join Queue discussion and ideas around node membership

mav · April 12, 2021, 12:29am

Following on from the long and fragmented discussion in the Fleming Testnet Release topic (currently 32 posts containing the word ‘queue’ so I won’t link to all of them) about nodes joining the network.

This is a topic for brainstorming so please keep an open mind and try to respond to the strongest plausible interpretation of what someone says, not a weaker one that’s easier to criticize. Assume good faith.

Are new nodes put in a queue or are they simply rejected to try again later? Which of these is more desirable? How does each technique affect security, what mechanisms can we use (such as proof of resource), how should we prioritize the queue… there is a lot to discuss.

It’s a really interesting and important topic, so I feel it’s worth having a topic dedicated to it.

My main question for managing node membership: is there a difference between an extremely helpful operator vs an attacker? As far as I can tell they would both look the same to the network when they try to join.

Some quotes from the 32 posts that inspired this topic:

No queue, it’s just whoever is lucky enough to request to join at that time.
We could create a queue but it defeats the purpose of this anti Sybil attack measure - this is to stop someone with bad intentions from taking over the network with hundreds or thousands of nodes all joining in succession.

right now you have to spam join requests to get a chance to join.

Would a queue with some kind of resource intensive proof of ability be possible? letting the network pick the next strongest node.

we will implement a way to auto retry you if Network is not accepting, or something along those lines

Nodes will join a queue, stay connected and be selected at random (based on the hash of the lost or full node message, so ungamable)
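As a rough illustration of the hash-based selection quoted above (a sketch only, not the network's actual code), the waiting nodes can be put in a canonical order and indexed by the hash of the event message. The function name, the use of SHA-256 and the sorted-name ordering are all assumptions made for the example.

```python
import hashlib


def select_from_queue(queue, event_message):
    """Pick one waiting node using the hash of a network event message.

    `queue` is a list of node names (hypothetical identifiers) and
    `event_message` is the bytes of the "node lost" or "node full" message.
    """
    if not queue:
        return None
    # Hash the event message; nobody waiting in the queue chooses this value.
    digest = hashlib.sha256(event_message).digest()
    index = int.from_bytes(digest, "big") % len(queue)
    # Sort first so that every party computes the same ordering.
    return sorted(queue)[index]
```

Because the index depends only on the message and the sorted queue contents, every party that sees the same message and the same queue arrives at the same choice.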

This requires storage of the queue, and updating it if nodes go away, and handling if it gets too big. Might it be prone to spamming in order to get more nodes in the queue than someone else? I guess repeated join requests (with no queue) are also open to this kind of attack.
Even then, you get the issue of the queue itself being spammed with join-the-queue requests, just like the network so I’m not seeing how to solve this!

IMO, as a natural defense, clients should be expected to perform more work than whatever the network would perform. Ideally the work is useful to the network, however that’s not necessary to dissuade spam/flood attacks.
what’s to prevent an attacker from spinning up a bot farm to spam joins and either blowing out the queue’s storage or starving the network of capacity growth by being unable to find legitimate nodes to enlist?

if the random selection is the algorithm, what’s to stop a 3-letter from continuously signing up a bargeload of nodes all over the world and then degrading/corrupting output if/when they are selected? Conversely if only the highest quality nodes can be selected, what would prevent Amazon/Google from taking over?
What if every queue selection for node hosting required burning e.g. 10 safe to the network as an anti-spam measure?
…
How does someone earn in the first instance though?

As for useful resource_proof ideas, would checksumming a random data/block qualify? I’m thinking of it like a ‘free,’ externally performed and validated, continuous ZFS scrub. Using the scenario of a node joining the queue, the node is given some URL and a hash which can either be what’s expected or a rand() fake, the node performs a read of the URL, calculates the hash of its data, returns whether the hashes match, and if the answer is correct the node is entered to the queue.
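The checksum challenge described above could look roughly like the following sketch. It is only an illustration under stated assumptions: the data is passed in as bytes rather than fetched from a URL, SHA-256 stands in for whatever hash would be used, and the function names are hypothetical.

```python
import hashlib
import os
import random


def make_challenge(chunk: bytes, rng: random.Random):
    """Return (claimed_hash, is_genuine) for a chunk the network already holds."""
    is_genuine = rng.random() < 0.5
    if is_genuine:
        claimed = hashlib.sha256(chunk).hexdigest()
    else:
        # A fake hash of random bytes, standing in for the rand() fake.
        claimed = hashlib.sha256(os.urandom(32)).hexdigest()
    return claimed, is_genuine


def answer_challenge(fetched: bytes, claimed: str) -> bool:
    """Node side: hash the data it read and report whether it matches."""
    return hashlib.sha256(fetched).hexdigest() == claimed


def admit_to_queue(node_answer: bool, is_genuine: bool) -> bool:
    """Network side: admit the node only if its match/no-match answer is right."""
    return node_answer == is_genuine
```

A node that answers without reading the data has a 1 in 2 chance of guessing a single round correctly, so a scheme like this would presumably need several rounds before admission.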

A challenge set by one person in order to join, assigned to a random other also trying to join, which they must answer. By submitting two proofs, one of having set a question that’s answered correctly, and one of answering a question correctly you join the real queue.

What if you had a “proof of human” check for the first node under an “account” (or whatever you guys are calling the key combo) and then future ones are automatically added to the queue once the first node gets in and proves to be a good member?

Time is the great leveler, as its cost applies to everyone equally, so the length of time a node has been queuing, and providing full services to the network, would be considered a cost it has paid. This is effectively a proof of work exercise, but one that’s simultaneously useful to the network.

PoW is centralising because those with more money/resources can dominate uptake.

Letting everyone in means everyone gets a chance to earn and I believe would make it unprofitable to try and swamp the network because: 1) rewards will be spread thin and it might make this unprofitable quite quickly, and 2) the only incentive is profit from rewards as you can’t easily take over a section this way. Actually, as I write both points I’m not sure they are true, however.

Those with more money/resources have always, and will always, dominate everything in the material world. They’ll operate the majority of the safes unless the selection algorithm prioritizes balanced clearnet address space distribution, and even that isn’t a guarantee.

If it’s trivial to starve the network of additional nodes, that’s worse than if it’s difficult/expensive. If spam/flooding isn’t managed, to say nothing of more sophisticated attacks, who runs the majority of safes will be irrelevant because the network will be unviable.

Toivo · April 12, 2021, 5:56am

Nodes themselves don’t have malicious intentions, it’s people behind the nodes that may. So a couple ideas to block or change that:

In the beginning participation is possible by invitation only. This way we could nurture the network until it is big enough to decide on its own. But how big would that be? (This may not be the best, the most decentralized or the fairest solution, but in the name of brainstorming I throw it here.)
Good PR. We want to emphasize how the network is good for everyone: law enforcement as well as criminals, the government as well as the rebels. Certainly those in power have the biggest capability to cripple the network, so we want to tell them what the benefits of the network are for them.

Of course we would prefer solutions that are 100% resistant to malicious intentions. But until we find that, I think there is value in targeting the malicious intentions themselves. In that targeting it could be worthwhile as a mental practice to stop calling them “malicious” and try to understand what good these intentions are trying to serve. What are they trying to protect? What are the fears driving them? Then we should empathize with those fears.

If, for example, someone says: “I am worried how the criminals will do XYZ.” The answer is not: “They do it already on the clearnet.” The answer is: “Yes, I am worried about that too. It is scary. I think one way to mitigate that could be…” I mean that what is a malicious intention from our point of view may be a good intention from the other point of view. We want to acknowledge the good that is there and align with that - not to attack the malicious surface the good is wrapped in. But maybe this is a topic of its own.

dirvine · April 12, 2021, 6:18am

There is no difference until we can detect bad or faulty behaviour. (just repeating for everyone)

I see where you are going and agree with your sentiment. We call nodes malicious when we mean

Faulty (Crash Fault Tolerance must work)
Byzantine (The fault tolerance is increased here)

So really we mean the software running is either faulty or in fact not the same software we think is running (so byzantine nodes are not running our code, but another set of instructions).

CFT is easier to handle, missing messages, not forwarding messages, not responding etc.
BFT is more difficult as nodes will do mostly the correct thing, but look for weaknesses (try and vote twice or vote to cause chaos and so on).

yippeeyo · April 12, 2021, 8:07am

All nodes should be able to participate based on following suggestions:

Nodes should have simple algorithm to determine their ‘quality’ (platform -physical or virtual, storage speed, age etc) - does this already exist?
Maidsafe should setup and manage high quality core nodes until network is fully autonomous
Nodes initially store only copies of data until node quality improves
Option: Node rewards are earned but not issued until quality (age weighted?) improves
Option: Node rewards are earned but issued in blocks (10 Safetokens?) - incentivize nodes staying online - reduce churn?
Option: Combination of above options - rewards are issued as percentage of earnings, remainder is held and issued in a block (5 Safetokens?) - get earning immediately and incentive to stay online
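The combined option above (part of each reward paid at once, the remainder held and released in blocks) might be sketched as below. The 50% immediate share is a made-up figure for illustration, the block size of 5 follows the suggestion in the list, and the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RewardAccount:
    immediate_share: float = 0.5  # hypothetical fraction paid straight away
    block_size: float = 5.0       # held rewards are released in blocks of this size
    held: float = 0.0             # earned but not yet issued
    paid: float = 0.0             # issued to the node operator

    def earn(self, amount: float) -> None:
        """Record a reward: pay part now, hold the rest until a block fills."""
        now = amount * self.immediate_share
        self.paid += now
        self.held += amount - now
        # Release every full block that has accumulated.
        while self.held >= self.block_size:
            self.held -= self.block_size
            self.paid += self.block_size
```

With these numbers, three rewards of 4 tokens leave 11 tokens paid and 1 token still held, so a node that goes offline early forfeits only the part that has not yet filled a block.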

davidpbrown · April 12, 2021, 8:21am

Not sure what option there is for new nodes
but for nodes or the owners of nodes that proved useful - could those note a kudos… some signed key and balance, that allows them to draw on past contribution as reason they should be preferred. I don’t know how nodes are managed, if they are kicked and reason to not invite back this maybe becomes more complicated.

jlpell · April 12, 2021, 10:55am

No. Each node needs to pass a test on its own present merit. Otherwise a bad actor could add a few good nodes then follow it up with 100x bad ones.

I really think the only way to solve this is to make it very costly to be malicious. For that we need a detailed definition of malicious. Malicious from the network’s perspective may also be an overzealous forum member or supporter with fomo who wants to spin up 10000 nodes with one click/script.

Proof of human (PoH) is the most straightforward solution. It doesn’t need to be Proof of Unique Human (PoUH), which is overkill. So we’re really talking about a glorified Captcha that takes a few minutes (3 to 5, maybe 10, maybe 2, maybe varies based on load) of genuine human effort to complete and can’t be gamed by a bot/ai.

happybeing · April 12, 2021, 10:59am

As Jim pointed out, captchas can be gamed using Mechanical Turk services, but that just means the captcha has to be more costly to solve. The problem is if this cost makes it pointless allowing the feature in the first place, in which case it may be best to keep it simple.

jlpell · April 12, 2021, 11:01am

I used ‘Glorified Captcha’ as a generic catchall for a human task. It may be nothing like a traditional captcha.

happybeing · April 12, 2021, 11:02am

I’m using it the same way, so my comment stands.

jlpell · April 12, 2021, 11:08am

The point is that this method is the most costly for an attacker, but not so for a normal node. At the most basic level it would charge human time to get in. One human per node, dedicated to a manual task for a random amount of time.

What is the alternative?

happybeing · April 12, 2021, 11:23am

I proposed something like this earlier which is included in the OP. So maybe you missed that in the original discussion, and Jim’s note about it being gameable with MT. So I’m pointing out that it’s not infallible, and that there’s a cost benefit to look into before knowing how effective it can be.

davidpbrown · April 12, 2021, 11:34am

Could you just accept all nodes but not engage them to the network - just freewheeling in a pool; so, it’s costly perhaps to spam but the network can then choose or queue them as and when needed. Spammers good or bad, would be discouraged from putting too many forward?

Toivo · April 12, 2021, 11:34am

On a general note it might be a good idea to have different kinds of protection for different phases of the network. Early on we might want solutions that are not desirable in later stages. Maybe we could start to record some kind of trust score based on how people participate in these testnet iterations. For example I expect to get maybe 5-20 friends on board once we have a bit more user-friendly testnet. And I know they are not going to do malice on purpose.

neo · April 12, 2021, 11:58am

So I want to run headless nodes. Requiring human intervention is counterproductive to that, and a bad actor can bypass it by utilising the services of others.

Whatever method is to be used I doubt it can involve human involvement.

person(s) with resources can bypass the need for (their) human involvement allowing for bulk attempts to add nodes.
causes problems for people with NAS docker images, SBC headless setups, automatic recovery if node (eg power) goes off for too long.
and people in general do not like the answering (or picking images) since they are getting harder due to ease of bypassing them now. Some sites I have to do 3 to 5 (or more) image matches screens and they sometimes ask for things to be identified that just are not a thing in Australia. Like Fire Hydrants in Australia are in the ground with a cover, and in the USA they are above ground like in the old comics. That was an easy example but it gets worse where I’ve had to google the object asked to be identified. Easier to just identify the 3 images with common object when I don’t know what is being asked for.

I would resist a cop-out method that requires human intervention and think a capability method would be better because it can also include a test to make sure the node is suitable. Obviously it should not be an energy waster or large PoW thing.

Invitations may be reasonable in the early stages, but how does one prevent one invite being used multiple times without requiring the network to know which invites are currently being used?

Invites work well for account setup since the account resides as a data object, but for nodes the data object is not so good. For instance, I get an invite, set up my node, use the invite and wait to join. I finally get joined as a working node, but the node dies from whatever (power, setup, just testing) and then I have to get a new invite. Anything else allows for gaming.

davidpbrown · April 12, 2021, 12:44pm

I wonder if the OP is attempting an answer to the “how” before the “what” is resolved. What nodes are wanted? Then how to confirm those follows.

Brand new nodes need testing to prove what they are.

Above I suggested kudos to retain a sense of what was, but that was rebuffed, and I wonder if Antifragile is suggesting something similar with the idea of invitation codes.

What is the test for a useful node?.. if that can only be proved “now”, for not trusting what went before, then surely needs to be random… if it can be preempted by making node busy in a null reward pool that proves they are not put off by the cost of time and energy hoop jumping - then that’s similar to fking captchas - which I hate and would prefer not to have any of.

If a long-standing node drops, surely it would want to be welcomed back quickly. If we’re trying to ensure stability in the network in the early days, then isn’t a known node better than a new node?

It is perhaps a matter of who owns the risk… should it be the network or should it be those who want to contribute?

Another approach is to put the node into debt; so, that it has to work off that commitment at risk. If there is no exchange of resource - money; or time; or energy, then all nodes are equal… and the network adopts the risk.

neo · April 12, 2021, 12:47pm

Well at this time it rejoins with age/2 or min age=5

It seems that is already built into the code

Isn’t this already happening with Age, with nodes only getting rewards according to their age (i.e. new nodes get less)?
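One reading of the rejoin rule mentioned above (“age/2 or min age=5”) is that a returning node gets half its previous age, but never less than 5. The sketch below assumes that reading and assumes integer division; it is not taken from the network code.

```python
MIN_AGE = 5  # minimum age quoted in the post above


def rejoin_age(previous_age: int) -> int:
    """Age given to a node that rejoins: half its old age, floored at MIN_AGE."""
    return max(previous_age // 2, MIN_AGE)
```

Under this reading a node that left at age 20 would come back at age 10, while a node that left at age 8 would come back at age 5.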

davidpbrown · April 12, 2021, 1:03pm

So, if the question is just about new nodes, perhaps it’s just a case of brute forcing the problem for long enough that the network is up and stable. Again, the metric of how many nodes are up and down over a period of time, might be key to suggest how stable the network is.

neo · April 12, 2021, 1:23pm

I did like the principle you were exploring, that new nodes are effectively in debt. That prevents new nodes from benefiting (rewards wise) when added by a person with plenty of resources.

For say myself with 2 or 3 SBCs running on my home internet connection (hopefully also on starlink this year) the effect of a delay of getting accepted and rewards when I run up for the first time is an inconvenience, but for a bad actor or someone trying to get an unfair advantage with rewards it becomes a much more expensive exercise. Imagine someone trying to run up 10,000 nodes and they first have to wait an average of say 10 days to actually join, then 10 to 30 days before they age enough to be getting past their “debt” due to age.
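To put rough numbers on that comparison, the unrewarded running time can be counted in node-days. This is only arithmetic on the figures in the post (10 days to join, then 10 to 30 days of “debt”); the function name is made up and no cost per node-day is assumed.

```python
def unrewarded_node_days(nodes: int, join_wait_days: int, debt_days: int) -> int:
    """Total node-days that must be run before any node earns past its 'debt'."""
    return nodes * (join_wait_days + debt_days)


# 10,000 nodes: between 200,000 and 400,000 unrewarded node-days.
large_low = unrewarded_node_days(10_000, 10, 10)
large_high = unrewarded_node_days(10_000, 10, 30)

# 3 home nodes: between 60 and 120 unrewarded node-days.
small_low = unrewarded_node_days(3, 10, 10)
small_high = unrewarded_node_days(3, 10, 30)
```

The delay per node is the same for everyone, but the total that has to be funded up front grows with the number of nodes being added at once.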

Early on those times may be shorter, but there are fewer rewards to be had anyhow since the network is much smaller (data wise).

Is Age enough, is it understandable by users?

Maybe a “debt” of say 100x average rewards would be easily understood, and any reporting can show the “debt” amount in a way that doesn’t sound like you have to pay the network. I can imagine people wondering if they could just pay the “debt” to be accepted faster, which defeats the purpose of it. The 100x average rewards would not be good using the testnet method since that is 100 splits before earning anything.

Or is Age a simpler way, since the rule is you have to be above a certain age showing your usefulness? The 10 days to join and then say 10 to 20 days before earning anywhere near full rewards would have the same effect on trying to game the joining process for reward tokens.

For trying to take control then “debt” system using Age or SNT amount does not really mean anything since the attacker is not concerned with reward amounts. But at least it is much more expensive if they cannot offset costs with rewards gained early on. Then the question is if they are doing what they are supposed to do for month(s) then is it so bad to let them continue and kick them when they go bad? Anyone with resources can act like that many individuals and the only way to slow them is to make it expensive in cost and resources.

happybeing · April 12, 2021, 1:40pm

Maybe I’m missing something, but rather than using debt, can’t this be achieved using a minimum age threshold before a node qualifies for rewards? The dynamics will be slightly different, better perhaps, but I think the effect is similar.

dirvine · April 12, 2021, 1:50pm

Yes. So at age 4 you are waiting; at age 5 you are in and storing chunks, but rewarded in proportion to your age, and on it goes. So as trust builds, payments build.

I am sure we can improve, but this is a really simple and, I think, fair way to start.
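A minimal sketch of “rewarded in proportion to your age” follows, assuming that a reward is split among nodes of age 5 or more in proportion to their ages. How the network actually weights age is not stated in this thread, so the linear weighting and the function name are assumptions.

```python
MIN_REWARD_AGE = 5  # per the post above: age 4 is waiting, age 5 is in


def split_reward(total: float, ages: dict) -> dict:
    """Split `total` among eligible nodes in proportion to their age."""
    eligible = {name: age for name, age in ages.items() if age >= MIN_REWARD_AGE}
    weight = sum(eligible.values())
    if weight == 0:
        return {}
    return {name: total * age / weight for name, age in eligible.items()}
```

For example, with nodes of age 5, 15 and 4 sharing a reward of 100, the age 5 node would get 25, the age 15 node 75, and the age 4 node nothing until it ages further.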
