PARSEC and 99% Fault Tolerance

zeroflaw · August 17, 2018, 3:12pm

I’ve been going through the RFC on node ageing, and any forum post I can find to try and understand how this magic (node ageing) protects the network. I’m so far not convinced this adds ‘definable’ security.

To summaries my findings, ‘behave correctly or we will disconnect your node, and you might get half your age back on reconnecting’. The restoring age by ‘Using the group members to request any data the node has to verify age’, seems inefficient, and if am an attacker with some big nodes (and I no longer care about my age), I could just keep restarting them to drain my groups resources.

A ‘normal’ person is going to find it difficult to keep a node online without disconnects, and with the maximum node age taking up to 60000 years. Will an attacker get an advantage from setting up infratructure / custom distributed vaults to appear always online to get a really high node age? if I get to node age 5 years of churn, my single node would make up 50% of the group age, no matter which group I am in.

Can someone please explain to me why, the network is not using its fundermental properties to protect its purpose. The data itself, which to me, seems a much more direct and powerful / simple way of achieving the result of ‘node ageing’

Data is evenly distirbuted. So influence is distributed.
Chucks being requested often, are copy. So more influence is created and distributed.
Nodes that are overloaded / high latency. Will fail to / be accused of not providing data requested. Network can reduce influence, while restoring a node to a health state of not being overloaded

Protection:

Restarting loads of empty nodes, will have zero effect on the network
Running multiple nodes from the same IP not a problem (IPv6 makes using unique IPs pointless)
Influence is not lost by disconnects (Can the other nodes holding copies of the data, form a computational challenge for the node to prove it still has its data?)
Groups are an average of the currently distributed influence, so no single node becomes super powerful.
51% attack means, you hold 51% of the data in the group, which no single node will have, as the data would be relocated. If you are a massive farm with many nodes and 51% of the data, which means you’ll also have all the copies of the highly requested data, you somehow have an incredible amount of bandwith too. Which should be ‘impossible’

The only problem I can think of is if I want to host an archive node (No highly active chucks stored here). The influence of the data store here will need a handicap, to reduce the influence. Which as the network ages, the handicap increases to reduce influence.

dirvine · August 17, 2018, 3:36pm

Each restart will cause you node work though and each time it will lose 50% of age (so exponential decay).

JPL · August 17, 2018, 5:52pm

A nice potted 300-400 word explanation of sybil resistance would be helpful for everyone. It’s beyond my capabilities unfortunately or I’d do it myself.

dirvine · August 17, 2018, 7:18pm

Yes, probably it would be good. I will try and make a simple stab in bullets here, to try and help.

Sybil Attack
" The Sybil attack in computer security is an attack wherein a reputation system is subverted by forging identities in peer-to-peer networks. It is named after the subject of the book Sybil , a case study of a woman diagnosed with dissociative identity disorder."

That is from Wikipedia, it is probably best known in a looser form.
“An attack where an adversary can establish enough identities in a network to overpower or outvote good nodes and thereby cause harm”

We take several steps here, as one on its own is not powerful enough. A consensus algorithm does not help. Some steps we take are

Only allow nodes to join the network, a when they have done more work than the network has to prove their worth (resource proof). and b When the network decides it requires more nodes.
Only allow a node to have decision-making abilities after it has proven itself worthy over a very long period.
Do not allow a node to choose where in the network it exists.
Move a nodes location in a deterministic but not easy to calculate time period.

These are 4 of the steps we take. I will try and say why now.
1a. This is simple it costs them more than us to join. It also ensures the node very likely has the minimum resources before we do much work on accepting it.
1b. This one is crucial. As the network requires more nodes it will increase the farming rate to encourage them, however, we wish to block mass joins as these are the way an adversary would attack, the adversary is likely a botnet or large collection of VMs etc. Forcing nodes to wait to join means that this is prevented, but also means the time period for all of an adversaries nodes to join is longer. This lengthened time means he has to spend money and time, but also crucially it also means his nodes will be diluted by other nodes also waiting to join. This is a royal PITA for a sybil type attacker.
2. This again means that the attacker nodes have a lot of work and proving to do before they get to a decision making state. In a very stable network, they may never get to that state and if its a botnet of users infected pcs, then it is extremely unlikely the nodes do get to a decision making state (Elder).
3. This stage means that the attacker’s nodes will also be diluted across the network and in doing so, do even more work.

So the mechanism is to make such an attack very expensive in terms of time and resources. This does balance with an economic model/angle as well. As the attacker’s resources are being used at a cost to him, they earn safecoin. If they earn enough then the attacker is like a bitcoin holder. Then do they wish to crash the network, Engineers, and scientists, would say yes, they could be vandals, but bitcoin has shown us (me) something very enlightening, Even bad guys play by the rules if you pay them enough.

In any case, the economic model would indicate the costs of such an attack will be high, but the reward in regular income will be high as well. Therefore the loss of killing the network could be doubly painful. That includes attacking to a level where you could print your own safecoin as they would fast become worthless as the attack became known.

Debate about that can rage as it does with bitcoin as many academics I know in the computer science and Engineering fields to this day say bitcoin’s model does not work. Who would have thunk it

zeroflaw · August 17, 2018, 8:59pm

I get that node ageing is an attempt at stopping sybil attacks, what I dislike is, super-nodes who provide large amounts of storage and fast bandwidth, are not respected by the network more than an average slow node. This to me looks like an imbalance, making an expensive attack, potentially cheaper.

Attackers would be looking for these small imbalances to get an edge. If it’s possible to be exploited, it will be. Bitcoin is simple to explain, if you have 51% of the hashing power, and you are able to sustain your hashing power, you gain control of the ledger.

The safenetwork has node ageing, which has zero metrics, zero consideration for value provided.
(A) node has participated for 1 week, stores 300mb of data, and providing a consistent 25 kb/s bandwidth.
(B) node has participated for 1 week, stores 50gb of data, and providing a consistent 3 Mb/s bandwidth

With node ageing they both have the same age! if they have both participated correctly with the network. However, (B) is using far more real world resources. Yes, this node will be rewarded with more safecoin. Let’s be honest an attacker would be aiming to spin up 50000 node (A) because its considerably cheaper.

Is this not important? all nodes are not equal, I haven’t seen anything in node ageing that takes this into consideration.

Put simply, if I can get 50000 VMs on a physical machine (with 50000 IPs), all the VMs are participating. Do the VMs have more influence over the network than the physical machine with 1 IP? if the answer is Yes, then node ageing isn’t the correct approach.

dirvine · August 17, 2018, 9:26pm

Yes this likely would earn more safecoin (the vms I mean) if that is what you mean? The relationship to node age is not an issue or cause of this. The network structure is. At the moment all nodes are treated more or less equal, that will change over time, I am sure.

Archive nodes (if you search the forum) are in fact what I suspect you are talking about for large amounts of older data.

Are you saying you would prefer a single large machine instead of many smaller machines?

node age metric → Has a node performed over exponentially large amounts of work as its age increases incrementally.
zero consideration for value provided → Has a node provided value for an exponentially increasing amount of work for each age increment.

Over time these two nodes will for sure be rewarded differently. At version 1 if the min requirement is satisfied by B then that is perfectly fine. It is not only bandwidth or storage space, it is also memory, on line time, cpu etc. etc.

It is very important that all nodes do not need to be equal, they will not be and will have many differing capabilities. The summation of all those capabilities must allow the nodes to perform the minimum asked of them and it is ok.

Can it improve, well, of course, it can, but just as your browser will not download a javascript code piece faster than another person’s browser who has only 1/10 of your disk space, we need to be sure we are calculating the correct parameters. Space cpu on line time bandwidth (up and down) and many more make up the resource requirements. As the network evolves the algorithms rewarding stuff the network can use will become more granular. At the start, if the network cannot use the extra gazzilion Gb a machine has then it will not use it and will not pay for not using it

Hopefully, that answers some of your concerns and allows you to see that this is an evolving network, as it should be.

zeroflaw · August 17, 2018, 9:29pm

I didn’t mean more safecoin, I mean more votes. Or more potential to gain control of a group.

dirvine · August 17, 2018, 9:31pm

Yes more machines would have more potential to gain control of a section.

[Edit] I should say that it is not necessarily so simple as the section may not accept nodes of the same IP range etc. This has been discussed quite a lot, but its a cat and mouse game there.

zeroflaw · August 17, 2018, 9:33pm

It’s not more machines, it’s the same machine, same physical hardware thats been partitioned. Why is this the case? Should the network protection mechanism not see theses as equal? As in the real world they are equal.

[Edit]
Using an IP in anyway as a method of protection, seems flawed from the start.

dirvine · August 17, 2018, 10:00pm

There is a miscommunication here, so try and help me out

Of course, it is more machines according to the real world (I mean looking from the network outside). At the moment our nodes are on digital ocean hardware, all 200 of them, but of these 200 I am not sure how many actual physical machines that is.

I see the argument but I am not sure I agree or even if it would be possible unless there was some sgx mechanism or cpuid etc. but all of those are likely not feasible to distinguish anything as all those can be faked easier than possibly IP spoofing.

Are you thinking of some mechanism where the network could tell the difference between a vm and a single physical computer? If so then you have racks of raspi etc. to consider. I am not aware of any mechanism that could help in that. Perhaps you could elaborate your thinking there a bit and we can dive deeper to see if there is anything we could use?

zeroflaw · August 17, 2018, 10:16pm

I disagree, they are 100% the same machine. The way this is heading, virtualisation is being rewarded by the network. This should not be the case. Large fast machines with high bandwidth are important for network performance.

I don’t see why we need to even care about the IP or try to detect if its a vm or physical hardware.

If nodes ping/trace each other, and thats recorded like triangulation based on time to respond. Would the network not be able to build an internal view of how all its participating nodes are distributed from each other? Hows XOR networking distributing data?

This means that if you slice a physical machine into 2 VMs. The network will see that they are very close together, (because in the real would there in the same place). Which means the same chuck of data won’t be stored on the 2 VMs. In this scenario the 2 VMs will be treated the same as the physical. Taking my suggestion of above of using held data as influence, they have exact same data. So virtualisation has no advantage over physical.

dirvine · August 17, 2018, 10:32pm

The problem there is you are asking the two nodes that are the “problem” to report each other. Othwersie you would need a different type of overlay to try and triangulate these times, however, it is very simple to self-throttle and for a machine to provide random but within acceptable limits to fake all of that as well.

This part of your argument is more subtle, the small machines being rewarded is actually a good thing in many ways. It allows distribution of resources in a cleaner way than centralising towards huge machines and bandwidth etc. Then people in poorer areas can participate.

On the other side of the debate, there is (and I agree here) the initial position of treat all machines as equal at launch. This I think is obviously wrong as some machines can do more and should be rewarded for that. Over time I see this happening (again search archive nodes on this forum) but not as a default as it could be hard. It is a very complex issue, what reward for cpu versus bandwidth verses uptime versus space verses … etc.

At the moment age takes all that into consideration and says “you are rewarded by this exact amount for the combination of all possible parameters that can be measured”.
Hopefully, that may make more sense?

zeroflaw · August 17, 2018, 10:50pm

Self throttling or delaying to deceive the network would be detectable. If a ‘bad’ node is asked by 12 nodes to acknowledge. And they record the time, and those 12 nodes are deemed to be ‘good actors with ageing’. If the bad node has randomly added different delays, you would know because the it wouldn’t correlate to the locations of the good nodes. If the the bad node uses a constant delay, then the 12 good nodes still see it in the same place.
With more nodes and this model of locations would become more accurate over time. It could make for an interesting live graph if it were public data on the network

dirvine · August 17, 2018, 10:54pm

It would indeed

Keep in mind this network scrubs IP addresses on hop 1 for security as well though. So the location of nodes is not easy to get hold of on purpose. I see what you are thinking though, but please do keep in mind the geographic security of the overlay and the addressing as it is important as well.

tfa · August 18, 2018, 10:16am

Not obvious for me!

Currently a vault must manage a data chunk if and only if it belongs to the group of the 8 vaults nearest to the address of the chunk, which means that all nodes are equal. I see only advantages to this solution:

already implemented
ensures in a simple way that all chunks are duplicated 8 times
better for decentralization because individual users are more likely to get their fair share of rewards
big data centers can still participate by launching N small vaults instead of one big vault, but they will be limited by the control that only one vault is allowed on the same LAN
better for security because if a big data center suddenly stops its N small vaults, the effect will be less dramatic compared to the loss of one big vault, because the chunks it manages are dispersed in the whole network space instead of being concentrated in one section

With all these elements, I don’t like the idea that the proportion of data managed by a vault will depend on its age.

dirvine · August 18, 2018, 11:01am

No, I don’t either specifically, but I do think more granularity and rewards systems will happen. Such as pay for the number of agreements in an Elder group etc. There may be more, I suspect there will. I agree with what we have now though is a fantastic start as it does take into account everything to reach a minimum level of resources required.

zeroflaw · August 18, 2018, 3:17pm

I can’t see how promoting virtualisation is a benefit to the network. I mean you could simply spin up 1 VM for every chuck. I still totally disagree with using IPs as a protection mechanism, what happens when IPv6 is mainstream?

[edit]
One big vault would never be issued more than one copy of the data chucks. As the network can easily work out not to use this same host to duplicate them. With N small vaults, the network isn’t smart enough to figure out they are the same host, which leads to a ‘chance’ all chucks are on the same host. Therefore small vaults are more dangerous than big vaults, data should be replicated to larger machines because it’s safer.

Not true, because a farm is just going to spin up 1 VM per chuck. because thats how they will get most reward. Individual users can’t be expected to partition their desktops into multiple vaults to get a fair share of rewards.

–
I could keep arguing, but I don’t think i’ll get anywhere, I’ll wait for beta, and test these theories for myself. If it becomes possible to trash the network, hopefully that will be enough noise for someone to take note.

[edit]
Why can’t the network just create a group per chuck if thats the aim of all this silly virtualisation?

Traktion · August 18, 2018, 4:07pm

If anyone/everyone can spin up lots of small vaults, it will become the commin/preferred approach for everyone, not just attackers. I suspect there will be a point of diminishing returns, a sort of saturation point.

Moreover, I understand rewards are granted to those who respond quickly to requests, in combination with a random/lottery element. If you cram a VM so full of vaults that it starts to respond more slowly than less congested boxes, it is going to get more costly to run the vaults. I am not certain whether node aging takes this into account, but I suspect it does.

Ultimately we don’t want to define what the best way to run vaults is; it should be emergent. If aging and safecoin rewards reflect the value they have provided to the network, the optimal approach should out itself.

zeroflaw · August 18, 2018, 4:22pm

So the safenetwork 51% attack equivalence is I own more IPv4 space than everyone else? So if the US military who hold over 200 Million IPs wanted to destroy the safe network, we don’t stand a chance. IPs should not be the main factor in determining decentralisation.

dirvine · August 18, 2018, 4:33pm

Luckily they are not. In fact, they currently have no place in the security but may have later, if spoof protection is done properly. Also ipv6 will just help that along.

Topic		Replies	Views
MaidSafe Dev Update - May 24, 2018 - Introducing PARSEC Updates	296	18878	March 27, 2019
I heard you guys are hashgraph like Beginners safecoin	6	1296	October 8, 2021
Pierre explains PARSEC Development video	90	6186	November 7, 2019
The devil's advocate Beginners	4	1121	June 7, 2018
Some thoughts on PARSEC Development	2	655	June 8, 2018

PARSEC and 99% Fault Tolerance

Related topics