Not sure that this is a priority at all though. Just chatter about possibilities. Certainly something that will be nice to have down the track, but I don’t see this being worked on right now. Am I wrong?
To clarify here.
Archive nodes require an additional layer on top of the XOR network. They would not all be equal and not all be evenly distributed, so they are an opt-in type of node, separate from a normal node. This is also true of DAG or DBC Audit nodes.
The difficult part is creating that separate layer on Kademlia, then allowing distribution, publishing and finding of those nodes.
Libp2p has already done that work, and they call these nodes “Service Providers”. They use gossipsub, which is a secure gossip-based pub/sub mechanism that is resistant to attacks of various kinds.
So when we say archive nodes, it’s only a libp2p service. That service will accept any valid data from anywhere and provide it on request. So a simple service really. The small complexity will come from the node’s ability to cover a subset of the address space. If it were huge, it might cover the whole address space of small networks (testnets).
These provide
- Another data backup
- Remove less used data from nodes. (This is where they may be rewarded, as nodes themselves pay a % of the store cost to store the data there).
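To make the publishing/finding part concrete, here is a minimal sketch of how a node might advertise itself as an archive service using libp2p’s Kademlia provider records. The service key name is a made-up placeholder, and the API shown assumes a recent rust-libp2p (0.53+, where Kademlia is exposed as `kad::Behaviour`):

```rust
// Sketch only: advertising an archive-node service via Kademlia provider
// records. Assumes rust-libp2p 0.53+ with the `kad` feature enabled.
use libp2p::identity;
use libp2p::kad::{self, store::MemoryStore, RecordKey};
use libp2p::PeerId;

fn main() {
    let keypair = identity::Keypair::generate_ed25519();
    let peer_id = PeerId::from(keypair.public());

    // In-memory record store; a long-lived archive node would persist this.
    let store = MemoryStore::new(peer_id);
    let mut kademlia = kad::Behaviour::new(peer_id, store);

    // A well-known key naming the service (the key name here is invented).
    let service_key = RecordKey::new(&"safe/archive-node/v1");

    // Publish ourselves as a provider of that key...
    kademlia
        .start_providing(service_key.clone())
        .expect("local store accepts the provider record");

    // ...and a client looking for archive nodes would issue:
    let _query = kademlia.get_providers(service_key);

    // Both calls only queue DHT queries; in a real node the Behaviour is
    // polled inside a Swarm event loop to drive them and collect results.
}
```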
Hope that is more clear. I had not realised you were thinking back to when archive nodes were a big deal, as back then we would have had to implement a service-provider-type mechanism ourselves. This is another area libp2p helps with a lot.
I was thinking about this fragment of the update:
Seems like it’s being worked on, at least at a research stage.
Still, I don’t see an urgent need for archive nodes; to me it looks like an enhancement. Perhaps DBC Audit or DAG nodes are a must? Are DBC Audit nodes just like DBC mints? Does DAG = DBC Audit? DAG feels like a step in the blockchain / data chain direction… Is this the reason why it needs to be a separate kind of node: a massive amount of transaction-history data that not every node can store? And perhaps archive nodes are just another kind of these service providers, that we can get “free of charge” once we already have the required DAG nodes? Or is the archive functionality already implemented by libp2p? To rely on a node covering a % of the address space, some proof probably needs to exist? That’s just a bunch of speculations, perhaps to motivate some discussion and explanations. Thanks!
Man, that’s a lot of questions. I will try.
- Archive nodes are not urgent, but an easy good-to-have.
- DAG nodes really just hold SpentBook copies, so they provide 2 main features (a sketch of the first follows below):
1. Easy audit back to genesis
2. Secondary defence against double-spend attempts
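A hedged sketch of what feature 1 could look like, assuming a hypothetical `SpendRecord` type that links each spend to the spends that funded it; the real SpentBook types will differ:

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical spend entry: each spend names its parent spends.
struct SpendRecord {
    parents: Vec<[u8; 32]>,
}

/// Walk the spend DAG backwards; the audit passes only if every path
/// terminates at the genesis DBC with no missing links.
fn audit_to_genesis(
    spentbook: &HashMap<[u8; 32], SpendRecord>,
    start: [u8; 32],
    genesis: [u8; 32],
) -> bool {
    let mut stack = vec![start];
    let mut seen = HashSet::new();
    while let Some(id) = stack.pop() {
        if id == genesis || !seen.insert(id) {
            continue; // reached the root, or already visited
        }
        match spentbook.get(&id) {
            Some(rec) => stack.extend(rec.parents.iter().copied()),
            None => return false, // broken history: audit fails
        }
    }
    true
}
```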
None of these are essential for launch, but they will add an element of robustness and failsafes that could be extremely valuable should the worst happen. Testnets could also be improved with these, i.e. continual data persistence, transaction history and so on.
Let’s keep this firmly in the forefront of our thoughts.
What do you see as the major goals of the next few testnets?
Joshnet and Natnet were successful for different reasons; Natnet had a narrow focus but seemed very reliable within those limits. What do you want to prove/disprove next?
A few things IMO (each likely a testnet)
- Data replication model
- DBC transactions between people (with faucet)
- Pay for data
- Many nodes per machine (perhaps some node manager to start / stop many nodes)
- Dynamic upgrades
Non-essential:
- NAT traversal (we’d rather wait on QUIC NAT traversal, but if we must, then TCP).
- Archive nodes
- DAG nodes
The whole time we are looking at Sybil defences, and that could include using BLS keys for nodes. That means BLS in libp2p / Quinn and rustls.
Quinn and rustls are introducing pluggable crypto from the TLS layer up, so we should be able to easily configure that and possibly PR libp2p to use it.
However, I would like to find a much simpler defence, and that includes many, many nodes and more than one close group per data item, i.e. hold data and the SpentBook at the XOR address plus the hash of the XOR address, making Sybil attacks very hard indeed. I think there are even simpler solutions there too.
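A minimal sketch of that dual-address idea, assuming a 256-bit XOR address and using SHA-256 (via the `sha2` crate) purely for illustration; the actual hash and types are undecided here:

```rust
// Illustration only: derive a second, unrelated storage address from the
// primary XOR address, so each item is held by two independent close groups.
use sha2::{Digest, Sha256};

type XorName = [u8; 32];

fn storage_addresses(primary: XorName) -> [XorName; 2] {
    let secondary: XorName = Sha256::digest(primary).into();
    [primary, secondary]
}

fn main() {
    let item_addr = [7u8; 32]; // stand-in for a real data item's XOR name
    let [a, b] = storage_addresses(item_addr);
    println!("close group 1 near {:02x?}…", &a[..4]);
    println!("close group 2 near {:02x?}…", &b[..4]);
}
```

An attacker would then need to capture two unrelated close groups simultaneously to control a single item.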
Any idea why, at the moment, the churn test on Nightly always fails?
The guys are wrestling with a few things there. It’s not a single cause and it’s why you see a few PRs lined up to resolve those causes.
What’s the droplet size?
This is the commit that adds double spend detection, using a `Vec<SignedSpend>`.
Even if there are many, many double spends, only two of them are used/kept:
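The commit’s code isn’t quoted here, but the gist is presumably along these lines (an illustrative sketch with made-up field names, not the actual implementation): any two distinct signed spends of the same DBC already prove the double spend, so everything beyond two can be discarded.

```rust
/// Stand-in for the real SignedSpend; only the shape matters here.
#[derive(Clone, PartialEq, Eq, PartialOrd, Ord, Debug)]
struct SignedSpend {
    dbc_id: [u8; 32],
    tx_hash: [u8; 32],
    signature: Vec<u8>,
}

/// Keep at most two distinct spends: a deterministic order means every
/// node converges on the same pair, and two spends are proof enough.
fn keep_double_spend_proof(mut spends: Vec<SignedSpend>) -> Vec<SignedSpend> {
    spends.sort();
    spends.dedup();
    spends.truncate(2);
    spends
}
```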
I wonder whether there are any edge cases when a client attempts millions of double spends (instead of just 2), and how that affects this feature?
One of the problems I’m imagining is the comms and deserialization of a very large object causing a lot of load (if a node tries to send all the txs for a big double spend instead of just two txs).
Secondly, could there potentially be thrashing if the two spends held by each node are continuously changing while millions of double-spend options are being relayed around the nodes? Nodes would perceive many ‘new’ transactions even though they’ve already been seen (but not stored, since only 2 are kept).
I’m just being a devil and trying to break stuff; this feature can (and will) work as expected, but is a pretty juicy avenue for mischief (obvious for anything double spend related!). I think it’s ok as-is but will be interesting to try to break it.
A wallet that doesn’t correctly mark a dbc as spent (internally) could easily choose it again for the next spend. It’s an obvious thing to have to do but there are plenty of ways for wallet software to accidentally not mark dbcs as spent and then double spend them later.
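To make that failure mode concrete, a minimal sketch with hypothetical types (not any real wallet’s API): the safe pattern is to mark the DBC spent at selection time, before broadcasting, so a crash or retry can’t pick it again.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, PartialEq)]
enum DbcState {
    Unspent,
    Spent,
}

struct Wallet {
    dbcs: HashMap<[u8; 32], DbcState>,
}

impl Wallet {
    /// Pick an unspent DBC and mark it spent immediately; forgetting the
    /// marking step is exactly the accidental-double-spend bug above.
    fn select_for_spend(&mut self) -> Option<[u8; 32]> {
        let id = *self
            .dbcs
            .iter()
            .find(|(_, state)| **state == DbcState::Unspent)?
            .0;
        self.dbcs.insert(id, DbcState::Spent);
        Some(id)
    }
}
```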
Yeah it’d be kinda hilarious if double spend was a deliberate option included in the cli wallet, (highlighted with bright warnings about the consequences); have it there for anyone to mess around with, rather than a shady thing hacked together by those ‘nasty double spenders’.
There have been some instances of the client not connecting in good time, which may be contributing.
We’re also actively assessing various tweaks to replication flows here too. The nightly test is one hour of constant churn (I believe we churn 100% of the network 6 times just now), so it’s a tough nut to crack, but a decent bar to be aiming for.
So my testing has shown essentially 10 nodes per 1 vCPU / 1 GB of RAM. For our current testnets I’m running 20 nodes on 2 vCPU / 2 GB AMD droplets.
Currently, we are working on implementing a `RecordHeader` to encode the kind of data (chunk, DBC etc.) that is being stored internally; Kademlia cannot distinguish between them as it stores everything as a `Record`. This has enabled us to perform proper validations against a local copy of a DBC + an incoming DBC etc., and then store a `Vec` containing all the double spend attempts.
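As a hedged illustration of that header idea (the names and the one-byte tag encoding are assumptions, not the actual safe_network types): tag each Record’s value with its kind so the node can route it to the right validation.

```rust
/// Hypothetical record kinds; the real network has more than these two.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum RecordKind {
    Chunk,
    DbcSpend,
}

/// Prefix the payload with a one-byte kind tag before storing it as a
/// Kademlia Record value.
fn encode(kind: RecordKind, payload: &[u8]) -> Vec<u8> {
    let tag = match kind {
        RecordKind::Chunk => 0u8,
        RecordKind::DbcSpend => 1u8,
    };
    let mut value = vec![tag];
    value.extend_from_slice(payload);
    value
}

/// Recover the kind on receipt; unknown tags are rejected outright.
fn decode(value: &[u8]) -> Option<(RecordKind, &[u8])> {
    match value.split_first()? {
        (&0, rest) => Some((RecordKind::Chunk, rest)),
        (&1, rest) => Some((RecordKind::DbcSpend, rest)),
        _ => None,
    }
}
```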
That is an interesting problem, and I think we might have to fall back to storing just 2 `SignedSpend`s, but ordered in some fashion such that the other spends are discarded. With the ordering of the `SignedSpend`s and the validation against the local DBC copy, this should hopefully be solved.
Also, having a check to make sure that any incoming `Vec<SignedSpend>` has `len() <= 2` should prevent an attacker from crafting a large `Vec` to slow down a node. Additionally, there is a built-in upper bound for the size of the `Record` being transmitted, which is currently set to 1 MB.
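Those two guards could look something like this (constants and names assumed for illustration; the real checks live in the node’s validation path):

```rust
const MAX_SIGNED_SPENDS: usize = 2; // two spends already prove a double spend
const MAX_RECORD_SIZE: usize = 1024 * 1024; // the 1 MB Record cap noted above

/// Cheap inbound checks that run before any expensive deserialization
/// or signature verification.
fn validate_incoming<T>(raw_record: &[u8], spends: &[T]) -> Result<(), String> {
    if raw_record.len() > MAX_RECORD_SIZE {
        return Err(format!(
            "record of {} bytes exceeds the 1 MB cap",
            raw_record.len()
        ));
    }
    if spends.len() > MAX_SIGNED_SPENDS {
        return Err(format!(
            "{} spends received, at most {} are ever kept",
            spends.len(),
            MAX_SIGNED_SPENDS
        ));
    }
    Ok(())
}
```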
OK, so what have the “Check before merge” churn parameters been? And has it had random errors, or only ones traceable to the PR under test?
Just curious, because when comparing these two, I would expect the Nightly to pass by sheer luck at least sometimes, or the other one to fail more often. But yeah, I don’t know anything about the parameters (or testing software in general…).
Nightly hasn’t been consistent yet; `merge.yml` does one round of churn over 10 mins.
There’s some issue we have (some msgs are still being dropped atm) that we’re digging into. Without fixing that, the nightly run will be unreliable, sadly (we could deactivate it, but it’s still useful to see how/where things are progressing, even if it fails).
Thank you for the hard work, team MaidSafe! I’ve added the translations in the first post.
Privacy. Security. Freedom