Today, we are publishing different proposals related to the first part of data chains:
- Data Chains - Option A
- Data Chains - General rules in an eventually consistent network
- Data Chains - Option B
Please note that we have not settled on any of these proposals yet. They are all still being considered.
Also, we’d like to share an idea that we are considering for the next testnet with MutableData. Currently, every time we start a new testnet, all user data is lost and people need to create a new account and start from scratch (e.g. you need to reupload your data and recreate your SAFE websites). The real solution will be to implement data republishing and network restarts via data chains, but since we are currently running client-only networks (where MaidSafe manages all the vaults), we are thinking that it could be a good idea to implement a temporary solution to make user data more persistent. So if a network is taken offline because we are adding new APIs and vaults need to be updated, we wouldn’t have to lose all the user data. This means that app developers wouldn’t have to reupload all their apps every time we start a new testnet. It also means that we’d have to start caring more about backward compatibility (e.g. deprecating APIs, migrating from one version of the API to another version, etc.).
There are a few simple approaches to achieve this. While none of them would be the ideal long-term solution, they should let us get the client tests going and allow network updates without much hassle in terms of data loss during restarts. Most approaches involve allowing nodes to retain their previous ID and restart in the same area (address space) of the network. In the long term, with node ageing, nodes will still try to do the same, but they will get relocated again at that point. For now, however, we’re considering letting nodes simply remain where they were, which keeps their chunk stores relevant and their data available to continue operations. This involves keeping the chunk store persistent and also flushing MaidManager accounts, which are currently not flushed to disk. These last two parts could in turn be achieved in multiple ways, such as using a custom network RPC to trigger and create a named backup which can later be restored, or going for the simpler option of a single backup by not dropping the chunk_store on termination/startup. Each approach comes with its own set of advantages and we’ve not finalised an approach just yet.
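As a very rough illustration of the simplest variant (a single backup that survives restarts), the sketch below shows what persisting a vault's state across a restart could look like. All names here (PersistentStore, the file layout, and so on) are hypothetical and chosen for the example only; this is not the actual vault code.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical on-disk layout: one directory per vault, keyed by its
/// persistent node ID, holding the chunk store plus a flushed copy of the
/// MaidManager accounts.
struct PersistentStore {
    root: PathBuf,
}

impl PersistentStore {
    /// Reopen (or create) the store for a node that keeps its previous ID.
    fn open(root: &Path) -> io::Result<PersistentStore> {
        fs::create_dir_all(root)?;
        Ok(PersistentStore { root: root.to_path_buf() })
    }

    /// Flush the serialized MaidManager accounts to disk so they survive a restart.
    fn flush_accounts(&self, serialized_accounts: &[u8]) -> io::Result<()> {
        fs::write(self.root.join("maid_manager_accounts"), serialized_accounts)
    }

    /// Reload the accounts on startup, if a previous backup exists.
    fn restore_accounts(&self) -> io::Result<Option<Vec<u8>>> {
        match fs::read(self.root.join("maid_manager_accounts")) {
            Ok(bytes) => Ok(Some(bytes)),
            Err(ref e) if e.kind() == io::ErrorKind::NotFound => Ok(None),
            Err(e) => Err(e),
        }
    }
}

fn main() -> io::Result<()> {
    // The chunk store itself would live alongside this and simply not be
    // dropped on termination/startup.
    let store = PersistentStore::open(Path::new("vault_backup/previous_node_id"))?;
    store.flush_accounts(b"serialized MaidManager accounts go here")?;
    assert!(store.restore_accounts()?.is_some());
    Ok(())
}
```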
SAFE Authenticator & API
The safe_browser’s DOM APIs were further simplified. Previously, the auth token had to be passed as an argument to every API call. Now, we internally map each handle to the corresponding client, so this unnecessary argument could be removed. APIs to free the handles are exposed as part of the DOM API. Also, the forEach function was not working consistently, but this is now fixed. A new version of the SAFE Browser with the updated APIs and fixes was built with the mock network and published for devs to try. @bochaco also documented the DOM APIs and we have hosted them on GitHub Pages for easier reference.
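For those curious what "internally mapping the handle to the corresponding client" means in practice, here is a simplified, std-only Rust sketch of such an object cache. The type and function names are made up for illustration and are not the actual safe_app internals.

```rust
use std::collections::HashMap;

/// Placeholder for an authorised client session (hypothetical type).
struct AppClient {
    app_name: String,
}

/// Hypothetical object cache: it hands out opaque handles and resolves them
/// back to the client internally, so API calls no longer need the auth token.
#[derive(Default)]
struct ObjectCache {
    next_handle: u64,
    clients: HashMap<u64, AppClient>,
}

impl ObjectCache {
    /// Register a client and return the handle given to the DOM API caller.
    fn insert(&mut self, client: AppClient) -> u64 {
        let handle = self.next_handle;
        self.next_handle += 1;
        self.clients.insert(handle, client);
        handle
    }

    /// Resolve a handle back to its client for an API call.
    fn get(&self, handle: u64) -> Option<&AppClient> {
        self.clients.get(handle)
    }

    /// The "free" APIs exposed in the DOM boil down to dropping the entry.
    fn free(&mut self, handle: u64) -> bool {
        self.clients.remove(&handle).is_some()
    }
}

fn main() {
    let mut cache = ObjectCache::default();
    let handle = cache.insert(AppClient { app_name: "demo-app".into() });
    if let Some(client) = cache.get(handle) {
        println!("resolved handle {} to app '{}'", handle, client.app_name);
    }
    assert!(cache.free(handle));
}
```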
The safe_app_java APIs are being wired up, and a few ImmutableData test cases are failing when run as part of the complete test suite. We have wired up most of the APIs, with only the MutableData API still pending. We are hoping to resolve these issues and have the Java API ready as soon as possible.
@shankar is also improving the UX/UI of the authenticator based on the designs from @shona.
SAFE Client Libs & Crust
SAFE Client Libs have been updated to use the latest library versions (including Serde 1.0), and we have now replaced SHA-2 everywhere with a more efficient hashing algorithm, SHA-3. Some changes are being made to the dev branch of safe_client_libs to further improve the consistency of the public API: we now have a function to generate cryptographic nonces (generate_nonce), which makes the crypto module available to apps more complete. Further, working closely with the Authenticator team, we have closed the gap by fixing inconsistencies in some of the modules: for example, mdata_info didn’t have a proper function to deallocate it, and now there is mdata_info_free.
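To illustrate the pattern behind functions like mdata_info_free (every object handed across the FFI boundary gets a matching function to deallocate it), here is a hedged sketch using made-up names; the real safe_client_libs signatures and types differ.

```rust
use std::os::raw::c_void;

/// Simplified stand-in for the data behind an MDataInfo-style object
/// (hypothetical layout, not the real safe_client_libs struct).
pub struct MDataInfoExample {
    pub name: [u8; 32],
    pub type_tag: u64,
}

/// Allocate on the Rust side and hand an opaque pointer across the FFI
/// boundary to the front end.
#[no_mangle]
pub extern "C" fn mdata_info_example_new(type_tag: u64) -> *mut c_void {
    let info = Box::new(MDataInfoExample { name: [0; 32], type_tag });
    Box::into_raw(info) as *mut c_void
}

/// The matching `_free` function: take ownership back and drop, giving
/// callers a well-defined way to deallocate every object they were handed.
#[no_mangle]
pub unsafe extern "C" fn mdata_info_example_free(ptr: *mut c_void) {
    if !ptr.is_null() {
        unsafe { drop(Box::from_raw(ptr as *mut MDataInfoExample)) };
    }
}

fn main() {
    let ptr = mdata_info_example_new(15_000);
    unsafe { mdata_info_example_free(ptr) };
}
```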
After updating safe_client_libs master to use the latest version of the serialization library Serde, we ran into some issues with the existing invitation-based account creation scheme: it was causing some weird out-of-memory errors. We have therefore changed the stored account information to be more strongly defined and moved it into a common module in routing, which will soon be used in safe_vault too.
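For illustration only, a strongly defined, serializable account-info structure using Serde 1.0 derive could look like the sketch below. The field names are assumptions for the example; the real struct in routing may differ.

```rust
// Cargo.toml for this sketch: serde = { version = "1", features = ["derive"] },
// serde_json = "1"
use serde::{Deserialize, Serialize};

/// Hypothetical shape of the stored account information.
#[derive(Serialize, Deserialize, Debug, PartialEq)]
struct AccountInfo {
    /// Number of mutations performed so far.
    mutations_done: u64,
    /// Number of mutations still available to the account.
    mutations_available: u64,
}

fn main() -> Result<(), serde_json::Error> {
    let account = AccountInfo { mutations_done: 3, mutations_available: 997 };

    // A well-defined struct (de)serializes predictably with Serde 1.0,
    // avoiding loosely defined blobs that are easy to misinterpret.
    let encoded = serde_json::to_string(&account)?;
    let decoded: AccountInfo = serde_json::from_str(&encoded)?;
    assert_eq!(account, decoded);
    Ok(())
}
```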
The stress test from the master branch of safe_client_libs has been ported to the dev branch to include support for MutableData. This will help us test the robustness and performance of the MutableData implementation in safe_vault; based on the results, we can then see where improvements are needed.
Routing & Vault
While the design and simulation efforts are still ongoing, we decided to make public some of the documents and proposals today and invite you to participate in the discussion:
Why data chains?
There are a lot of decisions the network needs to make as a whole: Which nodes are allowed to join, and in which section? Which nodes are responsible for any given part of the namespace, and in particular for which data chunks? What chunks are currently stored on the network and what is their content?
Eventually, after exchanging a few messages, the nodes always need to reach an agreement on all of those questions, so that they don’t send contradictory responses to clients, and so that they agree on which sets of nodes are responsible for making the next decisions. In other words, both the data itself and the information about the network’s own structure are supposed to be eventually consistent: the nodes must always, although possibly after some delay, reach a consensus on them.
The goal of data chains is to be a historical log and a cryptographically verifiable representation of that consensus, that can be persisted and exchanged as a proof among nodes.
As a first step, we are only aiming to include the information about the network’s own structure (which nodes are members of the network, what is their name, and which sections and groups are they in). Even without data, that will already have considerable benefits:
- With a well-defined notion of a proof that a section consists of a particular list of members, we can at last enable the final piece of the puzzle for securely relaying messages: each message will contain the blocks required for the recipient to validate an uninterrupted chain of signatures back to the sender (a rough sketch of this check follows the list).
- Clients and nodes joining the network will be able to verify that they are connected to the real network without having to trust their proxy node.
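To make that first point a bit more concrete, here is a deliberately simplified, std-only sketch of what validating such a chain could look like. All of the types, the majority quorum rule and the verify stub are assumptions made for this illustration; they are not the actual data chain design or routing's types.

```rust
use std::collections::HashSet;

/// Placeholder types: in the real design these would be crypto keys and
/// detached signatures, and `verify` an actual signature check.
type NodeName = u64;
type Signature = (NodeName, u64); // (claimed signer, dummy signature bytes)

/// One link in a section's membership history (heavily simplified).
struct Block {
    /// Members of the section after this block is applied.
    members: Vec<NodeName>,
    /// Votes from members of the *previous* block endorsing this one.
    signatures: Vec<Signature>,
}

/// Stand-in for a real cryptographic signature verification.
fn verify(sig: &Signature, signer: NodeName) -> bool {
    sig.0 == signer
}

/// Walk the chain and check that each block is endorsed by a quorum (here:
/// a simple majority) of the members agreed in the previous block, starting
/// from a block the recipient already trusts.
fn validate_chain(trusted_start: &Block, rest: &[Block]) -> bool {
    let mut current_members: HashSet<NodeName> =
        trusted_start.members.iter().copied().collect();

    for block in rest {
        let valid_votes: HashSet<NodeName> = block
            .signatures
            .iter()
            .filter(|sig| current_members.contains(&sig.0) && verify(sig, sig.0))
            .map(|sig| sig.0)
            .collect();

        if valid_votes.len() <= current_members.len() / 2 {
            return false;
        }
        current_members = block.members.iter().copied().collect();
    }
    true
}

fn main() {
    let genesis = Block { members: vec![1, 2, 3], signatures: vec![] };
    let next = Block {
        members: vec![1, 2, 3, 4],
        signatures: vec![(1, 0), (2, 0)], // two of the three previous members
    };
    assert!(validate_chain(&genesis, &[next]));
}
```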
Events in the network
Since the Disjoint Sections RFC, the network is subdivided into sections defined by a prefix: a finite sequence of ones and zeros. The section with a given prefix consists of exactly those nodes whose names, in binary, begin with that prefix. At first, the network has only one section. Once enough nodes have joined, it splits into sections 0 and 1. When section 1 has enough nodes, it splits into 10 and 11, and later into 100, 101, 110 and 111, and so on. If a section loses too many nodes, it merges, effectively undoing a split.
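A small sketch may help make the prefix mechanics concrete. This uses a simplified stand-in (32-bit names and a made-up Prefix type) rather than routing's actual implementation:

```rust
/// A section prefix: the first `len` bits of `bits`, most significant first.
/// (Simplified stand-in for routing's Prefix type.)
#[derive(Clone, Copy, Debug, PartialEq)]
struct Prefix {
    bits: u32,
    len: u32,
}

impl Prefix {
    /// Does this prefix cover the given (here 32-bit) name?
    fn matches(&self, name: u32) -> bool {
        self.len == 0 || (name >> (32 - self.len)) == self.bits
    }

    /// Splitting appends one more bit, turning e.g. section 1 into 10 and 11.
    fn split(&self) -> (Prefix, Prefix) {
        let len = self.len + 1;
        (
            Prefix { bits: self.bits << 1, len },
            Prefix { bits: (self.bits << 1) | 1, len },
        )
    }
}

fn main() {
    // The initial network-wide section covers every name.
    let root = Prefix { bits: 0, len: 0 };
    let (zero, one) = root.split();          // sections 0 and 1
    let (one_zero, one_one) = one.split();   // sections 10 and 11

    let name: u32 = 0b1011 << 28;            // a name starting with 1011...
    assert!(one.matches(name));
    assert!(one_zero.matches(name));
    assert!(!one_one.matches(name) && !zero.matches(name));
}
```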
Nodes that join the network are assigned a small interval and must pick a name within it. They are then challenged by the section they are about to join to prove that they satisfy some minimum bandwidth (and later CPU) requirements.
So the main events that are represented by the blocks of the chain are:
- A node has passed the challenge and joined the network.
- A node has disconnected and left the network.
- A section has split into two.
- Two sections have merged into one.
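In code, the payload of such a block could be modelled as a simple enum over these four cases. The sketch below uses made-up type names and is only meant to show the shape of the events, not the actual data chain design:

```rust
/// Simplified stand-ins for routing names and section prefixes.
type NodeName = [u8; 32];
type Prefix = (u32, u32); // (bits, bit length), as in the earlier sketch

/// The network events a membership block could record (illustrative only).
#[derive(Debug)]
enum NetworkEvent {
    /// A node passed the resource challenge and joined the section.
    NodeJoined(NodeName),
    /// A node disconnected and left the network.
    NodeLeft(NodeName),
    /// The section split into the two child prefixes.
    SectionSplit(Prefix, Prefix),
    /// Two sibling sections merged back into their parent prefix.
    SectionsMerged(Prefix),
}

fn main() {
    let events = vec![
        NetworkEvent::NodeJoined([0u8; 32]),
        NetworkEvent::SectionSplit((0b10, 2), (0b11, 2)),
        NetworkEvent::NodeLeft([0u8; 32]),
        NetworkEvent::SectionsMerged((0b1, 1)),
    ];
    for event in &events {
        println!("{:?}", event);
    }
}
```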
The problem is that the existing nodes in the network don’t always agree on which of these events have happened yet or should happen:
- A joining node can pass the challenges it got from different section members at different points in time, and it could even pass only some of them but not all.
- A node could leave because of connectivity problems, and disconnect from its peers one by one, with several minutes in between.
- A node could believe that its section should not split because it will soon have to merge with another section that lost too many nodes. Another node could have received an update on that other section earlier or later, and thus come to a different conclusion.
- And similarly for merges: I might think that we, section 01, need to merge with section 00, because 00 has lost too many nodes, but you might already know that 00 has in the meantime regained a node and we don’t need to merge at all.
So our data chain implementation will need to start out from those different subjective views of the network, before arriving at an agreed “truth”, usually by making the nodes vote: “I think this node should join our section.”, “I think that node has disconnected.”, “We should split!”, etc.
Different approaches to a solution
Initially, most of the team’s time went into the Data Chains - Option A plan: we are trying to adapt established consensus algorithms like PBFT to our use case. Each section of the network would run an instance of that algorithm, and special care needs to be taken whenever the section changes because a node joined or left, or the section split into two or merged with another one.
Whenever an event happens, e.g. a node goes offline, the other nodes exchange several rounds of messages to reach an agreement that this event should define the next block in the section’s chain, and then commit that block. Apart from merges and splits, where blocks have two successors or predecessors, this produces a serially ordered history of each section: for any two blocks in the same section, it is well-defined which of them was agreed first. That makes this approach relatively easy to reason about, although the consensus algorithms themselves are rather complex. Other drawbacks are:
- If an attacker controls a third of a section’s nodes, they can break the section. (This is a general result that affects a large class of consensus algorithms.)
- PBFT is leader-based: One node in the section plays a special role, and attacking that node (e.g. DoS) can delay progress in the section until it has decided on a new leader — which could immediately be attacked again!
We looked into Honey Badger as an alternative leaderless consensus algorithm, but it looks like it cannot be adapted to our use case. We are continuing to evaluate other consensus algorithms and search for better fits.
However, these drawbacks led us to take the leaderless patterns further and consider approaches that avoid running a complex consensus algorithm before committing. That means they don’t necessarily always reach consensus on which event happened “first”: e.g. there might remain an ambiguity (a decision branch) about whether node A or node B left first. Nodes can process and action events concurrently and still eventually reach a single agreed state (both nodes A and B have left), which might be all we need from them to achieve our requirements here.
Data Chains - General rules in an eventually consistent network is a discussion of implicit ordering, and explores a set of rules to reorder a vector of blocks to account for the fact that different nodes could have received the blocks (or rather the votes for them) in a different order.
Data Chains - Option B proposes a mechanism that computes the validity of blocks from a set of votes that is independent of the votes’ order.
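To give a flavour of that last idea (without claiming this is the actual Option B design), here is a toy sketch in which a block's validity depends only on the set of votes received, never on the order in which they arrived:

```rust
use std::collections::{HashMap, HashSet};

type NodeName = u64;
type BlockId = u64;

/// Toy vote accumulator: validity is a function of *which* votes were
/// received, so two nodes seeing the same votes in a different order
/// still reach the same conclusion.
struct VoteAccumulator {
    section_members: HashSet<NodeName>,
    votes: HashMap<BlockId, HashSet<NodeName>>,
}

impl VoteAccumulator {
    fn new(section_members: HashSet<NodeName>) -> Self {
        VoteAccumulator { section_members, votes: HashMap::new() }
    }

    /// Record a vote; returns true once the block has reached quorum.
    fn add_vote(&mut self, block: BlockId, voter: NodeName) -> bool {
        if self.section_members.contains(&voter) {
            self.votes.entry(block).or_insert_with(HashSet::new).insert(voter);
        }
        self.is_valid(block)
    }

    /// A block is valid once more than half the current members voted for it
    /// (the quorum rule here is an assumption for the sketch).
    fn is_valid(&self, block: BlockId) -> bool {
        self.votes
            .get(&block)
            .map_or(false, |voters| voters.len() > self.section_members.len() / 2)
    }
}

fn main() {
    let members: HashSet<NodeName> = [1, 2, 3, 4].iter().copied().collect();
    let mut acc = VoteAccumulator::new(members);

    // Votes can arrive in any order, and duplicates are harmless:
    assert!(!acc.add_vote(42, 3));
    assert!(!acc.add_vote(42, 3));
    assert!(!acc.add_vote(42, 1));
    assert!(acc.add_vote(42, 4)); // third distinct vote of four members: quorum
}
```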