Safe Network Dev Update - August 27, 2020

Bayalu · August 30, 2020, 1:32am

So, if I understand correctly, when I use an app like Twitter/Facebook on the Safe Network and want to post or comment on a post, will I have to pay for it? Every comment/post is bytes of data after all, am I correct?

frostbyte · August 30, 2020, 2:40am

Storing data on the network does have a cost, but comment-sized data would be tiny. In some cases the user might pay, in others the app might pay for its users; either way it would be very little in USD terms.

Nigel · August 30, 2020, 6:14am

Hey @JPL, a small tweak to the primer: in the AT2 section, I believe it’s actually “Asynchronous Trustworthy Transfers” as opposed to “Trusted”. It shows as trustworthy in the papers, and searching for it that way gives better results in search engines. Loving the primer. It’s a moving target, but I’m really excited to hear it might be partially maintained by Maidsafe to supplement your hard work.

JPL · August 30, 2020, 10:55am

Good spot @Nigel - I seem to have been hedging my bets with a mixture of trusted and trustworthy elsewhere. Corrected now, and change should be live shortly. Let me know if you spot anything else - always good to have another pair of eyes on.

bridge · August 30, 2020, 2:03pm

Does the CRDT tree improve loading performance? If so, by how much (is it at a level that users can feel)?

JPL · August 30, 2020, 2:09pm

I think it’s more about being able to perform more complex data operations such as moving a set of sub folders to a new parent folder and guaranteeing the change will be the same for all nodes - something even Google Drive can’t do properly yet according to the linked white paper. But since full network consensus will not be required then it should be quicker than the alternatives.
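To make the "same change for all nodes" point concrete, here is a minimal sketch of a move-operation tree. It is an illustration only, not Maidsafe's implementation: the `Move` record, the `(counter, replica_id)` timestamps and the `replay` helper are all assumptions, and it simply replays the full operation set in timestamp order rather than using the undo/do/redo log that, as I understand it, the white paper describes. The key rule is that a move which would place a folder under one of its own descendants is skipped, so every replica that has seen the same operations ends up with the same tree.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Move:
    """Hypothetical move operation: put `child` under `new_parent` as `name`."""
    timestamp: tuple  # (lamport_counter, replica_id); assumed unique, gives a total order
    child: str
    new_parent: str
    name: str


def is_ancestor(parents, ancestor, node):
    """Return True if `ancestor` lies on the path from `node` up to the root."""
    while node in parents:
        node = parents[node][0]
        if node == ancestor:
            return True
    return False


def replay(ops):
    """Apply moves in timestamp order, skipping any move that would create a cycle."""
    parents = {}  # child id -> (parent id, name)
    for op in sorted(ops):
        if op.child == op.new_parent or is_ancestor(parents, op.child, op.new_parent):
            continue  # a node cannot be moved under itself or its own descendant
        parents[op.child] = (op.new_parent, op.name)
    return parents


# Two folders under the root, then two concurrent and conflicting moves.
setup = [Move((1, "r1"), "a", "root", "a"), Move((2, "r1"), "b", "root", "b")]
a_into_b = Move((3, "r1"), "a", "b", "a")
b_into_a = Move((3, "r2"), "b", "a", "b")

# The two replicas receive the concurrent moves in different orders.
replica_1 = replay(setup + [a_into_b, b_into_a])
replica_2 = replay(setup + [b_into_a, a_into_b])
```

Both replicas converge on the same tree here: the move with the lower timestamp wins, and the conflicting one is dropped because it would create a cycle.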

Nigel · August 30, 2020, 6:07pm

I don’t know squat about this but like to try to understand. Was reading this by Martin Kleppmann

and was curious what CRDT implementation Maidsafe went with, RGA, LSEQ, etc and whether they have experienced any interleaving or anomalies? Is size an issue with CRDTs as well? Or can they be compressed by say Self-Encryption to make sizes smaller? Just raised some (I think) interesting questions for me.

danda · September 1, 2020, 5:53pm

yes, that is a very good talk.

So far:

LSeq has been implemented and integrated into SAFE (AT2) as Sequence, a replacement for AppendableData.
Tree has been implemented, but not yet integrated with SAFE stack at all.

In general, I think size will become an issue and will require compression techniques, but that falls in the realm of optimization, and for now the focus is just on making things work together.

I’m not aware of any anomalies. For Tree I’ve been implementing many correctness tests, which are all passing. A present area of experimentation is how best to deal with filename (metadata) conflicts between replicas, something that is mentioned in passing in Kleppmann’s paper, but as always the devil is in the details.

@bochaco may have more to say regarding LSeq.

danda · September 1, 2020, 6:59pm

In current builds of SAFE (without Tree), directory data is stored in a FilesContainer, which is a sequence of JSON strings, each representing a complete snapshot of the directory structure (dirname, filenames, symlinks, metadata, etc). This means that for each GET or PUT (read or write), the entire directory structure must be serialized, transmitted, and stored. This is workable for very small tree structures, but not for larger ones. Imagine renaming a single file in a large filesystem: the entire tree must be rewritten over the network, and the full serialization stored in a new version.

By contrast, the Tree structure is much more granular. Each node represents an inode in a filesystem, i.e. a single file, dir, or symlink entry. Changing any aspect of any node is a single move operation, so very little data needs to be sent/merged to update state. With this building block, we aim to create 1) an efficient files API for SAFE apps, and 2) a FUSE filesystem that can be mounted locally and used (read/write or read-only) even by non-SAFE apps.
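A rough way to see the difference in data sent for a single rename is sketched below. The snapshot layout (path mapped to metadata) and the move-op fields are made up for illustration and are not the actual FilesContainer or Tree wire formats; only the comparison of sizes matters.

```python
import json


def snapshot_bytes_after_rename(n_files: int) -> int:
    """Bytes to re-send a whole directory snapshot after renaming one file."""
    # Hypothetical snapshot layout: path -> metadata, serialized as one JSON string.
    snapshot = {
        f"/docs/file{i}.txt": {"type": "file", "size": 1024} for i in range(n_files)
    }
    snapshot["/docs/renamed.txt"] = snapshot.pop("/docs/file0.txt")
    return len(json.dumps(snapshot).encode("utf-8"))


def move_op_bytes() -> int:
    """Bytes to send one hypothetical move operation describing the same rename."""
    move_op = {
        "timestamp": [1, "replica-a"],
        "child": "inode-0",
        "parent": "inode-docs",
        "name": "renamed.txt",
    }
    return len(json.dumps(move_op).encode("utf-8"))


if __name__ == "__main__":
    print("snapshot bytes:", snapshot_bytes_after_rename(10_000))
    print("move op bytes:", move_op_bytes())
```

The snapshot cost grows with the number of entries in the tree, while the move operation stays the same size no matter how large the tree is.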

In terms of users feeling the difference. Well, one could test this out. Try safe files put /. (I don’t actually recommend this!) And (if) that succeeds, try safe files ls on the files container. It will take a long time, as all the directory structure must be fetched, deserialized, then filtered. Similarly another write such as safe files rm <file> would take a long time. With the tree (caching) filesystem, the ls and the rm would be local operations and would return almost instantly. The rm would cause an async move operation to be sent to other replicas, which would be able to apply it quickly/easily.

With this model, there is still the drawback that the entire tree must be fetched the first time. In other words, one cannot fetch/mount a single sub-directory without fetching its ancestor directories. Actually, I’m not sure if anything prevents this in theory, but it would at least require more research. Until then, I imagine that users would opt for a happy medium in terms of the max size of directory trees that they PUT.

david-beinn · September 2, 2020, 9:17am

This is really exciting!

Traktion · September 2, 2020, 9:45am

Definitely - I’m looking forward to SAFENetwork being an extension to my file structure. Mounting it on my Linux laptop and using it seamlessly would be awesome. If certain directories could be tagged to a URL, publishing content would become so natural too!

JPL · September 2, 2020, 9:55am

Great explanation of the pros (many) and cons (few) - thanks @danda

david-beinn · September 2, 2020, 10:49am

Will there be possibilities for adapting the way the tree structure is implemented for different use cases?

Was just trying to think through possibilities for pared down versions where maybe it’s important that the first download is relatively small, but the metadata is not so much needed.

Was also wondering if maybe pointers to (mutable data) files could be stored as, say, random 2 byte numbers that are hashed along with the file name and the owner’s public name to get the address of the file.

Perhaps even with a couple layers of tree CRDTs arranged as nodes within the tree, there might be the possibility of pointers to millions of (mutable data) files for the price of only a couple of small downloads and a couple of hash operations (in addition to the performance benefits when cached of course.)

Trying to read up and get my head round all this, so please ignore if none of that makes any sense!
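The pointer idea above could look something like this sketch. Everything in it is an assumption for illustration: the `derive_address` helper, the use of SHA3-256 and the length-prefixed layout are not taken from any SAFE API.

```python
import hashlib
import secrets


def derive_address(pointer: bytes, file_name: str, public_name: str) -> bytes:
    """Hash a short pointer with the file name and owner's public name into a 32-byte address."""
    h = hashlib.sha3_256()
    for part in (pointer, file_name.encode("utf-8"), public_name.encode("utf-8")):
        # Length-prefix each part so ("ab", "c") and ("a", "bc") cannot collide.
        h.update(len(part).to_bytes(4, "big"))
        h.update(part)
    return h.digest()


# Only the random 2-byte pointer would need to be stored in the tree node.
pointer = secrets.token_bytes(2)
address = derive_address(pointer, "notes.txt", "david")
```

Anyone who knows the pointer, the file name and the public name can recompute the same address, so the tree itself only has to carry the two pointer bytes per file.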

david-beinn · September 2, 2020, 11:12am

Hi @jpl,

Just a point from the Primer related to my post above that I found a bit confusing. I’m not sure it’s that the Primer says anything confusing though, or just that I’ve managed to confuse myself!

The Primer speaks about the address on the network being derived from the content of the chunk.

As far as I can make out the xor address of immutable data is derived from the hash of the content, but this is not so for mutable data types:

“A (CAS) seems to assume the address is derived from the content itself, and in the case of the SAFE Network data this is true for ImmutableData XOR addresses, but not for MutableData XOR addresses,”

This quote is from here:

Would this mean there is a level in-between the address and the raw data???

The only reason this distinction occurred to me is that if the addresses of mutable data can be derived separately from the content, this leaves a bit of room for potentially deriving them from hashes of other things, which could possibly be a useful tool (or not!)

happybeing · September 2, 2020, 11:18am

I think you should assume this will be possible because it was with the old mutable data APIs and is a useful characteristic. For immutable data it is as you noted tied to the content, but this can’t be the case for mutable data, although it can be made deterministic based on some function (e.g. DNS names are hashed to determine where to look for their DNS container). For user defined data types, I believe it will be possible for you to derive the hash yourself based on anything you like, as this was the case in the old APIs.

david-beinn · September 2, 2020, 11:25am

Thanks Mark!

Makes sense. For the benefit of future readers, I think there’s a typo there - I’m assuming the second time round you mean mutable data -

JPL · September 2, 2020, 11:34am

Yes, I have left this as is for now as it is evolving and I didn’t quite understand it either! I’ll update it when I have a chance to get my head around it. Any pointers from the team or others welcome.

dirvine · September 2, 2020, 11:52am

Yea all data that mutates has a fixed address not based on all of the content. There are possibilities though that in future we could base some mutable data on the hash of the owner + type or similar. Right now though only immutable (blob) has a name == hash of content.
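As a toy illustration of that distinction (not the real data types or API; `blob_address`, `MutableRegister` and the choice of SHA3-256 are assumptions): an immutable blob's name is the hash of its content, so editing the content yields a different address, whereas mutable data keeps one fixed name while its value changes.

```python
import hashlib


def blob_address(content: bytes) -> bytes:
    """Immutable data: the name is the hash of the content."""
    return hashlib.sha3_256(content).digest()


class MutableRegister:
    """Toy mutable store: values live at a caller-chosen, fixed name."""

    def __init__(self):
        self._values = {}

    def write(self, name: bytes, value: bytes) -> None:
        self._values[name] = value  # same name, new value

    def read(self, name: bytes) -> bytes:
        return self._values[name]


# Changing the content of a blob changes where it lives.
v1_address = blob_address(b"hello")
v2_address = blob_address(b"hello, world")

# Mutable data stays at the same (here, hand-picked) name across updates.
register = MutableRegister()
chosen_name = hashlib.sha3_256(b"my-chosen-name").digest()
register.write(chosen_name, b"hello")
register.write(chosen_name, b"hello, world")
```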

JPL · September 2, 2020, 11:57am

Thanks @dirvine - how is the address of the mutable data decided?

dirvine · September 2, 2020, 12:00pm

You can choose your own name, but this might change (extend) as we get to launch. The data types are all being made CRDTs and may still change slightly, so this is not 100% fixed yet. However, choosing a name is an easy approach for now.
