So on the SAFE Network, if I understand correctly, when I use an app like Twitter/Facebook and want to post or comment on a post, will I have to pay for it? Every comment/post is bytes of data after all, am I correct?
Storing data on the network does have a cost, but comment-sized data would be tiny. In some cases the user might pay, in others the app might pay for its users; either way it would be very little in USD terms.
Hey @JPL small tweak to the primer, the AT2 section, I believe it's actually "Asynchronous Trustworthy Transfers" as opposed to "Trusted". It shows as trustworthy in the papers and searching as such gives better results in search engines. Loving the primer, moving target but I'm really excited to hear it might be partially maintained by Maidsafe to supplement your hard work.
Good spot @Nigel - I seem to have been hedging my bets with a mixture of trusted and trustworthy elsewhere. Corrected now, and change should be live shortly. Let me know if you spot anything else - always good to have another pair of eyes on.
Does the CRDT tree boost loading performance? If so, by how much (is it at a level that users can feel)?
I think it's more about being able to perform more complex data operations such as moving a set of sub folders to a new parent folder and guaranteeing the change will be the same for all nodes - something even Google Drive can't do properly yet according to the linked white paper. But since full network consensus will not be required then it should be quicker than the alternatives.
I don't know squat about this but like to try to understand. Was reading this by Martin Kleppmann
and was curious what CRDT implementation Maidsafe went with, RGA, LSEQ, etc and whether they have experienced any interleaving or anomalies? Is size an issue with CRDTs as well? Or can they be compressed by say Self-Encryption to make sizes smaller? Just raised some (I think) interesting questions for me.
yes, that is a very good talk.
So far:
- LSeq has been implemented and integrated into SAFE (AT2) as Sequence, a replacement for AppendableData.
- Tree has been implemented, but not yet integrated with SAFE stack at all.
In general, I think size will become an issue and will require compression techniques, but that falls in the realm of optimization, and for now the focus is just on making things work together.
I'm not aware of any anomalies. For Tree I've been implementing many correctness tests, that are all passing. A present area of experimentation is how best to deal with filename (metadata) conflicts between replicas, something that is mentioned in passing in Kleppmann's paper, but as always the devil is in the details.
@bochaco may have more to say regarding LSeq.
In current builds of SAFE (without Tree), directory data is stored in a FilesContainer, which is a sequence of JSON strings, each representing a complete snapshot of the directory structure (dirname, filenames, symlinks, metadata, etc). This means that for each GET or PUT (read or write), the entire directory structure must be serialized, transmitted, and stored. This is workable for very small tree structures, but not for larger ones. Imagine renaming a single file in a large filesystem: the entire tree must be rewritten over the network, and the full serialization stored in a new version.
By contrast, the Tree structure is much more granular. Each node represents an inode in a filesystem, i.e. a single file, dir, or symlink entry. Changing any aspect of any node is a single move operation. So, very little data need be sent/merged to update state. With this building block, we aim to create 1) an efficient files API for SAFE apps, and 2) a FUSE filesystem that can be mounted locally and used (read/write or read-only) even by non-SAFE apps.
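To make the size difference concrete, here is a rough sketch in Python. It is not the FilesContainer or Tree code: the directory contents, field names, and ids are all made up, and the move operation is only loosely modelled on the (timestamp, parent, metadata, child) shape from the move-operation paper. It just compares the bytes needed to re-serialize a whole snapshot after one rename with the bytes needed for one move op.

```python
import json

# Hypothetical directory with 10,000 files (paths and metadata are invented).
snapshot = {
    f"/docs/file_{i}.txt": {"type": "file", "size": 1024, "mtime": 1_600_000_000 + i}
    for i in range(10_000)
}


def snapshot_update_bytes(tree, old_path, new_path):
    """Snapshot model: rename one entry, then re-serialize the whole tree."""
    updated = dict(tree)
    updated[new_path] = updated.pop(old_path)
    return len(json.dumps(updated).encode("utf-8"))


def move_op_bytes(timestamp, parent_id, name, child_id):
    """Tree model: a rename is a single move operation on a single node."""
    op = {"ts": timestamp, "parent": parent_id, "meta": {"name": name}, "child": child_id}
    return len(json.dumps(op).encode("utf-8"))


full = snapshot_update_bytes(snapshot, "/docs/file_42.txt", "/docs/renamed.txt")
delta = move_op_bytes(timestamp=1, parent_id=7, name="renamed.txt", child_id=42)
print(f"snapshot rewrite: {full} bytes, single move op: {delta} bytes")
```

The exact numbers depend on the made-up metadata, but the snapshot cost grows with the size of the tree while the move op stays roughly constant.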
In terms of users feeling the difference: well, one could test this out. Try `safe files put /`. (I don't actually recommend this!) And (if) that succeeds, try `safe files ls` on the files container. It will take a long time, as all the directory structure must be fetched, deserialized, then filtered. Similarly, another write such as `safe files rm <file>` would take a long time. With the tree (caching) filesystem, the `ls` and the `rm` would be local operations and would return almost instantly. The `rm` would cause an async move operation to be sent to other replicas, which would be able to apply it quickly/easily.
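A toy illustration of that local-first behaviour, in Python. Everything here is hypothetical (the class, the `TRASH` node id, the outbox list); it is not how the real Tree crate is structured, and it leaves out conflict and cycle handling entirely. It assumes a delete is modelled as a move under a special trash node, and shows `ls` and `rm` touching only local state while the operations queue up to be sent to other replicas.

```python
from dataclasses import dataclass, field

TRASH = "trash"  # hypothetical id of a special "trash" parent node


@dataclass
class LocalTreeReplica:
    # child_id -> (parent_id, name); a toy stand-in for the real tree state
    nodes: dict = field(default_factory=dict)
    # operations waiting to be sent asynchronously to other replicas
    outbox: list = field(default_factory=list)
    clock: int = 0

    def move(self, child_id, new_parent, name):
        """Apply a move locally, then queue it for asynchronous delivery."""
        self.clock += 1
        self.nodes[child_id] = (new_parent, name)
        self.outbox.append((self.clock, new_parent, name, child_id))

    def ls(self, parent_id):
        """List a directory from the local cache; no network round trip."""
        return sorted(name for parent, name in self.nodes.values() if parent == parent_id)

    def rm(self, child_id):
        """Model a delete as a move under the trash node."""
        _, name = self.nodes[child_id]
        self.move(child_id, TRASH, name)


replica = LocalTreeReplica()
replica.move("n1", "root", "a.txt")
replica.move("n2", "root", "b.txt")
replica.rm("n1")
print(replica.ls("root"))    # ['b.txt']
print(len(replica.outbox))   # 3 queued ops: two creates and one delete
```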
With this model, there is still the drawback that the entire tree must be fetched the first time. In other words, one cannot fetch/mount a single sub-directory without fetching its ancestor directories. Actually, I'm not sure if anything prevents this in theory, but would at least require more research. Until then, I imagine that users would opt for kind of a happy-medium in terms of max size of directory trees that they PUT.
This is really exciting!
Definitely - I'm looking forward to SAFENetwork being an extension to my file structure. Mounting it on my Linux laptop and using it seamlessly would be awesome. If certain directories could be tagged to a URL, publishing content would become so natural too!
Great explanation of the pros (many) and cons (few) - thanks @danda
Will there be possibilities for adapting the way the tree structure is implemented for different use cases?
Was just trying to think through possibilities for pared down versions where maybe it's important that the first download is relatively small, but the metadata is not so much needed.
Was also wondering if maybe pointers to (mutable data) files could be stored as, say, random 2 byte numbers that are hashed along with the file name and the owner's public name to get the address of the file.
Perhaps even with a couple layers of tree CRDTs arranged as nodes within the tree, there might be the possibility of pointers to millions of (mutable data) files for the price of only a couple of small downloads and a couple of hash operations (in addition to the performance benefits when cached of course.)
Trying to read up and get my head round all this, so please ignore if none of that makes any sense!
Hi @jpl,
Just a point from the Primer related to my post above that I found a bit confusing. I'm not sure it's that the Primer says anything confusing though, or just that I've managed to confuse myself!
The Primer speaks about the address on the network being derived from the content of the chunk.
As far as I can make out the XOR address of immutable data is derived from the hash of the content, but this is not so for mutable data types:
"A (CAS) seems to assume the address is derived from the content itself, and in the case of the SAFE Network data this is true for ImmutableData XOR addresses, but not for MutableData XOR addresses,"
This quote is from here:
Would this mean there is a level in-between the address and the raw data???
The only reason this distinction occurred to me is that if the addresses of mutable data can be derived separately from the content, this leaves a bit of room for potentially deriving them from hashes of other things, which could possibly be a useful tool (or not!)
I think you should assume this will be possible because it was with the old mutable data APIs and is a useful characteristic. For immutable data it is as you noted tied to the content, but this can't be the case for mutable data, although it can be made deterministic based on some function (e.g. DNS names are hashed to determine where to look for their DNS container). For user defined data types, I believe it will be possible for you to derive the hash yourself based on anything you like, as this was the case in the old APIs.
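To illustrate the two kinds of address, here's a small Python sketch. It doesn't use any SAFE API: the function names, the use of SHA3-256, the `type_tag` value, and the way name and tag are combined are all assumptions for the sake of the example. The point is only that a content-derived address changes whenever the content does, while a name-derived address is deterministic and stays put as the data mutates.

```python
import hashlib


def immutable_address(content: bytes) -> str:
    """Content-addressed: the address is a hash of the data itself."""
    return hashlib.sha3_256(content).hexdigest()


def named_address(name: str, type_tag: int) -> str:
    """Name-derived: the address is a hash of a chosen name (plus a type tag
    in this sketch), so it does not depend on the current content."""
    return hashlib.sha3_256(f"{type_tag}:{name}".encode("utf-8")).hexdigest()


v1, v2 = b"hello", b"hello, world"

# Editing immutable content produces a different address.
print(immutable_address(v1) == immutable_address(v2))  # False

# Anyone who knows the name and tag can recompute where to look,
# whatever the data currently contains (name and tag are hypothetical).
addr = named_address("my-blog", type_tag=15001)
print(addr == named_address("my-blog", type_tag=15001))  # True
```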
Thanks Mark!
Makes sense. For the benefit of future readers, I think there's a typo there - I'm assuming the second time round you mean mutable data -
Yes, I have left this as is for now as it is evolving and I didn't quite understand it either! I'll update it when I have a chance to get my head around it. Any pointers from the team or others welcome.
Yea all data that mutates has a fixed address not based on all of the content. There are possibilities though that in future we could base some mutable data on the hash of the owner + type or similar. Right now though only immutable (blob) has a name == hash of content.
Thanks @dirvine - how is the address of the mutable data decided?
You can choose your own name, but this might change (extend) as we get to launch. The data types are all being made CRDTs and may change slightly, so this is not 100% fixed yet. However, choosing a name is an easy approach for now.