Safe filesystem API for a FUSE implementation in Rust

dirvine · February 23, 2024, 9:50am

We don’t reference words though, but whole files. The Hello File is one Chunk and the World file is another. They cannot change.

I think this is where the confusion over links happens. Here a link is a whole chunk and you copy the whole thing. So the web site dir holds directly all links (chunks) it uses to address whatever data it needs.

If you imagine in SAFE to link to something, you copy that metadata/chunk (it’s zero cost to the network due to dedupe) and work away. It’s under your control in your Dir (register) as the web site owner.

happybeing · February 23, 2024, 9:57am

Yes, that’s what I’m referring to by change to the directory.

Back to the history…

Say you put your website under git which many do because you can get back any file, or look at previous versions of the site in its entirety, eg for rollback.

When you reach v23 and decide to git checkout v14 you will get the entire tree in the state it was at check in.

Your design can’t do that for the entire tree.

You can get every version of the images directory. You can get every version of the posts directory. But you don’t know which goes with which.

Unless you have a way to say this version of images goes with this version of posts, you can’t get a particular version of the website in its entirety.

dirvine · February 23, 2024, 10:14am

This is where we are crossing each other. It definitely can.

Ok, mixing git into the picture like this could perhaps cause issues as it works on the old paradigm in our minds, but in actual fact it’s a design thing. Git has everything as a blob and versions those blobs itself. So it will take any file you have at any point, blob it and keep a reference to the bytes of that file. In essence, it sucks the bytes out and puts them in a git-like filesystem.

Here the blobs/files you used for the website would still get sucked into git like today’s files do. Checking out would put them in normal traditional folders and not Registers as we have. If we look at it from different levels of course.

If we look at it from the FUSE type level where git writes/reads and underneath that registers provide the dir. Then git would work, BUT it’s using a totally different mechanism to version.

So we have one version system fighting another here, but if the SAFE fs was FUSE based then it would work.

This part is maybe confusing here and where there is a hole.

So you have this root_dir - points to images and posts dir and you create something from it. Let’s say a blog page.

In your blog page you select Entry ABC from the images dir and post DEF from the posts dir and embed them in this page.

This page now become a new post in the posts dir (PQR). So this is versioned in SAFE and works. You can check out any version of the post you want and it will point to the correct image and post that made it up.

I might be explaining this really badly, but I cannot see how this does not work, and it means never a broken link, never a broken FS link etc. They all work and version. The difference is that here we have a dir per thing, so the blog post is one website that has its own dir (list of pointers).

I might be just a bit woozy as I have a stinker of a cold, but I cannot see the reason for backwards traversal, unless we try and use the same FS history for different purposes.

digipl · February 23, 2024, 10:36am

futuretrack · February 23, 2024, 11:24am

Let me preface this by saying it’s entirely possible I have the wrong end of both sticks in this discussion.

The gist I’m picking up is that the issues are related to the expectations of a filesystem under an operating system vs. the raw implementation (ok, that’s stating the obvious…).

There are a couple of scenarios:

You create and use the filesystem, and always have your own view of this filesystem. Anyone else using this filesystem (from a different location) will have their view, and never the twain shall meet.
Multiple people have the same view of the filesystem, and it becomes eventually self consistent under CRDT.

One possible implementation under discussion is as follows (and I might have added a couple of bits from my own perspective):

Hash “DNS” + Key → Root FS Register
Root FS Register → Root Dir Chunk (XOR for the datamap for a file which contains the formatted Directory Entries)
Root Dir Chunk has metadata entries like:
Type (d/f, etc), name, perms (0755), target (register address for a dir, or datamap XOR for a file)

This structure can then be nested (and I think this is where the natural conflict between an OS tree and a versioned f/s exists), so we get:

DNS Hash → Root Register → Root directory chunk

Root directory chunk (file) has:
File 1 Metadata → File 1 datamap
File 2 Metadata → File 2 datamap
Dir 1 Metadata → Dir 1 register → Dir 1 directory chunk (datamap of Dir 1 file)
Dir 2 Metadata → Dir 2 register → Dir 2 directory chunk

Dir 1 chunk has:
dir1 file A metadata
dir1 file b metadata
dir 1 dir A register

Ok, so it’s a tree (and apologies for the formatting)

Now, somebody modifies file B. Doesn’t change the name. It gets a new datamap, and this gets updated in the chunk for dir 1, and dir 1’s register gets updated to point to the new dir 1 chunk. The root FS register and chunk don’t need to be updated.
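The file B scenario above can be sketched in a few lines of Rust. This is only an illustration under stated assumptions: `Network`, `DirEntry`, `Target` and the integer addresses are hypothetical stand-ins (real addresses would be XOR names, and this is not the Safe Network Register API). It shows that replacing a file's datamap appends one entry to that directory's register and leaves the root register untouched.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins: real addresses would be XOR names, not integers.
type ChunkAddr = u64;
type RegisterAddr = u64;

#[derive(Clone, Debug, PartialEq)]
enum Target {
    File(ChunkAddr),   // datamap address of the file content
    Dir(RegisterAddr), // register holding the directory's history
}

#[derive(Clone, Debug, PartialEq)]
struct DirEntry {
    name: String,
    perms: u32,
    target: Target,
}

#[derive(Default)]
struct Network {
    chunks: HashMap<ChunkAddr, Vec<DirEntry>>,        // immutable directory chunks
    registers: HashMap<RegisterAddr, Vec<ChunkAddr>>, // append-only pointer history
    next_addr: u64,
}

impl Network {
    fn put_dir_chunk(&mut self, entries: Vec<DirEntry>) -> ChunkAddr {
        self.next_addr += 1;
        self.chunks.insert(self.next_addr, entries);
        self.next_addr
    }

    fn new_register(&mut self, first: ChunkAddr) -> RegisterAddr {
        self.next_addr += 1;
        self.registers.insert(self.next_addr, vec![first]);
        self.next_addr
    }

    /// Replace one file's datamap: write a new directory chunk and append it
    /// to that directory's register. No parent register is touched.
    fn update_file(&mut self, dir: RegisterAddr, name: &str, new_datamap: ChunkAddr) {
        let current = *self.registers[&dir].last().expect("register has history");
        let mut entries = self.chunks[&current].clone();
        for e in entries.iter_mut().filter(|e| e.name == name) {
            e.target = Target::File(new_datamap);
        }
        let new_chunk = self.put_dir_chunk(entries);
        self.registers.get_mut(&dir).expect("register exists").push(new_chunk);
    }

    fn version_count(&self, reg: RegisterAddr) -> usize {
        self.registers[&reg].len()
    }
}

/// Build root -> dir1 -> file_b, modify file_b once, and report how many
/// versions the root register and the dir1 register hold afterwards.
fn demo() -> (usize, usize) {
    let mut net = Network::default();
    let dir1_chunk = net.put_dir_chunk(vec![DirEntry {
        name: "file_b".into(),
        perms: 0o644,
        target: Target::File(100),
    }]);
    let dir1 = net.new_register(dir1_chunk);
    let root_chunk = net.put_dir_chunk(vec![DirEntry {
        name: "dir1".into(),
        perms: 0o755,
        target: Target::Dir(dir1),
    }]);
    let root = net.new_register(root_chunk);
    net.update_file(dir1, "file_b", 101);
    (net.version_count(root), net.version_count(dir1))
}
```

After the edit, the root register still has a single version while dir 1 has two, which is exactly why the root's history alone cannot say which version of dir 1 was current at a given time.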

You don’t want the new file B, you want the old one. Ok, you could iterate through the version history in the register to find out if this is your file or not, but let’s say your file is nested 3 directories deep, and somebody has changed a directory name (not a parent of your file) 2 directories back, and changed your file.

I think it would be difficult to reconcile the file tree as it was in order to get your filesystem structure. It wouldn’t happen automatically via CRDT, and so I’d guess it would need a constant review (re-download) of the FS structure + some kind of reconciliation algorithm.

The point here is that Dir 1 has a bunch of versions in the register, Dir A has another set of versions, A directory below Dir A has another set of versions, and it’s difficult to know which version goes with which without specific versioning / snapshotting. (ie. the parent of a parent won’t see the changes, as their register / directory doesn’t need to be updated)
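One way to picture the "specific versioning / snapshotting" mentioned above is a manifest that pins every directory register to an index in its own history. The sketch below is an assumption made for illustration only: `History`, `Snapshot`, `take_snapshot` and `resolve` are hypothetical names, and each register is reduced to a plain list of chunk addresses.

```rust
use std::collections::BTreeMap;

/// Hypothetical: each directory register reduced to a list of chunk addresses.
type History = Vec<u64>;

/// A snapshot pins every directory register to one index in its history.
type Snapshot = BTreeMap<String, usize>;

fn take_snapshot(registers: &BTreeMap<String, History>) -> Snapshot {
    registers
        .iter()
        .filter(|(_, h)| !h.is_empty())
        .map(|(path, h)| (path.clone(), h.len() - 1))
        .collect()
}

/// Resolve a directory as it was when the snapshot was taken.
fn resolve(registers: &BTreeMap<String, History>, snap: &Snapshot, path: &str) -> Option<u64> {
    let idx = *snap.get(path)?;
    registers.get(path)?.get(idx).copied()
}

/// Snapshot two directories, keep editing both, then read the pinned state.
fn demo() -> (Option<u64>, Option<u64>) {
    let mut registers: BTreeMap<String, History> = BTreeMap::new();
    registers.insert("/images".to_string(), vec![10]);
    registers.insert("/posts".to_string(), vec![20]);
    let snap = take_snapshot(&registers);
    registers.get_mut("/images").unwrap().push(11);
    registers.get_mut("/posts").unwrap().push(21);
    registers.get_mut("/posts").unwrap().push(22);
    (
        resolve(&registers, &snap, "/images"),
        resolve(&registers, &snap, "/posts"),
    )
}
```

Without such a manifest, the two histories (`[10, 11]` and `[20, 21, 22]`) carry no information about which images version belongs with which posts version. The manifest itself would need to be stored and versioned somewhere, which is the open design question in this thread.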

So this comes down to expectations about what the filesystem is going to do - how it’s supposed to operate. Do I want a specific view of the filesystem as it was at a particular time, or am I happy for it to change around me, and potentially do some manual reconciliation to get the file I actually want?

That’s longer winded than I wanted it to be (I think that’s what @happybeing was driving at), but hoping that outlines the issues?

futuretrack · February 23, 2024, 11:37am

To my earlier point re: shared vs. individual (to a user) filesystems, I suspect the kind of problem described above could be eased by ensuring that the register isn’t writable by anyone other than the owner. If you want to change a filesystem you didn’t create, you would need to create your own corresponding registers at that point, but this means (effectively) 2 separate filesystems with their own history, and so not shared by default - which might not be a bad thing?

dirvine · February 23, 2024, 11:42am

Aha, just read this and perhaps we are closer than I thought. This is exactly what I am talking about, BUT you don’t ask the posts register or the images register. They are in some other dir structure for another collection of data.

So we have our own structure for our web site. A new root_dir and collection if we want. But that’s not important.

Our register for this post can have a load of different files / images pointed to by each entry. As the page is edited a new entry happens with changes to the previous set of files and register links.

So the key is we don’t reference other registers for our collection. In our collection all our registers are under control of our root_dir. We can have a load of sub dirs and so on, but all referenced from a root_dir. In the case of ANY collection of stuff, then it’s a new register.

So each web page/blog post or software dev thing can have its own register set under its control.

AHHHH I think I see now

If we have a large FS like a software dev

So each directory in total in sync across the tree. I get it now. I still think there is more to it though, but let me poke around some more with left field stuff. I do think the zero cost (almost zero) links do hold some magic we may be missing here.

Sorry for the noise and hassle @happybeing

happybeing · February 23, 2024, 11:50am

I used git to illustrate what your design cannot do: version multiple directories together.

Same reason I tried explaining using a website as an example. Yes, as you explain the web page will load the correct images, no dispute about that. But it works because you are relying on the synchronisation of the history outside your design (in the page itself), so your design still cannot get the state of the whole tree at a point in its history.

That’s important for editing a website and will be a problem for any use case which wants the state of multiple directories at a given point. An app can be built to do that, but I’m talking about the API design and what it achieves.

You can’t claim that you can version the whole tree because a web page will load.

What you are claiming I think is that you can have versioning if the data you are storing handles that for you but not all data that needs to be correlated contains such links.

So my contention is that it is important for the API design to be able to version data in the tree regardless of whether the data itself contains links, and that your design doesn’t do that.

At least I now think we understand each other?

futuretrack · February 23, 2024, 12:08pm

So I think this:

Is the same as this:

hoping I’ve understood this - though I’ve been skimming a bit…

dirvine · February 23, 2024, 12:41pm

I think what I am saying is we don’t always need an entire tree. Web pages as an example.

However when we do want an entire tree in sync, then I do see the problem you are showing.

Directories are a weird thing, in HDD they made sense, in Git they were a pain and actually closer to what I am proposing here. The whole tree is in the git file. It creates its own with dir markers.

So it’s a close design to synchronised single folders, but held together with markers that git uses. The files are not really in folders but are blobs of bytes in a git tree if you like. Then in git FUSE filesystems they abstract this out again to a filesystem with normal directories.

So git kinda treats the whole tree as a big huge blob of bytes and uses dir markers and metadata for dirs and filenames etc.

I feel SAFE with registers is close but not close enough. What you are doing is great and will do a git-like thing.

What I am wondering about is infinite size disk in FUSE which does not require we write all the way back to root.

I hope we keep poking hard at this one to come up with something that works in both cases.

futuretrack · February 23, 2024, 1:11pm

There are a couple of interesting questions in this (and I have an old-school infrastructure hat on here…)

In terms of infinite storage, in terms of @happybeing’s approach, this could be accommodated by placing a reference to another register as the very first entry in the metadata file. When the register runs out of space, you would traverse to the next register at the top of the version of the metadata file you’re interested in, and before uploading your new metadata file, create a new register, and put it as the first entry in that.

It makes traversal a bit more complicated (i.e. you need to check your tree of metadata files to find the last non-empty register), but could scale infinitely.
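The chained-register idea can be sketched as follows. Everything here is hypothetical: `CAPACITY`, `MetadataFile`, `Store` and the use of vector indices as register addresses are assumptions for illustration, and the real register size limit is not known. Because metadata files are immutable, the sketch creates the continuation register just before writing the version that fills the last slot, so the link is already embedded in that version.

```rust
/// Hypothetical per-register limit; the real limit is not known.
const CAPACITY: usize = 3;

#[derive(Clone, Debug)]
struct MetadataFile {
    /// First entry of the metadata file: link to the continuation register.
    next_register: Option<usize>,
    entries: Vec<String>,
}

#[derive(Default)]
struct Store {
    /// Register address = index; each register is a history of metadata files.
    registers: Vec<Vec<MetadataFile>>,
}

impl Store {
    fn new_register(&mut self) -> usize {
        self.registers.push(Vec::new());
        self.registers.len() - 1
    }

    /// Follow continuation links from `root` to the last non-empty register.
    fn tail(&self, root: usize) -> usize {
        let mut current = root;
        loop {
            match self.registers[current].last().and_then(|m| m.next_register) {
                Some(next) if !self.registers[next].is_empty() => current = next,
                _ => return current,
            }
        }
    }

    fn append(&mut self, root: usize, entries: Vec<String>) {
        let mut target = self.tail(root);
        if self.registers[target].len() == CAPACITY {
            // Full: its last metadata file already names the continuation.
            target = self.registers[target]
                .last()
                .and_then(|m| m.next_register)
                .expect("a full register always links to its continuation");
        }
        // About to fill the last slot: create the continuation first and embed it.
        let next_register = if self.registers[target].len() + 1 == CAPACITY {
            Some(self.new_register())
        } else {
            None
        };
        self.registers[target].push(MetadataFile { next_register, entries });
    }

    fn latest(&self, root: usize) -> Option<&MetadataFile> {
        self.registers[self.tail(root)].last()
    }
}

/// Write seven versions and report the register count and the latest entries.
fn demo() -> (usize, Vec<String>) {
    let mut store = Store::default();
    let root = store.new_register();
    for i in 1..=7 {
        store.append(root, vec![format!("v{i}")]);
    }
    let latest = store.latest(root).map(|m| m.entries.clone()).unwrap_or_default();
    (store.registers.len(), latest)
}
```

The cost is visible in `tail`: finding the newest version takes one hop per filled register, which is the extra traversal noted above.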

re: writing back to root, it could end up as a costly exercise, depending on how the filesystem is used. One possibility could be to hang files off registers too, then only update the root file if the file metadata substantially changes (e.g name change, or file move). However I’m less convinced by that one as I think size metadata might be something you want to save in the root file for performance reasons - but maybe worth considering.

neo · February 23, 2024, 11:39pm

I wonder if we need to go back to basics here.

What is it we are trying to achieve for the users of the network.

My thoughts are

a system to work though the public files for the benefit of all safe users
a system that allows a user to mount their (semi)private personal filesystem.

Option 1 seems to be the hardest to define because how is it all arranged, everyone has their desired way to access the files. Like videos/movies, documents, etc. Maybe a search system using smarts to assist in finding the desired files. Perhaps more like web pages

Option 2 I think this could be more along the lines of “attached storage” or even as a personal disk drive.

Option 1 would seem to be addressed along the lines of assisted searching. The searching would get to the “storage system” for that group of files, i.e. no need for infinite back to a network root.

Option 2, being more along the lines of personal or NAS style of storage, also doesn’t need to have a network style of infinite storage.

The point is how will the individual users see such a storage system. I doubt we want a god like storage system that has the one network wide root.

happybeing · February 24, 2024, 9:11am

I agree that’s a useful discussion but also off topic. Could you start a topic for that so we can keep this to the filesystem?

[I think Rob’s gone to bed, @moderators can you clean up?]

Knosis · February 25, 2024, 4:28am

Curious about integrating new tech with Safe Network? Check out Stargate V2. It’s a tool for API management that could offer options that enhance the network’s capabilities, especially around filesystem APIs and FUSE implementations. Could this be an opportunity to make SN data storage and retrieval methods open to a broader audience? Thoughts? Stargate V2 Overview

Stargate V2 is an open-source API layer designed to work with databases like Apache Cassandra, breaking away from a monolithic architecture to offer a more flexible, microservices-based approach. It aims to improve scalability, developer experience, and cloud compatibility by allowing each API (REST, GraphQL, Docs) to be independently scaled and managed. It facilitates easier integration of new APIs and promotes efficient, modern application development with less overhead.

happybeing · February 25, 2024, 5:13pm

Topic Review

The purpose of this topic has been to discuss potential filesystem implementations that can support a locally mounted FUSE drive, and to try to develop an underlying standard API for filesystem storage on Safe Network that encourages use of a universally understood API for applications storing data.

This is tricky because of:

the ambitions of Safe Network (particularly versioning and concurrent editing)
the need to define what those ambitions mean in this context, including functionality, APIs and user experience
the challenge of implementing filesystem conventions (e.g. POSIX style features) using the available Safe Network datatypes

This is compounded because much remains undefined. So this topic has been an attempt to clarify these areas and try to arrive at a useful way to proceed, and ideally the outline specification for a FUSE based filesystem, associated APIs for Safe Network for this and other applications.

Topic Status

We haven’t yet decided on a suitable set of capabilities. I attempted to define an ideal, but in discussion, some areas (e.g. versioning) mean different things to different people which means we don’t have a clear target.

Ideal Feature-set

Here’s my earlier summary which leaves concurrent editing to one side:

In the above, versioned refers to the whole filesystem though it is apparent that is both challenging and not necessarily seen as a feature for the Safe Network APIs, but perhaps something for the application level.

Personally I believed that versioning should be across whole datasets to ensure coherence within the history, but it isn’t clear that this will be feasible in a filesystem using MaidSafe’s preferred approach or Register APIs. Instead it may become an add-on in the application layer, much like taking a snapshot or backup is now, rather than something that “just happens” because you are storing data on Safe Network.

Implementation Difficulties

We’ve discussed two attempts to use the expected pointer-only Register API to implement a filesystem. One attempts versioning of the whole tree, and the other versioning of each directory independently, both of which have strengths and limitations.

Versioning of the whole tree makes sense for a wider range of applications, where a history of multiple documents across multiple directories can be correlated together and backups or snapshots of the filesystem are an inherent part of the Safe Network filesystem APIs rather than requiring separate application logic. However, without a suitable TreeCRDT it would be harder to handle conflicts due to concurrent edits than with directory-only versioning.

Using independent versioning for each directory would make it easier to merge conflicts and to support review and modification of the results of merging within a single directory. It is not clear that merging of hard-links would be feasible with this design, which would probably require a specialised form of CRDT (such as a successor to the TreeCRDT as suggested by @danda). It would also mean that features such as backup or check-pointing of a directory tree (or the whole drive) would be an application level function requiring storage of the state of the entire directory tree (but excluding the files themselves). So versioning of file trees would become a user/application level issue as with contemporary filesystems, rather than an inherent feature of a Safe Network filesystem API (which was my aim).
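To make the trade-off concrete, here is a minimal sketch of whole-tree versioning in the git style, where a directory's address is derived from the addresses of its children. It is not either of the designs discussed in this topic: `Node`, `address`, `dir`, `site` and `child_address` are hypothetical, and `DefaultHasher` is a non-cryptographic stand-in for content addressing.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

enum Node {
    File(Vec<u8>),
    Dir(BTreeMap<String, Node>),
}

/// Content address of a node. A directory's address depends on the names
/// and addresses of all its children, so any change below it changes it.
fn address(node: &Node) -> u64 {
    let mut h = DefaultHasher::new();
    match node {
        Node::File(bytes) => {
            0u8.hash(&mut h);
            bytes.hash(&mut h);
        }
        Node::Dir(children) => {
            1u8.hash(&mut h);
            for (name, child) in children {
                name.hash(&mut h);
                address(child).hash(&mut h);
            }
        }
    }
    h.finish()
}

fn dir(children: Vec<(&str, Node)>) -> Node {
    Node::Dir(children.into_iter().map(|(n, c)| (n.to_string(), c)).collect())
}

/// A two-directory site whose only variable part is the body of one post.
fn site(post_body: &str) -> Node {
    dir(vec![
        ("images", dir(vec![("logo.png", Node::File(vec![1, 2, 3]))])),
        ("posts", dir(vec![("hello.md", Node::File(post_body.as_bytes().to_vec()))])),
    ])
}

fn child_address(root: &Node, name: &str) -> Option<u64> {
    match root {
        Node::Dir(children) => children.get(name).map(address),
        Node::File(_) => None,
    }
}
```

Editing one post changes the `posts` address and therefore the root address, while `images` keeps its address. A single register of root addresses would then be a history of the entire tree, at the price of writing a new chunk for every ancestor on each change, and of concurrent writers contending on that one register.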

Register API

I’m still concerned that there may be unanticipated problems with the pointer-only Register design, and also if use of registers prevents the ability to create and access the full structure of the node tree. The latter would for example prevent implementation of the whole tree versioning design referred to above, and perhaps other creative ways of using the only mutable data type available at least for now.

What now?

There are lots of uncertainties here, both in what is useful to aim for and how that might be realised in a Safe filesystem for a FUSE mountable drive.

I’m sure that a mountable drive can be implemented whatever route is taken, but it seems too early to proceed further on design or implementation with so many things being unclear. So I’m leaving this outline of the areas that I think need to be clarified.

It concerns me that we don’t have a definition of what we are aiming for within the Safe APIs, or what will be feasible based on examples of how to deliver it.

So I think it will be valuable to have concrete reference designs available for key use cases (such as versioned websites and mountable drive), to demonstrate the suitability of proposed Register and filesystem APIs. And from those, a clear understanding of what will and won’t be possible from both a user perspective and from a developer perspective.

With regard to perpetual versioned public data (such as websites, document collections, libraries and archives) I’m concerned in case simplifying the Register APIs in fact pushes too much responsibility in to the application logic (for publishing and viewing). I think if the Safe Network supports those functions implicitly it is much more likely to be successful in delivering perpetual versioned public data across multiple applications and data types, which seems important.

Finally

Even if there were complete clarity it would be a lot of work to tackle the filesystem APIs and implement a FUSE filesystem. Even more so with versioning and merging concurrent edits. But with so much uncertainty I’m going to pause rather than proceed with a design or continue trying to clarify (because it is consuming and arduous).

So I’m going to return to Solid related things for now. I’m more confident that LDP or indeed a graph-store can be implemented on top of whatever filesystem or APIs are available, and that versioning can be handled in a LinkedData API layer (even if it isn’t implicit in structured/trees built with Register APIs). I’m less sure about concurrent edits which may also rest on an improved TreeCRDT but can proceed without that for quite a while.

dirvine · February 25, 2024, 5:50pm

Superb write up and summation of where we are @happybeing Many thanks for this

I totally agree. I just want to make 2 main points.

1. The team have just not focussed here at all in the register API, it’s very close to a have to think about, but recently it’s been very much stable testnet. @bochaco and @Anselme have dipped into registers, but not use cases as much, as of yet. It will happen and this discussion is really useful. So don’t feel this is wasted or you are on your own. We MUST get an API that suits as many use cases as possible and it MUST be default that SAFE versions all files and the versions and files are perpetual.
2. This is where I am controversial and a pain, but these are thoughts for a future and hopefully recognising the tension of change.

I have long wondered about directory structures and the old ways. I am not sure it’s good enough for the future. I do know folk will cling on to it for dear life and some old apps need it to keep working. I am kinda tongue in cheek here, but I feel reality is likely that folk filing files in directories in a tree of all their data is very much old school. I am not sure it’s the future and I will try and clarify.

One of the most used computer platforms now is mobile, phones, tablets and so on. These do not present users with a directory structure as a default position. Many apps and the simplest to use apps do not show that kind of structure.
Labels are more likely to be useful, where you can attach one or more labels just as you can link a file from 2 separate locations.
Old style directories do not scale well at all. Even the most prudent of folk will have a mess of stuff all over the place.
There is a situation where a data element can be represented via a register like structure and kept neat and tidy, with all the features we want. Note: here we forbid subdirectories, otherwise we face the version tree hell situation.
Top level directories can be considered more a collection of stuff, i.e. Music, Docs etc. These need not be tiers of a root, they can be collections.
I know some will find this contentious, but LLM advances mean directories are even less useful and possibly files as well.
It may be possible to replicate/imitate a tree like structure using registers and labels. It would not be unlimited and while we do not know the limit, we must also note current modern tree like filesystems are all limited.
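As a rough illustration of the label idea in the list above, the sketch below indexes items by label rather than by path, so one item can appear under several labels in the way a file can be linked from two directories. `LabelIndex`, `tag` and `with_all` are hypothetical names, and the integer item ids stand in for whatever address a data element would really have.

```rust
use std::collections::{BTreeMap, BTreeSet};

#[derive(Default)]
struct LabelIndex {
    /// label -> ids of the items carrying that label
    by_label: BTreeMap<String, BTreeSet<u64>>,
}

impl LabelIndex {
    /// Attach a label to an item; an item may carry any number of labels.
    fn tag(&mut self, item: u64, label: &str) {
        self.by_label.entry(label.to_string()).or_default().insert(item);
    }

    /// Items that carry every one of the given labels.
    fn with_all(&self, labels: &[&str]) -> BTreeSet<u64> {
        let mut sets = labels
            .iter()
            .map(|l| self.by_label.get(*l).cloned().unwrap_or_default());
        let first = sets.next().unwrap_or_default();
        sets.fold(first, |acc, s| acc.intersection(&s).copied().collect())
    }
}

/// Tag three items and query by one label and by two labels together.
fn demo() -> (Vec<u64>, Vec<u64>) {
    let mut index = LabelIndex::default();
    index.tag(1, "music");
    index.tag(1, "favourites");
    index.tag(2, "music");
    index.tag(3, "docs");
    let music: Vec<u64> = index.with_all(&["music"]).into_iter().collect();
    let both: Vec<u64> = index.with_all(&["music", "favourites"]).into_iter().collect();
    (music, both)
}
```

A flat collection like this has no parent and child histories to keep in step, which is the property the "forbid subdirectories" point is after.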

These thoughts are just brainstorming, but potentially important. I hope NOT to present an opinionated approach here, I am just wary of forcing the network to force the old ways as the only way, preventing much of the above.

Toivo · February 25, 2024, 6:42pm

It’s easy, when thinking how new tech will affect the old one, to not see how old tech affects the new one, and keeps creeping in with its limitations. I’m glad you are awake here.

happybeing · February 25, 2024, 6:59pm

Thanks David. I understand where you are coming from and have no problem with innovation that aims to serve users rather than capture and exploit them.

Unfortunately, right now, the most open system we have for data is the traditional filesystem. It provides a standard well understood and still very capable paradigm for both users and developers, and with the exception of big tech cloud silos and the various centralised services, is a ubiquitous and universal basis for application data storage on all common device types including mobile.

Aside: I was shocked and yet not surprised to learn that those cloud silos are already forcing lock-in of their service users by making extraction of their own data a prohibitive cost, making migration to rival services unattractive, including of course Safe Network

I don’t believe providing something that people already understand and can use most easily right now - with the applications and data management styles they currently use - is exclusive or inhibiting of innovative ways of storing and managing data.

I think giving the maximum number of people and developers the ability to understand and use Safe Network quickly and easily is both helpful to making it a success, and a way to offer more people those innovative alternatives once they are comfortable with and trust the network with their data, privacy and security.

That’s why I push for a FUSE POSIX style filesystem as an early option. What comes alongside is going to be very interesting, but must compete on merit rather than be forced, because that creates an unhelpful barrier to adoption.

With a FUSE mounted drive, from day one people will know how to copy their files to Safe, create, save, edit, view, browse, search and many other things without having to learn anything beyond starting a new application.

dirvine · February 25, 2024, 7:13pm

Thanks @happybeing

I suspect the back end data storage paradigm is not relevant here. These companies would silo any data structure.

Serious question, but is it?
Mobiles, tablets, AWS etc. don’t use directory trees. Neither does any of the big data schemes AFAIK. Again I am pointing this out not debating cause they do we should etc.

Totally agreed on this point.

I think this is the crux. Take this or the points I raised above. Should we force any of these schemes or just make them all available at a higher level (app)?

The issue though is a core of SAFE, the perpetual data, to me that means every version of every file.

So forcing any of the above, if it were possible to force them, could achieve that goal, but to the detriment of other schemes.

I think that is more the debate as opposed to which data representation we want. I say this knowing as we all do, big data does not use dirs, so we could be disqualifying those if we force a tree structure?

The key may be offering what we can in the most simple fashion we can whilst maintaining the core fundamental. What I mean is default API could be several types

Tree structure
Labelled data
Containers
…

Saying ANY of the default APIs does hold to the core fundamental of perpetual data. Folk may be able to go low level and subvert that, but I think that is OK as we can say the defaults allow all types of data container and they are all perpetual.

Then are these core or a higher layer, maybe above the network API but below an app? i.e. we have a low level network API, a high level API (recommended to hold the fundamentals) and then apps above that.
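The layering described here could look something like the sketch below. It is purely hypothetical: `LowLevel`, `Container`, `Tree` and `Labelled` are invented names, not existing Safe Network APIs. The point it tries to show is that if every high-level container type can only write through an append-only call, then all of them keep every version by construction.

```rust
use std::collections::HashMap;

/// Low-level layer (hypothetical): an append-only history per address.
#[derive(Default)]
struct LowLevel {
    histories: HashMap<String, Vec<Vec<u8>>>,
}

impl LowLevel {
    /// Store a new version and return its index; earlier versions are kept.
    fn append(&mut self, addr: &str, data: &[u8]) -> usize {
        let history = self.histories.entry(addr.to_string()).or_default();
        history.push(data.to_vec());
        history.len() - 1
    }

    fn read(&self, addr: &str, version: usize) -> Option<&[u8]> {
        self.histories.get(addr)?.get(version).map(|v| v.as_slice())
    }
}

/// High-level layer: each container type only chooses how keys map to
/// addresses. Writing always goes through `append`, so no container can
/// overwrite or drop an earlier version.
trait Container {
    fn address(&self, key: &str) -> String;

    fn save(&self, net: &mut LowLevel, key: &str, data: &[u8]) -> usize {
        net.append(&self.address(key), data)
    }
}

struct Tree; // keys are paths
struct Labelled; // keys are labels

impl Container for Tree {
    fn address(&self, key: &str) -> String {
        format!("tree:{key}")
    }
}

impl Container for Labelled {
    fn address(&self, key: &str) -> String {
        format!("label:{key}")
    }
}
```

An app that calls `LowLevel` directly could still choose its own scheme, which matches the remark that going low level is possible but not the recommended default.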

If folk create bad apps then the market should speak.

happybeing · February 25, 2024, 7:55pm

I believe the filesystem is the most open and universal system for end user devices.

It is there in Android mobiles and tablets and I expect Apple devices too, though perhaps someone can clarify that, as well as all our computers.

I don’t include cloud silos because they are proprietary and as noted, deliberately constraining, so not open, and also not universal.

As for the rest, I trust that I’m not trying to force this but think it’s important to support what people (end users and Devs) understand and use while offering them alternatives.

So yes, what the Safe APIs support should enable innovative as well as legacy approaches, and over time improvements will happen because instead of entrenching what benefits a handful of owners and investors, Safe Network opens up possibilities without prejudice, and will create opportunities for more people to innovate, and new solutions to scale than we’ve seen before IMO.
