Safe filesystem API for a FUSE implementation in Rust

happybeing · February 22, 2024, 12:07pm

I thought this would work too because people said the history is available in the Register, as if whatever state you put it into can be recovered through some iterative procedure.

There are no examples of this however and the only reference to using the MerkleReg for history that I could find points to two methods that can be used to interrogate the tree. In a sense the tree holds the history, but I don’t see how that can be used to step through old states except for a trivial case of a single value (e.g. one file reference, not a directory of entries) without including nodes for each state you wish to mark for recovery. @Anselme says he agrees with this.

I see how that can be done, but it breaks down if you want to do this for subdirectories wrt to parent state and you need this for many reasons.

That could be done too but gets complex. (You’d have to update the register for every parent directory for any change to content of a child directory, which might be ok. I’m not sure.)

So you can come up with a scheme but it’s not trivial, and you can’t do hard links whatever you put in the Register. I’m trying to find a solution that fits the latter so we can support a very wide range of applications using a FUSE drive and/or a std::fs API.

If you disagree I suggest you work through some example cases (new file, new file, update file, concurrent update file, merge, new file, delete file, etc and the same for a subdirectory) showing the Register structure at each step, and then show how to get the history using the Register API. For example, how to get the set of directory entries at each step back through the history.

I believe that can be done provided you create a structure inside the register that records the state at each step, and update this for all parent directories above a subdirectory when the latter changes.

If you can show how to do this without such structure also being recorded in the Register it would be very helpful and I’ll owe you a pint.

That’s a lot of work but there’s a lot riding on what appears to be a mistaken belief that a Register can be stepped back through it’s history. At the very least we need an example of how to do that for a couple of non trivial cases. That’s what I wanted to demonstrate based on the chat example, in the new example I created, but couldn’t, so I changed it to show the structure instead of the history.

It looks like @bochaco is taking the approach you describe, so maybe he can demonstrate how to do this already? Or maybe like me he just thinks the Register can do this for us and hasn’t yet looked at how.

I keep posting that I’d like to be contradicted on this.

dirvine · February 22, 2024, 1:56pm

The API for this is missing, but it’s a critical part of the register for sure.

It need not get complex though. I am not explaining myself. So here goes. Where I write Dir I mean register, DirPointer points to a register and FileMeta is file data, (name, size, data_map, access time etc.)

So we have a root Dir That just point to a Chunk
That Chunk contents for example are
FileMeta
FileMeta
(DirPointer/Dir Name) // say Music as example
FileMeta
(DirPointer/Dir Name) //say docs as example
…

So our root dir points to a chunk where we know it’s a list and that list contains the above. IF the contents don’t change then the root dir does not change.

Now let’s change the contents of the Music dir.

It has the same structure on it’s chunk that the above has.
We point to it and we get the latest Entry which points to the Chunk that parses as above.

So if we add to the Chunk, then the entry needs to change as the Hash(chunk contents) == name == Entry name. The Music dir now has a new latest entry, but we did not change the root dir. We can though get history of the Music Dir and traverse that if we wish.

With the above we can link to different files in history and ensure they are always the same file and so on. So we can get links too.

If we Remove or Delete a files all that happens is we again

create a new Chunk
Add the new Register entry and that’s it.

All of the above is CRDT and able to be concurrent. Of course there can be forks, but the forks can be merged (need to confirm our API exposes that, but the register has that functionality) by the user.

I hope that helps.

happybeing · February 22, 2024, 2:40pm

Thanks David. This is different to storing directory entry pointers in the Register which is how I understood your earlier explanations and if you look, I think is what @bochaco has done so far to implement the FoldersAPI (from memory he’s using a register per directory and pushing the file entry pointers for that directory onto the register. Whereas you describe keeping all the entries of a directory in a chunk.)

I see how this design creates history per directory while using the register in a simple way (in effect, a history of a single ‘value’, the pointer to the chunk holding the entries of the directory). I see how field and merges work.

The earlier design I described used a single register per directory, but with the entries stored in the register, and a chunk per directory entry provides the same functionality as far as I can see. I have shelved that because it doesn’t support hard links.

If you don’t have a way to sync versions of parent with child you don’t have a true history though, so I think you need to add that. Otherwise how can you get the versions of a website across multiple versions? It’s no good just having separate history for each subdirectory, you need the correct state of all directories at a given point in their history.

That is soluble using the same approach I described for my now shelved design: you update the history of every parent when the state of a child changes.

I also don’t see how this supports hard links, or do you just mean symbolic links?

dirvine · February 22, 2024, 3:04pm

I think that’s already implicit in the above @happybeing. Any link from 1 file to another, even in another tree would still work. Any Dir that has a linked SubDir would still work as well.

Do you mean something different here

happybeing · February 22, 2024, 3:42pm

I mean that what you have are separate independent histories for each directory. I don’t mean that you will be left with directories that are not self consistent, which is what you may be thinking I meant?

So let’s say I have a website with a parent directory and one child directory, and each have a set of files.

To reflect the state of the website at a particular point, you need to know the corresponding two points in the history of both parent and child, and unless you have a mechanism to link the two points, you can’t get the correct state of the whole tree at a given point. You will have a set of points in one, and a set of points in the other, but you don’t have a way to keep them in step to see the history of the entire tree.

happybeing · February 22, 2024, 3:57pm

This is why I’ve converged on a much simpler approach - keeping the entire tree in a file and versioning that with a single register.

This enables the filesystem to support hard links and anything else, because it is a stand-alone thing, and versioning applies to the whole thing.

Branching and merging at the Register level is simple, but merging two different tree objects is more complex although already a known problem for which there may already be good strategies and tools, academic papers and so on. So I think it is workable.

Even if merging the file tree objects without using CRDTs is deemed too difficult, that feature could be dropped for the time being and instead we have a simpler but capable virtual drive with all the features any application might want (from git, to npm and anything else pushing the limits of POSIX).

If later a better TreeCRDT becomes available I think will be feasible to slot that in as a replacement for the tree-object without breaking anything using the mount or std::fs compatible APIs.

Similarly we could use this tree-object as step one and later slot in your or my multi-register designs, perhaps without hard links as an alternative. All without changing the main UI or any libraries that are in use by applications wanting to access the filesystem directly on Safe (rather than via a FUSE mount).

riddim · February 22, 2024, 3:57pm

At least I think this is what happybeing referred to:

With parent and child one could even argue the parent only needs to update a sub folder hash/revision number or something and it’s okay that just the subfolder stores it’s changes
… Since the parent wasn’t touched…it is valid until the next subfolder version that appears

I guess the problem gets very visible with 2 subfolders at the same level… How to know what state the music folder had at one certain state of the docs folder…

happybeing · February 22, 2024, 4:01pm

That’s the point, although it doesn’t matter whether they are at the same level or not. You need to know the state of the entire tree at a given point or you don’t have a history of the entire tree - you have multiple independent histories of each directory and that isn’t enough to version a website or a source code repository etc.

riddim · February 22, 2024, 4:07pm

True - but when having different levels you don’t necessarily need to propagate all levels up if the convention is that the parent folder is still valid until a new subfolder version appears there… (this ofc would need to be updated on any parent folder changes then… And you’re right… It then wouldn’t be clear if the last subfolder change came together with the parent folder change or a while before that… )

happybeing · February 22, 2024, 4:15pm

If you don’t sync from the change all the way up to the root you won’t have a complete history for the tree. You’ll have a history for the root that doesn’t contain the changes in the tree. It’s the same problem.

EDIT: maybe if a child records the parent’s state (point in parent history), in itself whenever the child state changes that would be enough.

riddim · February 22, 2024, 4:31pm

Okay…

/randomfile
/music/ (v0)
/docs/ (v0)

Now I add files to the music folder

my_rid_hymn.mp3 (making the folder to v1 - only stored within the music folder and not propagated up)
another_cool_sound.mp3 (now we’re at v2 - only stored within the music folder and not propagated up)

… If I then add another file to the parent folder I need to update the then valid version of the subfolder too to make sure history is traceable… So we’re at

/randomfile
/newfile
/music/ (v2)
/docs/ (v0)

… And all versions of the music folder not explicitly listed in the history of the parent are implicitly sub versions of the previous release of the parent folder…

So imho opinion there is a difference between folders on the same level and parent-child relations… But not worth fighting about since the same level problem persists

riddim · February 22, 2024, 4:33pm

I guess similar to what I was suggesting - but the parallel folder problem remains then

Edit: well… Unless you record the version of the other subfolders too and point to that record again… Buuuut then we’re basically back to complete info propagation and could just write the versions of all subfolders to the parent

happybeing · February 22, 2024, 4:44pm

None of these partial schemes work reliably because you somehow have to keep a record which synchronises multiple changes in different parts of the tree. Unless a maths guru can prove that a partial scheme is ok I’m going to stick with propagating to the top.

A secondary issue is that, assuming a partial scheme is ok, how will people want to use the history and how is that affected by that versus a top down history.

Anyway, not worth arguing over at this point. The first step will be something that is feasible to develop and which has known useful characteristics. That’s what I’m aiming for, and at a minimum I’m hoping for:

a perpetual versioned POSIX style virtual drive for Safe Network that leaves only encrypted cached data on the device, which can be wiped automatically at any point (e.g. when unmounted).
API including std::fs and low level FUSE capable Rust library for use by any application wanting to access a virtual drive on SN (directly)
FUSE mountable on Linux, MacOS and hopefully Windows (though perhaps with some limitations). I noticed that Rust has compatibility implementations for things like links on Windows so even that may be possible.

Toivo · February 22, 2024, 5:08pm

Wasn’t enough to me to just like your post above, so here’s some extra likes:

Really nice to see you taking on such a good task!

happybeing · February 22, 2024, 5:26pm

Thanks . I make no promises as to how far this will go but I always follow what interests me and learning about Registers and ways to use SN APIs is enjoyable, designing with them moreso, and building something useful is best of all.

vdash began as a way to learn Rust and tui while trying to make something useful so who knows, but it’s a lot of work.

dirvine · February 23, 2024, 6:10am

Sorry @happybeing @riddim I am still having some issues figuring out the backwards traversal here. I will detail my thoughts. First though this:

Registers are lists of pointers
data_maps are a collection of pointers to data
Chunks are pointed to by registers and contain data_map plus any metadata

So I have a root_dir and a bunch of pointers to actual data (data_maps) and pointers to the version of that data (Register). So everything points to data in a versioned way.

So if we have say a web site or some collection of things, like a blog post with pictures etc. Then they have their own root_dir and the same collection of versioned pointers to actual data.

I think the confusion of links and what not is a spurious one form hard disk days where we tried to have single copies of data linked, so save disk space. SAFE has that and it’s called deduplication. It’s more powerful than links which you can forget to do and copy the data again.

So SAFE is a deduplicated network. There won’t be more than one copy of a file.

So when we look at this from that angle, there is never a need to link a child back to a parent. We can always look from the parent to the child and that works fine and is versioned.

It’s all about how do we address the actual data!

We don’t address data in one store (web site/blog etc.) by traversing other folks link set (dir structure), we do that by creating our own link set (dir structure).

So for our web site, we may want the music file M from David’s Dir at version X in Directory Y. We get that chunk the entry points to (the chunk has metadata and data map). Then we want file N form Marks images dir at version XX in Directory YY. We get the same chunk and pointer (entry name).

Then we create our own root_dir for that web site/blog or whatever. We have the whole history that is independent of what Mark or David do with their directory structures.

I think it boils down to a different world with deduplicated data and working with just pointers. We can and should have a new root_dir for every collection of data we want. So every doc/site/blog and so on has their own Directory (register) that collects all the data for every version of it.

happybeing · February 23, 2024, 9:19am

Taking riddim’s two children side by side approach, each has is own independent history yes?

So if the root and those children are a website, you have all past files, correct, but you don’t know which version of files in one child correspond to the versions of the files in the other child.

Therefore, you can’t produce the history of the whole tree / website

To do that you need to record which versions were active in both children after each change and to be able to traverse that history sensibly. It is clean and simple to reflect that in each parent of the changed directory up to the root

My design has this built in using one register and one chunk per directory, and one chunk per file.

Your design doesn’t record the total state at any point, only the state of each directory independently.

That makes it very hard, impractical really, to step back change by change through the state history of the entire tree / website, which is essential to be able to do. You can do it for any directory, but not the entire tree because the directory histories are independent.

Another difference is that you will use more chunks because you use one for every change to a directory, whereas mine only changes the register (plus whatever is being pointed to - but both use the same approach to point to other directories and files so no difference in chunks there).

Toivo · February 23, 2024, 9:28am

You mean like if we have a website that presents two words side by side, like:

Hello World!

And those words are each in their own directory. Now if they are modified so that website says:

Goodbye Darling!

Then we have two histories: 1) Hello, Goodbye and 2) World, Darling.

But we don’t have a way to know which of the following the website has been:

Hello World!
Hello Darling!
Goodbye World!
Goodbye Darling!

But we would like to point out later, that it actually said this and not that.

dirvine · February 23, 2024, 9:42am

Why do that though? Why not have the website have it’s very own dir structure and not mess with another one.

dirvine · February 23, 2024, 9:44am

This is really only true if you change the file metadata which is the chunk, otherwise it’s the same chunk and deduplication would make it so. IF you change the metadata or file content, then you have made the change and you record it in your dir structure.

Topic		Replies	Views
Syncer: a caching FUSE based filesystem in Rust Apps	26	4446	July 11, 2020
SAFE Rsync and Chunks etc Development	6	1356	February 11, 2018
Experimental SAFE App: SAFE Drive lib and Rust-fuse Development	7	1404	May 13, 2015
SAFE Network Dev Update - August 6, 2020 Updates	23	2578	August 23, 2020
Safe Network Dev Update - September 3, 2020 Updates	23	2864	September 30, 2020

Safe filesystem API for a FUSE implementation in Rust

Related topics