Will running a Git repo for maintaining and updating documents be possible on Autonomi?

People,

I want to set up a Git repo for short stories that may be written by more than one person so I need to keep all previous versions of a story as well as the current for general publication - so developing writers can see the development of the story from the beginning. Will Autonomi be able to handle that sort of facility?

Thanks!

Without knowing the nuts and bolts of the network like some people on here do I would say, yes you could have that system. And it sounds like a great idea.

Atlas Atlas - Things on Autonomi

and

Friends :IF: Friends - the messenger you'll never want to move away from

These have already shown that systems can be built with this kind of functionality. But they are custom applications rather than being based on something existing like Git. I don’t know if Git would be able to use Autonomi storage natively.

Does your idea have to be based on Git? Or are you open to building something custom with the exact features you need?

There is a competition - Impossible Futures Timeline & Network Updates - where programmers of novel applications can win funding for their ideas. It has already had its first round, but there will be more rounds later.

And Welcome!

1 Like

maybe worth noting

Keybase did create their own (open source) git helpers to use their encrypted storage for git repos; one could take a deeper look at how they did it and implement the same functionality for an Autonomi backend :slight_smile:

so it’s definitely possible, and there is even something one could build upon and change for our needs (not 100% perfect because it’s written in Go and there are no Go bindings yet; but good enough I guess, since Go compiles to native binaries just as Rust does)

1 Like

Create an Autonomi vault for a folder, store a git repository in that folder, and sync before each push and pull.

2 Likes

Keybase is an awesome messenger btw. I’ve used it for something like 6-7 years.

I would love it if some day someone would contribute to their code and add Autonomi support, as it fits very well into their concept - they already have cloud storage and git repos, as you mentioned.

1 Like

Now that I think about it, there is an app on Autonomi that a developer here has produced which is closer to your idea than the projects I mentioned above. It’s blogging with assurance that previous posts won’t be deleted. The ones I mentioned before are more forum and messaging systems.

2 Likes

My understanding is that Autonomi stores files that can’t be changed - so even minor changes to a large file would require a whole new file to be stored? This would be OK for novice writers to see how the changes happened in the development of a story - but it would involve clunky use of diff between versions. I thought using git would allow the storage of just the changes from one draft to the next, so the actual git files that would be saved would just record changes and therefore be smaller - and also, for the novice writer, the changes between drafts could be more easily seen.

I would prefer to not have to use any custom coding . .

Not sure I understand that - what about files that change in the git repo? Isn’t Autonomi going to disallow that?

Ah! - I will have a look at that - but there is still the problem of easy diffing between drafts . .

Thanks people!

1 Like

This depends on how the file is changed. If the change is towards the end, then maybe only a couple of new chunks and a new datamap will need to be uploaded. If it is at the front, then probably all of the chunks will be new.

Also, if you are writing the code from scratch, then maybe keep some files in scratchpads until they are committed. That way you reduce the number of new chunks.
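
To illustrate the point about where a change lands, here is a minimal sketch. It assumes plain fixed-size chunking addressed by SHA-256, with a made-up `CHUNK_SIZE`. Autonomi's real self-encryption and chunk sizes work differently, so treat this only as a model of why an edit at the end touches few chunks while an insertion at the front shifts everything.

```python
import hashlib

# Hypothetical chunk size, kept small for the demo; not Autonomi's real value.
CHUNK_SIZE = 1024


def chunk_addresses(data: bytes, size: int = CHUNK_SIZE) -> list[str]:
    """Split data into fixed-size chunks and return each chunk's content hash."""
    return [
        hashlib.sha256(data[i:i + size]).hexdigest()
        for i in range(0, len(data), size)
    ]


def new_chunks(old: bytes, new: bytes) -> int:
    """Count chunks of `new` whose address is not already known from `old`."""
    known = set(chunk_addresses(old))
    return sum(1 for addr in chunk_addresses(new) if addr not in known)


# A stand-in "story" made of numbered lines so every chunk is distinct.
story = b"".join(b"Line %d of the story.\n" % i for i in range(2000))
appended = story + b"A new closing paragraph.\n"
prefixed = b"A new opening paragraph.\n" + story

print("chunks in story:", len(chunk_addresses(story)))
print("new chunks after appending:", new_chunks(story, appended))
print("new chunks after prefixing:", new_chunks(story, prefixed))
```

In this toy model, appending re-uploads only the final chunk, while prefixing changes every chunk because all the boundaries move.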

1 Like

Hmm . . not sure how doing those things would be compatible with the use of an un-tweaked git repo . . Also, when you are talking about evolving short stories of roughly 10k words, changes could be anywhere and could range from a single-character typo fix to the change / deletion / addition of whole paragraphs . .

1 Like

While git history is effectively immutable (you can restore any file to any previous state), the way git works internally does not bode well for using Autonomi’s immutable types off the shelf. The biggest problem is the way it handles pack files. Git packs objects for transfer when you push, and it repacks old data (for example during garbage collection) so that it is optimized for size on disk. This is part of why git is so fast, especially over WAN: you’re working with an image that is constantly being repackaged into the smallest possible form. So it isn’t a matter of storing the .git directory on Autonomi and just getting file additions; you can end up with new binary blobs going onto the network with each push. If we want to use git ‘as is’, that’s what you’d end up with.

The “right” way to do it would be to create a custom git remote helper for dealing with Autonomi. Then you could say something like git clone ant://<AUTONOMI_ADDRESS>, for example. You aren’t constrained by how git wants to store or transport the data: you implement the sending and receiving of pack file information within the helper, and git works without caring how the data comes or goes. Then we can handle the pack files more gracefully on the network, instead of brute-forcing an unoptimized structure. We’ll probably need to build this custom git remote helper anyway just to deal with the ant:// prefix because, why not?
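
As a rough idea of what such a helper looks like, here is a minimal sketch of the command loop. Git looks for an executable named `git-remote-<scheme>` for URLs of the form `<scheme>://...`, so a helper for `ant://` would be called `git-remote-ant` (a hypothetical name; no such helper is confirmed to exist). The sketch only answers the `capabilities` and `list` commands; `fetch` and `push`, where the real Autonomi work would go, are left out. The `refs` dictionary and its placeholder hash stand in for data that would be read from the network.

```python
import io


def reply(command: str, refs: dict[str, str]) -> list[str]:
    """Return the response lines for one remote-helper command."""
    if command == "capabilities":
        # Advertise what the helper supports; a blank line ends the list.
        return ["fetch", "push", ""]
    if command in ("list", "list for-push"):
        # One "<object id> <ref name>" line per ref, then a blank line.
        return [f"{sha} {name}" for name, sha in sorted(refs.items())] + [""]
    # fetch/push would upload or download objects via Autonomi (not shown).
    raise NotImplementedError(command)


def serve(commands, out, refs: dict[str, str]) -> None:
    """Read commands line by line until a blank line or end of input."""
    for line in commands:
        command = line.strip()
        if not command:
            break
        for response in reply(command, refs):
            print(response, file=out)


# Demo with in-memory streams; a real helper would use sys.stdin/sys.stdout.
placeholder_sha = "a" * 40  # placeholder object id, not a real commit
demo_out = io.StringIO()
serve(
    io.StringIO("capabilities\nlist\n\n"),
    demo_out,
    {"refs/heads/main": placeholder_sha},
)
print(demo_out.getvalue(), end="")
```

The design question raised in the thread, how to lay out objects and refs on the network, sits entirely behind `fetch` and `push`, which is why it is omitted here.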

The helper itself isn’t a problem. The protocol is well documented, and we can make one that handles the Autonomi network protocols itself. It’s a matter of how you want to build the low-level data structure. There are lots of ways you could do it, each with its own trade-offs.

That’s the work. I started to look into doing this myself a few months back and realized it was a bigger problem than I wanted to tackle by myself.

5 Likes

I’m not sure that is a massive deal in the grand scheme of things. Yes, it sounds a bit wasteful, although there is nothing safer than having an actual copy of each changed file. My main point, though, is that given the massive quantity of image, video, scientific, and who-knows-what-else data that will be uploaded, all with much larger file sizes, I don’t see it as being a big concern. For example, I doubt an army of coders changing parts of large projects could consume storage as quickly as a couple of ambitious video bloggers (I think we’re calling them ‘Influencers’ now).

2 Likes

I guess this is related to @zettawatt’s answer: it should be possible to use Autonomi effectively for git, but my initial suggestion of just replicating the git repo to Autonomi is not very efficient.

Chatgpt:

Git stores data in a way that’s part snapshot, part delta, depending on the context. Here’s the clear breakdown:


:white_check_mark: How Git stores data (the simple explanation)

1. Each commit stores a snapshot of your files

When you make a commit, Git does not store only the changes. It stores a complete snapshot of every file as it appears in your working directory at that moment.

BUT—Git is very efficient:

2. Git stores identical files only once

Git uses a content-addressable storage system (objects identified by SHA-1/SHA-256 hash).
If a file hasn’t changed between commits, Git does not store it again—it simply points to the previous identical object.

So:

  • If a file did not change:
    :backhand_index_pointing_right: No new copy stored.

  • If a file changed:
    :backhand_index_pointing_right: A new blob object is stored for that file only.
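
The content-addressing described above can be sketched in a few lines. This assumes the default SHA-1 object format, where a blob's ID is the SHA-1 of the header `blob <size>\0` followed by the file contents; the `store` dictionary is only a stand-in for Git's object database.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the object ID Git assigns to a blob (SHA-1 object format)."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


draft_one = b"Once upon a time.\n"
draft_two = b"Once upon a time, long ago.\n"

# Toy object store keyed by blob ID: identical content maps to the same key.
store = {}
for draft in (draft_one, draft_one, draft_two):
    store[git_blob_id(draft)] = draft

print("objects stored:", len(store))  # the unchanged draft is stored once
print("empty blob id:", git_blob_id(b""))
```

Committing the same draft twice adds nothing new to the store, while any edit, however small, produces a whole new blob.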


:white_check_mark: Objects involved

Git stores data as three main types of objects (annotated tags are a fourth):

  • blob = contents of a file

  • tree = folder structure

  • commit = points to a tree + metadata (author, message, parents)

A commit is a lightweight pointer to a tree of blobs.


:brain: Internally: snapshot storage; packfiles use deltas

When Git repacks repository data (e.g., during git gc):

  • It compresses objects into packfiles.

  • In packfiles, Git may store objects using delta compression (differences between objects) to save disk space.

So effectively:

:check_mark: On commit:

snapshot, storing full content for changed files only

:check_mark: On disk after garbage collection:

delta-compressed, storing only changes
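
To give a feel for why storing only the changes saves space, here is a small sketch that compares the size of a full second draft with the size of a line-based diff against the first draft. It uses Python's `difflib` purely for illustration; Git's packfile deltas are a binary format and are not produced this way. The drafts are made-up sample text.

```python
import difflib


def unified_delta(old: str, new: str) -> str:
    """Return a unified diff (no context lines) that turns `old` into `new`."""
    return "".join(
        difflib.unified_diff(
            old.splitlines(keepends=True),
            new.splitlines(keepends=True),
            fromfile="draft-1",
            tofile="draft-2",
            n=0,
        )
    )


# Made-up drafts: 300 paragraphs, one of which is reworded in the second draft.
draft_1 = "".join(f"Paragraph {i}: the story continues.\n" for i in range(300))
draft_2 = draft_1.replace(
    "Paragraph 150: the story continues.",
    "Paragraph 150: the story takes a turn.",
)

delta = unified_delta(draft_1, draft_2)
print("full draft bytes:", len(draft_2.encode()))
print("delta bytes:", len(delta.encode()))
print(delta, end="")
```

The same diff output is also what a novice writer would read to see what changed between two drafts.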


:memo: Summary

| Level | What Git Stores |
|---|---|
| Commit level | Full snapshot of project (reuses unchanged blobs) |
| Blob level | Only changed files are added again |
| Packfile level | Git may store diffs internally to save space |

:rocket: Final Answer

Git stores changed files completely again as new blobs, but only for files that actually changed.
Unchanged files are not stored twice.
Later, Git compresses objects using deltas to save space, but this happens under the hood.



Also, doesn’t Autonomi have automatic de-duplication on block level?

But this does not necessarily help much: the block size is quite big (most text files are only 1 block?), and if something is inserted or deleted at the beginning, all the blocks will change.

Maybe it would help to always run git gc before syncing to Autonomi? Or does that just rearrange everything, so that everything is sent again?