New data types - summary and discussion

happybeing · December 19, 2024, 5:26pm

@dirvine, I’m glad to see progress on this. I’m not sure if my suggestion for changing the name of Transaction was considered (see “Transactions, the name” below)?

Also good to see that filesystem APIs are being considered/worked on.

I’ve been working on my own solutions for this (both library and app). It is important to get this and the other last minute APIs right, so I’m concerned this will, like registers, end up not being thought through in a rush to launch by end Jan. Perhaps this isn’t intended for that timescale, but regardless, perhaps it would help to do this work more publicly and start engaging with the community and developers? That would give some time for third party developer input and testing, and could save us outsiders from developing things unnecessarily.

Transactions - a chained type to replace CRDT Registers

Transactions, the name

Unrelated, I’d like to suggest a change in name for this type because Transactions are an application level concept. The name implies certain use cases that will get in the way of discovery and understanding for developers who can use this to record a sequential tree of anything.

I suggest a name that fits the underlying properties but isn’t already overloaded with meaning (or with misleading meaning anyway). Naming is hard!

At first I thought ‘Step’, rather than Link or ChainLink which are already used heavily but might still be worth considering. Maybe there’s a better term.

For my API I came up with History and would be content to let you adopt that for an API which creates a history for given data-types based on Transactions. The point is that you can have a history of anything so I chose Trove (collection of valuable stuff) for the underlying unit and so have:

Trove, a trivial trait you implement on anything you wish to create and manage a history for.

TroveHistory<T>, a templated type so you can create and find entries using a templated API for your type.

FileTree, a ready-to-use struct for storing a tree of files or a website, for which TroveHistory<FileTree> provides the API for managing its history, or similarly for any developer defined type.

Details are in docs.rs as of yesterday (here). Not usable by third parties yet, but working in awe. It uses Registers for now but will be easy to refactor for Transactions when they appear. Or, if you wish to include a History style API directly in the autonomi crate and steal the terminology, I’ll be fine with that. If so, I would like you to keep the Trove trait so that TroveHistory<T> can specify the type using the first element in the chain.
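For readers who haven’t looked at the crate, here is a rough sketch of the shape described above: a marker trait, a generic history over it, and a concrete FileTree type. Everything in it is hypothetical (the TYPE_TAG constant, the method names, and the in-memory Vec standing in for network storage). It is not the actual API on docs.rs, only an illustration of how the first element of the chain can carry the type.

```rust
use std::marker::PhantomData;

/// Hypothetical version of the Trove trait: anything you want a history of.
/// TYPE_TAG is an assumed stand-in for however the real crate identifies T.
pub trait Trove {
    const TYPE_TAG: &'static str;
}

/// Hypothetical in-memory stand-in for TroveHistory<T>. Element 0 of the
/// chain records T's type tag; later elements are addresses of stored versions.
pub struct TroveHistory<T: Trove> {
    chain: Vec<String>,
    _marker: PhantomData<T>,
}

impl<T: Trove> TroveHistory<T> {
    /// Start a history whose first element identifies the type it holds.
    pub fn new() -> Self {
        TroveHistory { chain: vec![T::TYPE_TAG.to_string()], _marker: PhantomData }
    }

    /// Type tag read back from the first element of the chain.
    pub fn type_tag(&self) -> &str {
        &self.chain[0]
    }

    /// Record the address of a newly stored version.
    pub fn add_version(&mut self, address: &str) {
        self.chain.push(address.to_string());
    }

    /// Address of the most recent version, or None if nothing has been added.
    pub fn latest(&self) -> Option<&str> {
        if self.chain.len() > 1 {
            self.chain.last().map(|s| s.as_str())
        } else {
            None
        }
    }
}

/// Example implementor: a tree of files or a website.
pub struct FileTree;

impl Trove for FileTree {
    const TYPE_TAG: &'static str = "FileTree";
}
```

Because the tag lives in the first element, a reader who only has the chain can check which T it was created for before decoding any versions.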

dirvine · December 19, 2024, 6:01pm

I agree for sure here. The basics of a transaction were to move something from somewhere to somewhere else and even move the thing itself (so mutate it). They all go from Key to Key, so they do move, but what we move is limited to a pointer size (that can be a value <32 bytes as well, mind you).

A nice thing we can do, though, is call them what we want. So say they are called ABC; we can then "import ABC as XYZ" or "use ABC as DEF", etc. That tends towards an almost unspecific name, but right now I am super keen on getting the type out and used to see what it does and can do.
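In Rust, for example, renaming is just a matter of how the type is imported, so the crate can pick one name and apps can use another. A minimal sketch, assuming a hypothetical data_types module with a cut-down Transaction type; LinkedList and History are simply names floated in this thread:

```rust
mod data_types {
    /// Hypothetical stand-in for the network type; the field is illustrative only.
    #[derive(Debug, Default)]
    pub struct Transaction {
        pub value: [u8; 32],
    }
}

// Rename on import: code in this module sees the type as `LinkedList`.
use data_types::Transaction as LinkedList;

// A type alias gives another name for the same type and can be re-exported with `pub`.
pub type History = data_types::Transaction;
```

Both LinkedList and History are the same type as data_types::Transaction, so values move freely between the names.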

Sad thing is, registers were massively overthought, over-engineered and mathematically brilliant, but useless in our network. I fear it’s a thing we have faced over the years, but hopefully now we are all on the "keep it as small, simple and decentralised as we can" boat. This was all on us, not on the community, and relates to

And so far Transactions were an internal RFC, and I posted that here. Pointer is a trivial type and so far exists only as a Slack message and a huge conversation with me explaining it. Even that I would have loved to be more public, and it will be for sure.

In fact, here are some snippets.

I am hopeful this helps us as a team. These points are not opinion or hearsay etc. They are very important.

Some issues

Our data types are spread about right now. We have Transactions, Chunks and Scratchpad in the ant-protocol dir (very much lacking in tests or documentation). Now, this location is maybe OK as this dir is used by clients and nodes, so it kinda maybe makes sense.

Registers are separate and in their own dir, but they are deprecated anyway as the network cannot handle them and they are fundamentally broken in our network as they cannot scale. So they must be deprecated. That was a close call having them in place.

We have other types in client that build on these for filesystem support. So that support is good, but again difficult to find, and again tests and documentation are an issue.

However, our filesystem stuff depends on registers and we cannot use this. So that is a huge red flag we must be aware of. This needs to be addressed with extreme urgency (read on).

So we sorta have all the bits, but not much of the how to use them, and the API we have is massive and we need to get it clean and clear (docs/examples/tests etc.). I note the examples in client are the Python examples, but we can add loads more in Rust and Wasm, and hopefully also in nodejs if we want JavaScript folks on board with filesystem/network capabilities etc.

The Python client publishing has now been broken but can be easily fixed.
This is where we are!! It’s all incredibly fixable, but right now finding anything relies on intimate knowledge of our codebase.

I was in the process of collecting data types together into their own module, but ran across so many issues just using our codebase that I instead feel we need a team here working on this, with a focus on the eyes of partners or 3rd party devs trying to use the system. We need to have a different mindset than we are used to in these last few years. It’s very, very important we switch to usability now. We have done wonders, and now is the time to release a working system that folk can build on or partner with us on.

My main issues I know we can fix!!

The data types are hidden away in the protocol lib without a README or tests (I cannot understand this, as our AI will do this task in seconds with a simple prompt, and that can be improved no end with a PRD, specification and detailed design, which again the AI can create alongside us).

If we want to release a lib for node, we need to separate out the antnode binary crate, as that’s how Rust works: you cannot have a static binary and a lib in the same crate. (We want to do this to allow bindings of any type, as they need access to a lib, and usually a ‘C’ lib, not a ‘rustlib’.)

The client API depending on registers is a serious issue we need to solve, it’s a recent change so understandable, but that does not change the urgency on resolving this one.

The README of our client needs a drastic overhaul and same with the sub dirs of it. We have to imagine we are partners or devs trying to build on it and yes we have rust API docs and that should be something folk use, so we need to also provide links to that and make it clear in README.

The Client API should be data types and manipulating those, along with wallet management and using that (no need for EVM or wallet for browsing, so we must make that simple).

I REPEAT THE CLIENT API IS DATA TYPES AND WALLETS, all networking can be hidden (we can still pass peers in emergency to the client.init or similar, like we can with nodes, but default is invisible networking). So having the data types and their capabilities clearly documented seems essential. That can be a super simple API that we can deliver with ease.

This last point is really a big issue we must address. Right now we need to get our heads out of expert rust coders with intimate knowledge of our codebase and get to a place where we imagine we are a dev opening up GitHub and looking at our code. We have to have docs/READMEs and plenty of testing in place here.

Lastly, we likely need another simple data type to allow in place updates of a pointer. This would be a very simple thing that sits between a scratchpad and a transaction. Its purpose is to show a current bit of data (i.e. a folder address in a filesystem or what folk thought registers could do, but cannot). We would need this for network based filesystems etc. It also serves as a DNS type of thing where web sites can be hosted and updated etc. (however not human readable names, these are 32 byte addresses). This type would be a pointer and be simply the following:

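    // PubKey, XorName, Transaction and Sig are types defined elsewhere.
    struct Pointer {
        owner: PubKey,        // this is the address of this data type
        pointer: XorName,     // the address currently pointed to
        counter: u32,         // version counter
        history: Transaction, // Transaction recording this value
        signature: Sig,       // of counter and pointer (and Transaction)
    }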

This will allay the fears of people who imagine they must traverse a whole transaction chain to find the current value (whatever is pointed to). It’s a simple Counter CRDT type. As with Scratchpad, nodes will always store only the latest version: when replicating, a node keeps only the highest version and updates any node that gives it a lower version with the latest (the same as we should do with scratchpads).
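The keep-the-latest behaviour described above can be sketched as a merge function. This is a minimal sketch under stated assumptions: PointerVersion is a hypothetical cut-down Pointer with 32-byte arrays in place of PubKey and XorName, signatures are assumed to have been checked before this point, and equal counters simply keep the local copy (the post does not say how ties are handled).

```rust
/// Hypothetical cut-down Pointer: 32-byte arrays replace PubKey/XorName,
/// and the history and signature fields are left out of this sketch.
#[derive(Clone, Debug, PartialEq)]
pub struct PointerVersion {
    pub owner: [u8; 32],  // address of the Pointer
    pub target: [u8; 32], // what it currently points to
    pub counter: u32,     // version number
}

/// Keep-the-latest rule for two versions of the same Pointer (same owner).
/// Returns the version a node should store, plus whether the peer that sent
/// `incoming` is behind and should be sent our copy.
pub fn merge(local: &PointerVersion, incoming: &PointerVersion) -> (PointerVersion, bool) {
    if incoming.counter > local.counter {
        // Peer has a newer version: adopt it.
        (incoming.clone(), false)
    } else {
        // Peer sent an older or equal version: keep ours (ties keep local, an
        // assumption) and flag the peer as behind if its counter is lower.
        (local.clone(), incoming.counter < local.counter)
    }
}
```

Since the outcome depends only on the counters, every node converges on the same version regardless of the order in which updates arrive.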

It is seriously important that we address this with urgency, and it’s hugely within our capability as a team to crush these points very quickly.

Then a long conversation, where I showed that the Pointer needs a Transaction field and that Transaction needs the child to be optional. There are several reasons for that, but we can enforce history (which is a bugbear of mine), and this is a constant thing in my mind. Anyway, on we go.

What we really need to do is map out types and specific use of them and find where we lack. So right now we have:

Chunks - CDN network type, our store of bytes [immutable and permanent]
Transactions - Allow pointers (to chunks) to be passed from owner to owner(s) [preserves history]
ScratchPad - Small unstructured elements of data [mutable and no history, but constant address]
Pointer - Pointer only, does not transfer and has constant address, no history

So we have 2 types that preserve history (permanent data) and 2 that throw history away, but are limited to holding only pointers or small unstructured data.
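As a quick reference, the four types and the properties listed above can be written down as a small Rust enum. This is purely descriptive: the enum and its method names are hypothetical, not part of any crate, and they encode only what the summary says.

```rust
/// The four fundamental data types, as summarised in the list above.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum DataType {
    Chunk,
    Transaction,
    Scratchpad,
    Pointer,
}

impl DataType {
    pub const ALL: [DataType; 4] =
        [DataType::Chunk, DataType::Transaction, DataType::Scratchpad, DataType::Pointer];

    /// Chunks and Transactions keep history (permanent data).
    pub fn preserves_history(self) -> bool {
        matches!(self, DataType::Chunk | DataType::Transaction)
    }

    /// Scratchpads and Pointers throw history away and are updated in place
    /// at a constant address.
    pub fn updated_in_place(self) -> bool {
        !self.preserves_history()
    }

    /// What each type holds, per the summary.
    pub fn holds(self) -> &'static str {
        match self {
            DataType::Chunk => "bytes",
            DataType::Transaction => "pointers passed from owner to owner(s)",
            DataType::Scratchpad => "small unstructured data",
            DataType::Pointer => "a single pointer",
        }
    }
}
```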

This is the kind of thing we must map out in great detail to figure out why we lose history and whether that is OK. Does it break the promise of perpetual data, and so on? For example, the pointer will be used for websites, so the history of a website is lost, even though the data of the old site is still there in chunks.

These are the kinds of discussions I am keen we get to as we get the Client API in place where we can build with all sorts of languages (compiled, interpreted, typed, dynamic and more). This will let us see what’s missing and what we can improve upon as this is only what we are delivering really, no more and no less

Then some detail of the operation

It’s this really. HEAD is the Pointer. So we have 2 Pointers here A and B. B will replace A.

So we look at the value of each of these and then look at Transaction B, a Transaction that holds the value B. It will have a parent Transaction, and that parent will hold the value of Pointer A (because we know Pointer A, we can prefetch this).

So Transactions are referenced by owner, i.e. a PubKey, but have a value. The value is what interests us. So we want a transaction chain whose head holds value B and whose parent holds value A. So we check the Transaction pointed to by Pointer A and also by Pointer B: the parent of Transaction B should point back to the Transaction with value A.

Then a bit more of me bleating on

I am still on data types. So we have our 4 fundamental types (Chunk, ScratchPad, Pointer and Transaction), and as we have found out, the latter 2 are connected in what they do. As we have seen, Pointer types lose history, but Transactions keep history. This is actually phenomenal for us. The promise of Registers was unlimited growing CRDT data types, but they were fundamentally broken. They also suffered from having to read the whole history to get the latest version (very unergonomic). Pointer has a single version, so it’s always the latest. That is great, but it loses all history.

So we can do a really, really clever trick here and achieve our goal of infinitely (feasibly infinite) growable mutating data. It’s very, very simple and here it is (this is a very big deal folks, please lean in).

So we have Transactions. They transfer a value from owner to owner, the value can change according to how the app allows it. So this is critical.

Then we have Pointer and that points to the latest value, but loses history!! (that is very very bad).

To fix this we do this little trick.

Add a Transaction field to the Pointer type. THAT’S IT, we get a fully growable, history preserving mutable data type. (Note every published thing that changes will have history; that’s just first principles.)

How it works is this:

Node gets a Pointer update command (so will change the Pointer and update the version number)
Node checks the Transaction included in the pointer
Node checks the Transaction exists and that its parent is the Pointer value we are about to overwrite. To be clear, the new Pointer value includes a link to a transaction that is the child of the current Pointer value.
If above is true the Pointer can be updated
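The node-side check above could look roughly like this. It is a sketch under assumptions, not the node’s code: Addr is a hypothetical stand-in for XorName, a HashMap replaces network lookups, signature checks are omitted, and the counter check reflects the version bump in the first step. It reads "its parent is the Pointer value" as "the parent Transaction holds the value the Pointer currently points to".

```rust
use std::collections::HashMap;

type Addr = [u8; 32]; // hypothetical stand-in for XorName / transaction address

#[derive(Clone, Debug)]
pub struct Transaction {
    pub value: Addr,          // what this transaction points to
    pub parent: Option<Addr>, // address of the parent transaction, if any
}

#[derive(Clone, Debug)]
pub struct Pointer {
    pub target: Addr,  // current value
    pub counter: u32,  // version number
    pub history: Addr, // address of the Transaction recording this value
}

#[derive(Debug, PartialEq)]
pub enum Rejection {
    StaleCounter,
    MissingTransaction,
    ValueMismatch,
    NotAChildOfCurrent,
}

/// Accept `new` as a replacement for `current` only if the version goes up,
/// its Transaction exists and records the new value, and that Transaction's
/// parent holds the value being overwritten.
pub fn validate_update(
    current: &Pointer,
    new: &Pointer,
    store: &HashMap<Addr, Transaction>,
) -> Result<(), Rejection> {
    if new.counter <= current.counter {
        return Err(Rejection::StaleCounter);
    }
    let tx = store.get(&new.history).ok_or(Rejection::MissingTransaction)?;
    if tx.value != new.target {
        return Err(Rejection::ValueMismatch);
    }
    let parent = tx
        .parent
        .and_then(|p| store.get(&p))
        .ok_or(Rejection::NotAChildOfCurrent)?;
    if parent.value != current.target {
        return Err(Rejection::NotAChildOfCurrent);
    }
    Ok(())
}
```

Because every accepted update must extend the chain from the current value, the Pointer can only move forward along its Transaction history.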

This means we not only replace registers, but we give them the functionality people thought they were getting, PLUS a mechanism to easily and quickly get the latest value (they just read the Pointer). Then, for the full history, they can go back through the Transaction history (which is the history of the published artefact).
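Reading the latest value is a single Pointer fetch; the full history comes from walking parent links back from the Transaction the Pointer references. A minimal sketch, again assuming hypothetical 32-byte addresses and an in-memory HashMap in place of network fetches, with a guard against cycles:

```rust
use std::collections::{HashMap, HashSet};

type Addr = [u8; 32]; // hypothetical stand-in for a transaction address

pub struct Transaction {
    pub value: Addr,          // what this version pointed to
    pub parent: Option<Addr>, // previous version, None at the root
}

/// Walk from `start` back to the root, returning values newest first.
/// Stops early on a missing transaction or a repeated address.
pub fn history(start: Addr, store: &HashMap<Addr, Transaction>) -> Vec<Addr> {
    let mut values = Vec::new();
    let mut seen = HashSet::new();
    let mut next = Some(start);
    while let Some(addr) = next {
        if !seen.insert(addr) {
            break; // cycle guard
        }
        match store.get(&addr) {
            Some(tx) => {
                values.push(tx.value);
                next = tx.parent;
            }
            None => break,
        }
    }
    values
}
```

Starting the walk from an older transaction address returns the history as of that version, which is one way a pinned version could be read.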

In this way we get websites/documents/filesystems all simple to publish, but also automatic history of each “thing” anyone publishes on our network.

Now we have a fully mutable data network where folk can publish, not only websites, docs or filesystems and more (like AI models) but also with fully automatic history that cannot be erased. So a true perpetual data network.

We still need a mechanism of human readable pointer types, like a DNS etc. but that is another topic. The above gives us our basic perpetual data network though.

Hope all that helps you see the thinking. None of it is really new, but almost all of it was lost in years of work on the core network. As I said, the guys seem to have handled that part very well now, and I am pushing for the data and API to get a significant focus (so are the team, but they are swamped): keep it simple, get it out and get it used.

I am super keen on simplicity here, and if we do need to change later, there should be none of the major issues we would have with a complex type in place.

Added to the above, the guys have done a load of work on filesystem and archive types. I have yet to get to those, but they are almost secondary and build on the fundamental types. I see a lot of that happening in the community, where it should be.

happybeing · December 19, 2024, 6:10pm

If anyone has time to pull that into a readable post on a new topic that would be great. As it is I fear it isn’t going to help.

David - thanks, much better
@moderators - please can you make David’s post above into a new topic under APIs called “New data types - summary and discussion”

dirvine · December 19, 2024, 6:12pm

Yes, it’s not easy to paste stuff in here for some reason. I tried a couple of things, but seems line breaks are an issue. Sorry about that.

[EDIT I think I may have made it a bit easier by blocking the text twice??]

Southside · December 19, 2024, 6:47pm

Well, that’s solved the problem of what to do with myself until Stages…

Seriously, David, there’s a LOT in that post and I’m going to work through it slowly.

But thanks

I probably won’t need the snippets till after stages, so it’s no big deal you missed them out. Was it the

that screwed the formatting up?

neo · December 19, 2024, 9:25pm

Did you try using the code quote
```
text blah blah
```

EG

text blah blah

dirvine · December 19, 2024, 9:30pm

Yes, that is what I eventually did but had to nest them for some reason. A single quoted block did not word wrap until I quoted it again. Weird

Lisa_Brown · December 19, 2024, 9:46pm

Meh, meaningless! Your main post just blew my mind!

riddim · December 19, 2024, 11:21pm

Somehow I thought wasm died with v0 of dave

Nigel · December 20, 2024, 4:33am

@dirvine so great to see you back where you shine! Loving the simplicity.

What a banger last update for the year.

riddim · December 20, 2024, 7:55am

Wait - this means that once I’ve started to move down a branch in my tx tree there’s no way back - right?

So once created I cannot arbitrarily move my pointer (bound to a tx chain) but it only works via merge/rebase

No show stopper but surely a limitation that might be seen as a feature or anti feature I guess

riddim · December 20, 2024, 8:01am

Ha! And that’s an easy one now

Just hash the name as the seed for a private key, which is then the pointer to the website. Bound to it is an arbitrary TX.

Everyone will know the private key to the pointer but cannot change it because they cannot create the valid child TX

When I sell a public name I just version the pointer and link the buyers child TX.

dirvine · December 20, 2024, 9:06am

The pointer is a counter-based CRDT type, so you can replace it as often as you wish.

There’s still a place for wasm in our libs, just not for network or disk based calls. For js guys etc. a wasm lib is just like a js lib really, so we can still hide a ton of Rust code in there for ease of use and speed. So think js lib when I mention wasm, as that’s pretty much all it is: “compiled js”.

Traktion · December 20, 2024, 9:42am

You could argue that is a design feature.

A regular linked list / tree data structure with a similar head pointer feature would normally be encapsulated. What is ‘head’ shouldn’t really be changed outside of that data structure.

Ofc, we should be able to maintain independent pointers to any location in the linked list / tree. As long as we can have arbitrary pointers defined, I’d say we are all good.

This also means we could pin specific versions of a website / app / etc, by referencing a pointer, which points to a specific node in the linked list / tree.

Fwiw, I’d like to reiterate the point about using standard data structure naming within the network. Folks are familiar with them and will immediately understand how to use them. If we unnecessarily give them special names or esoteric behaviours on the network, it will lead to avoidable confusion. In short, we should call a spade a spade, where possible.

dirvine · December 20, 2024, 10:06am

This is also fine, but each would have a different name, but that would be required anyway, so all good there.

I agree for sure. Here, though, in our network we need to consider decent terms, but also our perpetual data issues, so any “new” item is not an overwrite or replace, except pointers and scratchpads. A Pointer is like what we have here, as it’s replaceable with a new name (like a normal pointer). The Transaction is like a chain or linked list, and perhaps LinkedList is a good name there actually. That makes sense!

What do you think @happybeing with LinkedList and Pointer as types?

loziniak · December 20, 2024, 4:18pm

Hi! It’s great that we have this new topic, but still ~~some~~ of the questions in the last one remain unanswered.

And there are new ones, like:

How will a node differentiate between a native TX and a non-native one? Since they differ only by genesis TX, do I understand correctly that a node will have to trace back to some earlier transaction that it knows is (not) a descendant of the Native Token genesis TX? And if the node is new, will it have to trace all the way back to Genesis? What mechanism is planned here?

As long as it’s not more than 4,294,967,295 times

dirvine · December 20, 2024, 4:35pm

None of this is about native spends. At that time there are a plethora of choices, deciding right now means stop thinking about other stuff and IMO it’s the wrong time for that.

Quite. Although wraparound is also possible.
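For scale, 4,294,967,295 is u32::MAX (2^32 - 1). The snippet below shows the two options for a u32 counter in Rust: refuse to go past the maximum, or wrap back to 0. If wraparound were allowed, a plain "higher counter wins" comparison would treat 0 as older than u32::MAX. Serial number arithmetic in the style of RFC 1982 is one known way around that; it is shown here only as an illustration, and nothing in this thread says the network uses it.

```rust
/// Largest value a u32 counter can hold: 2^32 - 1.
pub const MAX_UPDATES: u32 = u32::MAX;

/// Next counter if it must never wrap; None once the counter is exhausted.
pub fn next_counter_strict(c: u32) -> Option<u32> {
    c.checked_add(1)
}

/// Next counter if wraparound is allowed (u32::MAX + 1 becomes 0).
pub fn next_counter_wrapping(c: u32) -> u32 {
    c.wrapping_add(1)
}

/// RFC 1982 style comparison: `a` is newer than `b` if it is ahead by less
/// than half the counter space. Caveat: a version more than 2^31 updates
/// behind would wrongly look newer, so this only helps if peers stay close.
pub fn is_newer(a: u32, b: u32) -> bool {
    let diff = a.wrapping_sub(b);
    diff != 0 && diff < (1 << 31)
}
```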

loziniak · December 20, 2024, 5:18pm

I’m not asking for a definitive answer, since Native Token is not in the plans for today, but I’d like to hear what you think could be possible, because it could inspire some usage ideas for the new data types, like creating your own currencies.

Traktion · December 20, 2024, 5:43pm

Perhaps there are use cases where tracing to the root is necessary? Financial transactions would likely require that rigor.

For website versioning, obviously such rigor isn’t necessary.

Maybe there are ways to optimise scans to root, but at least each chain is only for the one token, rather than all (like with a traditional blockchain).

loziniak · December 20, 2024, 6:27pm

Sure, but question is – you’re given a transaction, and how do you know which chain it belongs to? There has to be some clever solution to this…
