Transactions - a chained type to replace CRDT Registers

dirvine · November 28, 2024, 12:58pm

Just a wee note @happybeing but good to get your feedback. I have an internal discussion now on replacing registers and preparing for generic transactions. All logic will be client side for these, so hopefully max flexibility and better ease of use (no more understanding crdt trees and DAGs etc). I should say though app devs should still use one time use keys here to make this work better for them. As we use BLS there you can derive any key from a root key, so that becomes simple but efficient.

Here’s a preview (it’s my usual very short to the point stuff here).

Summary

This proposal introduces a simplified, flexible transaction structure for the Safe Network, replacing the existing register-based CRDT system. The new system removes constraints on data growth, reduces complexity in the codebase, and shifts transaction logic to the client side, thereby lightening the load on network nodes.

Motivation

The current Safe Network architecture employs CRDT trees for registers to manage mutable data. However, this approach presents limitations:

Registers have a growth cap, leading to scalability issues.
Transactions are tied to specific data types with hardcoded node rules, increasing complexity.
Network nodes bear the computational burden of managing transaction logic.

By introducing a generic Transaction type, we aim to:

Simplify node implementation.
Eliminate the growth constraints of registers.
Enable more diverse use cases through flexible client-side transaction validation.
Reduce the network load by decentralising transaction logic to clients.

Detailed Design

Proposed Structure

struct Transaction { 
    owner: PubKey,
    parent: Vec<PubKey>
    content: [u8; 32], // 32-byte array
    outputs: Vec<(PubKey, content)>, // allows currencies or fork type data behaviour (if necessary) Will usually be a single entry
    signature: Sig // signs the above 4 fields with the owners key
}

Key Features

Client-Side Logic:

- Validation and rules are implemented exclusively on the client side.
- Nodes only handle basic validation and storage.

Generic Use Cases:

- Currency: Transaction chains can track ownership and validate output balances.
- Filesystem Directories: Chains resolve forks, with clients choosing the branch to extend.
- Metadata Linking: The content field can point to network-stored chunks, enabling rich metadata references.

Examples

Currency

Client Check:
Ensure parent transaction is unique and matches the child as an output.
Verify outputs of the parent are non-negative.
Trace transaction lineage back to a genesis block or sufficient depth.

Filesystem Directory

Client Check:
Identify forks as nodes with multiple children.
Follow both branches and merge or resolve conflicts client-side.
Use content to point to metadata for directories.

Advantages

Simplicity:

The transaction type is universal and straightforward.
Removes the need for complex register-specific logic.

Scalability:

No inherent size limitations in transaction chains.

Flexibility:

Decentralised logic allows applications to define custom rules and behaviours.
Supports diverse use cases beyond the limitations of current registers.

Performance:

Offloads computational overhead from nodes to clients.
Reduces network traffic and processing.

Drawbacks

Increased responsibility on clients to implement robust validation logic.
Potential for misuse or inefficiency in client-side implementations if not well-designed. This is true for any client app though

Alternatives Considered

Enhancing Current Registers:

Addressing growth limitations within the existing CRDT structure.
This approach would still retain much of the complexity in node rules.

Hybrid Models:

Combining CRDTs and transactions.
Adds complexity without eliminating existing limitations.

riddim · November 28, 2024, 1:32pm

wow - this is quite a significant change … but might take a lot of work off the network and to the client side

the fun aspect would be … if this works reliably for data … there is no need to distrust it for coin TXs … and with creating and using this structure we’d validate the Token system xD …

if I understand correctly the public key of the TX would be the network address

so because the retrieved TX would be unique it would require the use of 1-time keys - right?

aaaand it would require to know the child already when creating the TX (and if the private key to the child is lost the chain ends at this point forever xD …)

the good thing about this approach is its flexibility … just today I was thinking about how to utilize the network as a time series database … with registers it’s somewhat cumbersome … with TXs I would “create a new time store” with:

basetime snapshots
higher time intervals that get snapshotted (for performance increase)
starting TX
first aggregated TX (for each upper level timeslot)

and every time step would contain a pointer to a dump of a collection of timepoints + the pointer to the following timestep … like that I could write 1min / 5 min intervals as batch and e.g. hourly data to improve performance … most certainly I would prefer to write the timestep data into a single chunk and not self-encrypted split into many pieces because it will be rather small (compared to 4MB)

…a somewhat nice way to store and retrieve time series data without hosting a database myself … (but hard to not see the chains it creates throughout the network … hard to claim we’re chainless with this chain of TXs anyway xD I guess it’s not everything bad about chains! )

happybeing · November 28, 2024, 1:46pm

@dirvine please can you include something of the API that’s being provided either in your post above or via a link to relevant Rust docs? It should help us understand this from a developer perspective.

I’m just about to publish an API lib of my own which gives versioning functionality for any data type (structs, collections etc) based on Registers I was so pleased with it as it’s my first use of Traits and a templated type for the History.

dirvine · November 28, 2024, 1:50pm

Yes that’s right. You can have forks in the same location but it owed not make sense to use the same key. We would provide a next method so it creates a new key for you. So you alway supply your master key pair and add more items at any time. Those will spread across the network.

I feel this is the key, we don’t force rules but let clients come up with any scheme they wish.

Yes, that’s true, but it decentralises the chain storage. So unlike a blockchain or whatever, here you have millions and millions of chains of pointers or values. So you should be able to create almost any app/currency we want and test them out

dirvine · November 28, 2024, 1:52pm

It’s not in code yet, but should be very quick there. It’s basically the type as shows with a few helper methods for adding to the chain. Forks or no forks can all be enforced client side.

Traktion · November 28, 2024, 1:54pm

Will this new feature still allow public as well as private transaction chains?

I’m assuming that knowing where the head is of a busy public chain could be challenging, if forks are to be avoided, i.e. knowing which transaction to append to.

Tbh, transactions sound like how I’ve been trying (and having issues with) registers. The limitless chains size also removes a big limitation there.

happybeing · November 28, 2024, 1:57pm

Yes, my first thoughts are we’re losing a lot of capability and replacing one set of client-side complexity with something at least as complex for less usefulness. I get that something had to be done to handle the size issue, but there are other ways to do that.

Big win seems to be simplifying nodes, but I’m not sure it reduces network traffic - we can’t really know that kind of impact until we see what apps have to do to achieve similar functionality.

dirvine · November 28, 2024, 2:08pm

Yes, just by encrypting the link you can be private. Same with output child (only holders of decryption capability knows actually where it’s pointing).

Yes, like registers, however I am not sure many folk wills start at the start, if that makes sense?

i.e. you post some data in a web site, link to another source of data (a transaction type) and you always want to make sure that’s always the same data. However you can also see it’s updated, if you wish. Then more sites do more links later on and the same things happens, but they may choose to share a later version.

So I am not sure how many use cases will want the whole chain, but it’s like a dot matrix printer really (in how fast this stuff can be). (i.e. if we Get a large file we are reading thousands of chunks and decrypting etc. so a chain of thousands would not be an issue IMO). Getting thousands of these per second should be possible with the network operating properly. Well clients should be able to send/receive thousands of messages per seconds and say, 30-40 messages per data time, I should be fairly fast to go down the chain.

dirvine · November 28, 2024, 2:12pm

It won’t reduce traffic to read the whole chain for sure. But for individual elements it’s likely faster, but I don’t think that part is an issue to worry about as much. It’s more reduce complexity and increase flexibility.

happybeing · November 28, 2024, 3:03pm

This will be fine for many applications, but something that is frequently updated will die pretty quickly if say, a social media app has to read the entire chain every time it is installed on a new device.

A way around that is to have centralised points to pick up later points in the chain (similar to peer discovery) and that seems like a really poor consequence.

I’m also concerned about the point @Traktion raises about discovering the ‘correct’ (concensus) head of a chain with branches and don’t understand your response to that. CRDTs were a brilliant solution to that. I guess clients could still use them in some way, though not sure how, but it would be a shame for this capability not to be a feature of Autonomi’s API.

Traktion · November 28, 2024, 3:07pm

Oh, interesting!

So, you could see a chain of transaction, but not necessarily who each owner was.

Presumably, in token context, multiple outputs could allow the ‘splitting’ of a token amount, with some going back to the original owner (or their new pseudonym address) too?

Sounds a bit like a blockchain in that regard. However, there would be lots of independent chains, for each genesis token. Pretty cool.

Presumably, ‘wallets’ to track which transactions belong to you would save raking through transactions from genesis too?

Traktion · November 28, 2024, 3:16pm

Yes, understood. Many use cases would be more concerned with the recent branches. File systems, forum posts, anything where trees are useful.

However, I’m wondering about how to ensure there is only one child, to form a public list, e.g. sequential versions of something, that anyone can update. These cases I don’t want a tree and branches would likely be pruned/ignored (by the client app).

If I query the head (i.e. latest tx), append a tx to it, but someone else does the same thing at the same time, presumably it is up to the client app to resolve this? I’d like to understand the mechanics of public list appends, really.

riddim · November 28, 2024, 3:45pm

maybe that’s a use case for

just another entry same location pointing to the head? xD … could result in an ugly mess at one point … (and since chunks are stored close to their address that particular point of the network would get somewhat clogged - right …? possible local flooding attack?)

Traktion · November 28, 2024, 5:00pm

Knowing the latest value of a list is really useful. Having to navigate the whole chain will quickly get boring/slow.

Maybe registers were trying to do too many things. Having a complimentary type that is a register in the traditional sense could be handy - a single/current value, with no history. Used like a variable.

It could be used as a ‘pointer’ to a transaction on the transaction list/tree.

Could be a common/shared version or private ones. Common/shared could be abused, but a rescan would overcome.

Edit: btw, I used HEAD in the git context, rather than linked list parlance. To be clear, getting the TAIL or last/most recent node is my concern.

neo · November 28, 2024, 11:29pm

Sounds like a plan here. No part is the best part someone is known to have said. Removing the work from the nodes is so much more Ant behaviour.

You mention “chains”. I am assuming this is just following the parent pointers to traverse the chain in reverse.

While it adds data to each transaction, could there be a use of having the “Origin” transaction so that an APP can see the origin without having to load down the network retrieving each and every transaction in the chain to get to the origin transaction?

dirvine · November 28, 2024, 11:52pm

Yes Rob, folk can use a chain of transactions. Myself I am looking at ways to share parts of neural networks So no chain there, just a transaction.

With registers this would be the same, so a register might hold a few hundred entries, but then it needs to split. With a CRDT tree you cannot split it so it grows forever, that would not work for us as it could grow more than 32 Gb of whatever nodes should hold.

It’s app defined, but just like a CRDT DAG, so you can have forks (more than 1 entry exists for a key) and you follow the fork to get the longest one or whatever your app decides is the branch to follow, if that makes sense.

Yes, you can do that for sure.

There is a lot of ways to do that, but an approach I like is randomised parent checking. So for a currency you have a rule, any duplicate transaction exists and that one is dead (a no fork rule). Then instead of going all way back to genesis you can go back to known points, or use a complexity algorithm to go back a certain amount of parents depending on the complexity of each transaction (i.e. many inputs is complex, a single input is simple).

Yes for sure there would. For currencies that would be a component and I am sure devs could have Audit nodes that just continually traverse back to genesis in that way. A bit like our DAG node concept.

TylerAbeoJordan · November 29, 2024, 1:01am

This sounds like a great advance for nodes and so the network as a whole.

Is this going to be a pre-launch or post-launch change? And if pre-launch, will it set us back time-wise any or can it be whipped up fairly quickly and tested?

Cheers and nice out of the box thinking @dirvine !

dirvine · November 29, 2024, 8:33am

It should be very quick to get in place and the client API updated (as it has to anyway). I imagine we mark registers as deprecated and force use of something like this instead.

happybeing · November 29, 2024, 3:41pm

David, I probably won’t dive deeper at this point so just a couple of comments.

This isn’t really the answer to my point - you’re saying in effect that Registers also turned out not to be scalable for this kind of use and that Transactions doesn’t solve that issue for many anticipated use cases. Social media being a big one, but anything that can grow over time that is around for a long time or has many ‘authors’ is going to be difficult or perhaps unachievable.

I realise this is in some ways much better than Registers, having found them problematic for months myself, but it appears to leave a key use case for them unsupported. Decorum for example. Anything where history is important over long periods risks becoming impractical to maintain in a history. So again we can end up with either centralised solutions or a decentralised web that forgets data by fragmenting in some way (even if the data itself remains stored on nodes, indefinitely). That’s my concern anyway.

Transactions, the name

Unrelated, I’d like to suggest a change in name for this type because Transactions are an application level concept. The name implies certain use cases that will get in the way of discovery and understanding for developers who can use this to record a sequential tree of anything.

I suggest a name that fits the underlying properties but isn’t already over loaded with meaning (or with misleading meaning anyway). Naming is hard!

At first I thought ‘Step’, rather than Link or ChainLink which are already used heavily but might still be worth considering. Maybe there’s a better term.

For my API I came up with History and would be content to let you adopt that for an API which creates a history for given data-types based on Transactions. The point is that you can have a history of anything so I chose Trove (collection of valuable stuff) for the underlying unit and so have:

Trove a trivial trait you implement on anything you wish to create and manage a history for.
TroveHistory<T> as a templated type so you can create and find entries using a templated API for your type.
FileTree, a ready to use struct for storing a tree of files or a website, for which TroveHistory<FileTree> provides the API for managing its history, or similarly for any developer defined type.

Details are in docs.rs as of yesterday (here). Not usable by third parties yet but working in awe. It uses Registers for but will be easy to refactor for Transactions when they appear, or if you wish to include a History style API directly in the autonomi crate and steal the terminology I’ll be fine with that. If so, I would though like you to keep the Trove trait as so that the TroveHistory<T> can specify the type using the first element in the chain.

Traktion · November 29, 2024, 4:18pm

Thinking about the long list, finding most recent tx, issue… perhaps it comes down to two things:

App owner or moderator periodically updates the app config to point set the best root/tip tx.
App user periodically does the same, either locally or to autonomi.

1 would be ideal for new app users who don’t want to traverse from root to tip. It requires an element of trust, but app owner/moderator is there for that.

2 would be ideal for the untrusting new user or frequent user who can set their own root/tip anyway.

As both of these would mean updating another tx list, it would need to be done less frequently to be worthwhile.

To optimise, nesting tx lists may be needed. This could allow an app root list to be most infrequently updated (as it needs to persist for a long time/indefinitely). Child tx lists could be used to prune old tx lists when they get too long.

Moreover, nested lists are essentially branches… so maybe instead of multiple lists, the above can be simplified with multiple branch depths.

It would take a bit of figuring out how to be optimal, but at least history is always retained. An alternative single value register/variable to point to root/tip would lose that.

Topic		Replies	Views
New data types - summary and discussion Features	25	336	December 21, 2024
How transactions (could) take place using Decorum Apps	39	4276	April 5, 2017
Safe Network Dev Update - August 27, 2020 Updates	84	3754	September 4, 2020
Pre-RFC: Linkable Data Structure Development	25	2265	January 6, 2016
SAFE Network Dev Update - June 25, 2020 Updates	92	4068	July 3, 2020