Safe API - Registers

happybeing · February 7, 2024, 8:28pm

I’m trying to understand the capabilities of Registers and don’t understand what they can do or how to use them. I’ve looked at (and modified) the example, looked at the code and docs.rs but am making very slow progress.

It would help a lot to have an overview of the capabilities because they seem very different to what we had previously (effectively a growing versioned array of values).

I’m not clear if a Register is useful for holding more than a single thing, which can have multiple branches which can be merged, or if it can be used like a collection of things (e.g. Vector, Map or Array). I see that each Entry is a 1024 byte vector, which can be used to hold various kinds of thing (such as a string as in the example, or a struct using serialisation, or a pointer such as an xor address etc.) but I cannot see how I would manage a collection of values in a single Register.

Would they be in a single register, or would I use a collection of registers, for example? If in a single register, how would I access the different things (cf. the different elements in a Vector)?

The ClientRegister API is very limited and doesn’t show this, and the Register API doesn’t make this apparent (to me!)

dirvine · February 7, 2024, 11:33pm

Currently, I am hoping we limit Entries to 32-byte arrays. These can be links to any chunk/data on the network. So the register becomes the only mutable component and holds no data, just immutable pointers.

As it stands, it is a DAG-type device that allows branching and merging of Entries.

happybeing · February 8, 2024, 9:32am

I need help understanding the capability (see Q’s) and, if it can act as a store of multiple ‘things’, what capabilities the API provides and how to access them.

Entries I see, but so far they seem to be a sequence of values, not a versioned set of key/values as in the past.

The point of them seems only to be to handle a single thing (not a set of things), which can be updated concurrently and merged to convert a set of alternative values (for the thing singular) into a single consensus value.

I don’t see, for example how more than one directory entry can be held in a Register.

Aside: assuming it can hold entries for multiple things, like a filesystem directory, I’m not sure I’d go with an entry size as small as 32 bytes. There’s a lot you can do with a 1024-byte entry, and it avoids indirection. Limiting entries to 32 bytes means every value that needs more than 32 bytes will be held in immutable files. Lots of considerations there I think. But first I need to understand the capabilities, so I can’t debate entry size yet.

dirvine · February 8, 2024, 9:48am

Look at Entries as keys: keys that point to values. The values exist as chunks on the network, so every value is indeed an immutable piece of data that lasts forever. Does that help?

So the register is a list (that can branch) of signed entries. These point to chunks. Those chunks can be parsed by any client to any format of data they wish.
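To make the "list that can branch" description concrete, here is a minimal sketch of that shape. Everything in it (`ToyRegister`, `ChunkAddr`, `EntryId`, the integer ids) is hypothetical and is not the Safe API; signatures are left out, and a real register would identify entries by hash rather than by a counter.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical 32-byte pointer to an immutable chunk (e.g. its XOR address).
type ChunkAddr = [u8; 32];
/// Hypothetical entry id; assumed here to be a simple counter.
type EntryId = u64;

struct Node {
    value: ChunkAddr,
    parents: BTreeSet<EntryId>,
}

/// Toy branching, append-only list of pointers (signatures omitted).
#[derive(Default)]
struct ToyRegister {
    nodes: BTreeMap<EntryId, Node>,
    next_id: EntryId,
}

impl ToyRegister {
    /// Append an entry on top of `parents`; nothing is ever overwritten.
    fn write(&mut self, value: ChunkAddr, parents: &[EntryId]) -> EntryId {
        let id = self.next_id;
        self.next_id += 1;
        let parents = parents.iter().copied().collect();
        self.nodes.insert(id, Node { value, parents });
        id
    }

    /// Look up the chunk pointer stored in one entry.
    fn get(&self, id: EntryId) -> Option<&ChunkAddr> {
        self.nodes.get(&id).map(|n| &n.value)
    }

    /// Current entries: those that no other entry names as a parent.
    /// More than one head means there are concurrent branches to merge.
    fn heads(&self) -> Vec<EntryId> {
        let superseded: BTreeSet<EntryId> = self
            .nodes
            .values()
            .flat_map(|n| n.parents.iter().copied())
            .collect();
        self.nodes
            .keys()
            .copied()
            .filter(|id| !superseded.contains(id))
            .collect()
    }
}

/// Two writers branch from the same root, then one entry merges both branches.
fn demo_branch_and_merge() -> (usize, Vec<ChunkAddr>) {
    let mut reg = ToyRegister::default();
    let root = reg.write([1u8; 32], &[]);
    let a = reg.write([2u8; 32], &[root]);
    let b = reg.write([3u8; 32], &[root]);
    let branched = reg.heads().len();
    reg.write([4u8; 32], &[a, b]);
    let merged = reg
        .heads()
        .iter()
        .filter_map(|id| reg.get(*id).copied())
        .collect();
    (branched, merged)
}
```

In this sketch a merge is just another write that names every current head as a parent, so the older branches stay in the structure and only stop being "current".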

neo · February 8, 2024, 9:57am

What happens if the user account is a register and when a user logs in on an infected computer that overwrites the account register?

Are those pointers (keys) lost? Or will it be like an append data type and only have keys added?

dirvine · February 8, 2024, 10:06am

The user’s account can be anything really. You are correct that it is overwritable and as such could be prone to that kind of issue.

A smarter way could be:

create a register
entries point to chunks (encrypted)
those chunks hold your user account

That way if there is corruption you can roll back to a previous one?

neo · February 8, 2024, 10:09am

Yes I’d hope that account values would be able to be rolled back. Otherwise once malware people become aware of Safe I’d guarantee that we will hear stories of people’s accounts being corrupted or even held to ransom.

But to lose the pointers to those chunks would mean you lose your account.

dirvine · February 8, 2024, 10:13am

The pointers are from a replicated register though. So that is data loss which we must avoid for any data.

neo · February 8, 2024, 10:17am

What do you mean by this?

the 5 copies on the network
or some other replica

If just the 5 copies then any modification to the register done at the client level would affect all those replicated copies.

But if something else then I have no clue what you are talking about.

dirvine · February 8, 2024, 10:20am

This is what I mean. These are append only, you cannot overwrite the previous contents of them.

Yes this is correct and expected behaviour.

So if the latest entry was corrupt, say, then clients roll back to a previous one.
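As a rough illustration of that roll-back idea, here is a sketch of a client walking an append-only history from newest to oldest until it finds an entry it accepts. The `History` type and the validity check are assumptions made for the example and are not part of the Safe API; in practice the check would fetch and verify the chunk the entry points to.

```rust
/// Hypothetical 32-byte pointer to an immutable chunk.
type ChunkAddr = [u8; 32];

/// Hypothetical client-side view of one branch of a register, oldest entry first.
struct History {
    entries: Vec<ChunkAddr>,
}

impl History {
    /// Append only: existing entries are never modified or removed.
    fn append(&mut self, addr: ChunkAddr) {
        self.entries.push(addr);
    }

    /// Newest entry that passes the caller's check, searching backwards in time.
    fn latest_valid<F>(&self, is_valid: F) -> Option<&ChunkAddr>
    where
        F: Fn(&ChunkAddr) -> bool,
    {
        self.entries.iter().rev().find(|addr| is_valid(*addr))
    }
}

/// The newest entry is treated as corrupt, so the client falls back to the one before it.
fn demo_rollback() -> Option<ChunkAddr> {
    let mut history = History { entries: Vec::new() };
    history.append([1u8; 32]);
    history.append([2u8; 32]);
    history.append([0xFF; 32]); // stand-in for an entry written by malware
    history.latest_valid(|addr| addr[0] != 0xFF).copied()
}
```

The corrupt entry is still in the history; the client simply chooses not to use it.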

neo · February 8, 2024, 10:22am

Ah this is what I wanted to hear. Some of the talk sounded a bit like it was completely rewritable. But hearing it is append behaviour, there isn’t a lasting problem then.

happybeing · February 8, 2024, 11:23am

I already understand this. My difficulty is:

the capabilities this delivers. You are saying, I think, that this allows storage of multiple variables each with its own history of earlier values.
how to use the APIs to handle the above case. I can see roughly how to store a value and get the hash (key) and think I can see how concurrent values are held, accessed and merged. I’m less clear how to visit the different versions but accept that’s possible given the structure. I don’t see how to do this for multiple, independent variables in the same register.

The API no doubt provides the capability but I don’t understand how to achieve it. I didn’t see any tests for this use case nor is it explained in the docs.

Is there a model I could learn about that explains this use case - some things I could search for that will map onto the Safe API to help me understand how to use it? Or any internal docs? I can’t be the only one who needs documentation to get going.

I’ve looked at BTreeSet, MerkleReg etc but they don’t shed much light. Another avenue will be to set up a test bed to try API calls and print out the resulting structures but that will take a while.

I have more questions but some documentation could answer those and save a lot of time, even if it isn’t specifically for the Safe API, so long as it’s using similar structures and a similar enough API.

BTW, I’m not the only one easily confused by this. One of the tests uses a variable called parents for an API parameter called children!

dirvine · February 8, 2024, 11:27am

Yes the docs for this one are badly needing updated. Also the 32byte limit etc. cc @joshuef

happybeing · February 8, 2024, 11:38am

I suggest we need tests and/or examples that show the following for a register holding at least two independent variables:

writing and reading each variable multiple times, including one case of concurrent values with a merge
same, but interleaving the operations to show they are independent
accessing the history of each variable in turn, both last to first and first to last
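One possible reading of "two independent variables in one register" is sketched below. It assumes each entry carries a variable name alongside its pointer, which is purely an assumption for illustration: the thread does not say the Register API works this way, and `KeyedRegister`, `Entry` and the index-based ids are all hypothetical.

```rust
use std::collections::BTreeSet;

/// Hypothetical 32-byte pointer to an immutable chunk.
type ChunkAddr = [u8; 32];
/// Hypothetical entry id: the entry's position in the log.
type EntryId = usize;

struct Entry {
    key: &'static str,     // assumed variable name, e.g. a file name
    value: ChunkAddr,
    parents: Vec<EntryId>, // earlier entries for the same key that this one replaces
}

#[derive(Default)]
struct KeyedRegister {
    log: Vec<Entry>,
}

impl KeyedRegister {
    /// Append a new value for `key`; entries for other keys are unaffected.
    fn write(&mut self, key: &'static str, value: ChunkAddr, parents: &[EntryId]) -> EntryId {
        self.log.push(Entry { key, value, parents: parents.to_vec() });
        self.log.len() - 1
    }

    /// Current (possibly concurrent, unmerged) entries of one variable.
    fn current(&self, key: &str) -> Vec<EntryId> {
        let replaced: BTreeSet<EntryId> = self
            .log
            .iter()
            .filter(|e| e.key == key)
            .flat_map(|e| e.parents.iter().copied())
            .collect();
        (0..self.log.len())
            .filter(|id| self.log[*id].key == key && !replaced.contains(id))
            .collect()
    }

    /// Values of one variable in write order, first to last;
    /// reverse the returned Vec to walk last to first.
    fn history(&self, key: &str) -> Vec<ChunkAddr> {
        self.log.iter().filter(|e| e.key == key).map(|e| e.value).collect()
    }
}

/// Interleaved writes to two variables, with a concurrent write and a merge on one of them.
fn demo_two_variables() -> (Vec<EntryId>, Vec<EntryId>, usize) {
    let mut reg = KeyedRegister::default();
    let a0 = reg.write("a.txt", [1; 32], &[]);
    let b0 = reg.write("b.txt", [9; 32], &[]);
    let a1 = reg.write("a.txt", [2; 32], &[a0]); // concurrent with a2
    let a2 = reg.write("a.txt", [3; 32], &[a0]); // concurrent with a1
    let before_merge = reg.current("a.txt");
    reg.write("b.txt", [8; 32], &[b0]);
    reg.write("a.txt", [4; 32], &[a1, a2]); // merge of both branches
    (before_merge, reg.current("a.txt"), reg.history("b.txt").len())
}
```

Whether the real Register can be used like this, or whether one register per variable is the intended pattern, is exactly the open question in this thread.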

dirvine · February 8, 2024, 12:13pm

100% Mark. Although I am not sure what you mean by two independent variables.

Can you explain? (I see this as a key value store where keys are keys to any type of value.)

happybeing · February 8, 2024, 12:47pm

I mean each has its own current value (or concurrent unmerged values) and its own history. So it could be used to track multiple directory entries, for example, with the ability to revisit the history of all variables (e.g. file/directory pointers).

With that example in mind I’d like to understand why you want to limit values to 32 bytes, and not say 1024 bytes which could then include metadata without an indirection. I can see pros and cons but don’t know enough to really think about them yet. I think I’d need to be able to think about the network as well as the programmatic side.

joshuef · February 8, 2024, 2:35pm

Was just chatting to @bochaco about this now, he brought up similar thoughts.

The background here, I think, is that forcing data -> chunks ensures encryption (via chunking). (Unless I’m missing something else there @dirvine?)

dirvine · February 8, 2024, 2:48pm

If fields are large enough folk will pump actual data in there and the mess will begin IMO

If they are strictly pointers to chunks then it’s not a mess as folk may or may not read the chunks and so on. Plus, folk can have really complex and large values. No limit.

I think the app dev can decide how encrypted those chunks are. If they are files then s(he) has the self_encrypt lib to make that easy, deduplicated and shareable. But if the dev comes up with some completely new value type, we don’t stand in their way.

happybeing · February 8, 2024, 3:30pm

I was imagining something more like network efficiency etc!

I’m not sure what that solves, because “the mess” just ends up one indirection away. Since how any values are stored is up to the developer, I don’t see the benefit here.

dirvine · February 8, 2024, 3:42pm

The benefit is clean registers, really. So they are CRDT and specific, built to hold keys but not values. If we start squeezing values in there, then I feel it gets messy?

If we make it, say, 1024, then somebody comes along and says “oh, if it were only 1025 I could do X, Y, Z”, and then we explain that they can by using entries as pointers. We end up with a bit of a mixed message (plus a more complex data structure: a vector instead of an array, or padding, etc.) that is more difficult to explain.

These are my personal thoughts BTW
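The array-versus-vector point can be sketched as follows. The names `PointerEntry`, `MAX_ENTRY_LEN`, `pointer_entry` and `bounded_entry` are hypothetical and only illustrate the two shapes being compared; they are not taken from the Safe codebase.

```rust
use std::convert::TryFrom;

/// Hypothetical fixed-size entry: the length is part of the type.
type PointerEntry = [u8; 32];

/// Hypothetical cap for a variable-size entry, checked at runtime.
const MAX_ENTRY_LEN: usize = 1024;

/// Accepts exactly 32 bytes; any other length is rejected by the conversion itself.
fn pointer_entry(bytes: &[u8]) -> Option<PointerEntry> {
    PointerEntry::try_from(bytes).ok()
}

/// Accepts anything up to the cap, so every entry also has to carry its own length.
fn bounded_entry(bytes: &[u8]) -> Option<Vec<u8>> {
    if bytes.len() <= MAX_ENTRY_LEN {
        Some(bytes.to_vec())
    } else {
        None
    }
}
```

With the fixed array there is one valid size and nothing to negotiate; with the vector there is always a limit to choose and a 1025-byte case to explain.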
