Safe API - Registers

Would >32 bytes break the CRDT ability? Presumably not if they would have always been that, so I’m still a bit skeptical that the benefits outweigh being able to store more than a pointer.

Maybe we should both take a look at what @bochaco is doing for directories before he gets too far!

2 Likes

No, but the more complex the type, the more impact it would have, for sure. Unique keys are a very good thing, even though a CRDT (and this one does) would allow a key to appear more than once in the chain.

It’s purely simplicity versus complexity IMO

3 Likes

Relevant?

Scientists Find Optimal Balance of Data Storage and Time

February 8, 2024

Seventy years after the invention of a data structure called a hash table, theoreticians have found the most efficient possible configuration for it.

2 Likes

As my understanding of MerkleReg and the Register data type improves I’m seeing more reasons to provide for metadata (not just an xor address pointer) in Register Entries. cc @joshuef @bochaco - interested in your thoughts too, especially as I’m still learning about this and may make wrong assumptions etc

For example, if you want versioning with Registers pointed to by another Register I believe you will need some metadata to store the version hash as well as the pointer to another Register.

Without metadata in the Register Entry it will have to point to an immutable file which holds the metadata. That’s three extra GETS every time you use an Entry to access another Register. [edit: or one extra GET if the app bypasses self_encryption and writes a raw chunk - see David’s reply below]

That’s added complexity, plus a performance hit, which if people then need to add a cache because of that adds further complexity.

Same goes if you want to mix pointers to immutable data with pointers to Registers (to make a filesystem for example). You have to insert a layer between the Register Entry and the object, to hold metadata about the object or you won’t know what it is. So unless your Register only holds data of one type, you have to add an extra layer - one immutable file per Entry - to hold the metadata about the object you want to access via the Entry.
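To make the filesystem case concrete, here is a minimal sketch of an Entry that carries a one-byte type tag in front of a 32-byte address, so an app can tell a Register pointer from an immutable-data pointer without an extra GET. The `EntryKind` enum, the tag values and the 33-byte layout are all assumptions for illustration, not anything defined by sn_registers.

```rust
/// Hypothetical kinds of object an Entry might point to (illustrative only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EntryKind {
    ImmutableData,
    Register,
}

/// Encode a kind tag plus a 32-byte xor address into 33 bytes of Entry data.
pub fn encode_entry(kind: EntryKind, address: [u8; 32]) -> Vec<u8> {
    let tag = match kind {
        EntryKind::ImmutableData => 0u8,
        EntryKind::Register => 1u8,
    };
    let mut bytes = Vec::with_capacity(33);
    bytes.push(tag);
    bytes.extend_from_slice(&address);
    bytes
}

/// Decode Entry bytes back into (kind, address); None if malformed.
pub fn decode_entry(bytes: &[u8]) -> Option<(EntryKind, [u8; 32])> {
    if bytes.len() != 33 {
        return None;
    }
    let kind = match bytes[0] {
        0 => EntryKind::ImmutableData,
        1 => EntryKind::Register,
        _ => return None, // unknown tag: another app's convention, or corruption
    };
    let mut address = [0u8; 32];
    address.copy_from_slice(&bytes[1..]);
    Some((kind, address))
}
```

With a 32-byte-only Entry there is no room for the tag, which is the case where the extra immutable file per Entry becomes necessary.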

I still don’t see any reduction in complexity from keeping metadata out of Register Entries, but I do see a lot being added.

2 Likes

Trying to understand the existing API, here is the only information I’ve found about traversing the history (this is in the Rust crdts crate used by sn_registers).

/// Retrieve a node in the Merkle DAG by it's hash.
///
/// Traverse the history of the register by pairing this method
/// with the children of the nodes retrieved in Content::nodes().
pub fn node(&self, hash: Hash) -> Option<&Node<T>> {
	self.dag.get(&hash).or_else(|| self.orphans.get(&hash))
}

Here: https://docs.rs/crdts/latest/src/crdts/merkle_reg.rs.html#117-123
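As a rough illustration of what "pairing this method with the children of the nodes" could look like, here is a self-contained sketch using a toy stand-in rather than the crdts crate itself. `ToyReg`, the `u64` hash type and the `history` helper are assumptions made up for this example; the real MerkleReg types may differ in detail.

```rust
use std::collections::{BTreeSet, HashMap, HashSet};

/// Stand-in hash type (an assumption; the real crate uses its own Hash type).
type Hash = u64;

/// Simplified stand-in for a Merkle DAG node: a value plus the hashes
/// of the nodes it superseded.
pub struct Node<T> {
    pub value: T,
    pub children: BTreeSet<Hash>,
}

/// Toy register: a hash -> node map plus the hashes of the current leaves.
pub struct ToyReg<T> {
    pub dag: HashMap<Hash, Node<T>>,
    pub leaves: Vec<Hash>,
}

impl<T> ToyReg<T> {
    /// Mirrors the shape of the node() method quoted above.
    pub fn node(&self, hash: Hash) -> Option<&Node<T>> {
        self.dag.get(&hash)
    }

    /// Walk depth-first from the current leaves back through the children.
    pub fn history(&self) -> Vec<&T> {
        let mut seen = HashSet::new();
        let mut stack: Vec<Hash> = self.leaves.clone();
        let mut out = Vec::new();
        while let Some(hash) = stack.pop() {
            if !seen.insert(hash) {
                continue; // already visited via another branch
            }
            if let Some(node) = self.node(hash) {
                out.push(&node.value);
                stack.extend(node.children.iter().copied());
            }
        }
        out
    }
}
```

For a register with a single line of writes this yields the values newest first; with concurrent branches the order is simply whatever the depth-first walk produces.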

This implies that to access the history the sn_registers types and types such as ClientRegister will need to expose the following, or provide a way to do this indirectly:

crdts::MerkleReg::read() -> Content<T>
crdts::MerkleReg::node(&self, hash: Hash) -> Option<&Node<T>>

These are not exposed by any of the safe types from sn_registers::RegisterCrdt (which wraps the MerkleReg) upwards. So this needs supporting in some way up through the following types, each of which wraps the one immediately above it:

sn_registers::RegisterCrdt
sn_registers::Register
sn_registers::SignedRegister
sn_client::register::ClientRegister 

The simplest change will be to expose &sn_registers::RegisterCrdt::data (i.e. the MerkleReg<Entry>) all the way up to ClientRegister to allow access to all the low level APIs (including the ones needed for history).
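A sketch of that "simplest change", using stand-in structs rather than the real sn_registers and sn_client types: each wrapper just forwards a reference to the inner register. The field names, the `merkle_reg()` accessor and the `wrap` helper are hypothetical, chosen only to show the delegation pattern.

```rust
/// Stand-in for the MerkleReg<Entry> held inside RegisterCrdt.
pub struct MerkleReg {
    pub entries: Vec<Vec<u8>>,
}

// Simplified stand-ins for the wrapper chain; field names are hypothetical.
pub struct RegisterCrdt { data: MerkleReg }
pub struct Register { crdt: RegisterCrdt }
pub struct SignedRegister { register: Register }
pub struct ClientRegister { signed: SignedRegister }

impl RegisterCrdt {
    /// Expose the inner register read-only.
    pub fn merkle_reg(&self) -> &MerkleReg { &self.data }
}
impl Register {
    pub fn merkle_reg(&self) -> &MerkleReg { self.crdt.merkle_reg() }
}
impl SignedRegister {
    pub fn merkle_reg(&self) -> &MerkleReg { self.register.merkle_reg() }
}
impl ClientRegister {
    /// The low-level APIs (read, node, ...) become reachable from here.
    pub fn merkle_reg(&self) -> &MerkleReg { self.signed.merkle_reg() }
}

/// Helper for the example: build the whole chain around a MerkleReg.
pub fn wrap(data: MerkleReg) -> ClientRegister {
    ClientRegister {
        signed: SignedRegister {
            register: Register { crdt: RegisterCrdt { data } },
        },
    }
}
```

Returning a shared reference keeps the accessor read-only, so history can be inspected without letting callers bypass the signing and permission checks the wrappers add on writes.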

It might be preferable to provide more helpful accessors (named for each SN struct) but I’m not competent enough to design those right now. So I’m wondering if you’d accept the former as a PR once I have a working example that allows interrogation of a ClientRegister history? Or by all means tell me exactly what you’d like @dirvine @joshuef @bochaco

Or if I’m missing something here :grimacing: by all means point it out!

In the meantime I’ll work on a fork to get access to the MerkleReg and create an example which shows the history of a live register, including the one from the example in the SN README.

2 Likes

Here is my thought on this.

You have an app or standard. That app or standard will define how to parse the value of the register key. It can expect that ANY type is possible, no limitations.

So the app knows the value will be of Type X or even X or Y and parse those values correctly. These values may contain metadata plus actual data. They could be files, chunks, more registers, map info, SOLID stuff etc. We don’t care, the app defines everything.

It only needs to point to a Chunk, not a file. That chunk contains whatever the app designer wishes.

1 Like

I understand that and I’m not saying you can’t do any of what you describe. My point is that it is inefficient and more complex, and conversely that for many use cases providing space for metadata in an Entry will be more performant for apps and less complex in code.

Ok, so one additional GET rather than three :white_check_mark:

Though encouraging that instead of going via self_encryption is likely to lead to a lot of unencrypted app data littering nodes :scream:

What I am saying is the key should be a key. Like any key/value store the key is a straight key and no more. The value can be anything. I think if we have the key being more than a key, then it is surprising to devs (bad API design).

This can happen today. Chunks are purely hash(content) == name. Clients using the file API will automatically self_encrypt, but they are not limited by that. They can create chunks of any type. It’s one of the reasons we have been playing with encrypt at the node.

3 Likes

I don’t see this. To me a Register isn’t a key value store but a versioned set of values (of type Entry) where it is up to the app to decide whether the value is some data, or a key, or metadata and a key etc.

Understood. My point is that adding a layer of indirection (in the form of chunks) could make it more likely apps will fail to encrypt. There will be ways to encourage encryption (e.g. providing easy APIs that roll encryption into say, writing a Register Entry) or which discourage encryption. I think there’s a risk this discourages it but maybe not. It depends what the overall APIs and docs look like but at the moment I can’t say for sure. :man_shrugging:

3 Likes

That is my thought too. Additionally, what if another App inserts an entry where the standards say it shouldn’t? Then it’s messed up, and since it’s there for a long time …

So if the metadata is a code then how is it defined? If it’s app defined, what is to stop another app from using it but for something different? Are we back to having no real standard? My opinion is that it’s still better to have it. It indicates what to expect when you retrieve the chunk it points to, or what the key data is for.

1 Like

Pushing the metadata from the Register to a chunk doesn’t help IMO, it complicates.

Whatever we do, developers can choose to do what they want. What’s important is to provide APIs that are understandable and effective in use, make good practice easy and maximise efficiency.

David says keeping metadata out of the Register Entry reduces complexity but I don’t see it.

2 Likes

For instance, scanning the register to find the entry you want goes from retrieving the one register “chunk” from the network to retrieving a chunk for each register entry. This creates work for the network that is avoided by having the “purpose” (metadata) with the key. Basically this is what the “magic chars” were doing in Unix, so that an App can be chosen based on the first record fetched for the “text” file. Or the extension on a filename, or so many other examples in computers, or products you buy, or book names you buy. It’s there to save time and guessing and having to scan the data to determine what to do with it.

3 Likes

What structure do you wish the key could be? It’s no longer a key, but a key plus some stuff. So what size should that be and why? Also consider what it points to. Does the app say oh this metadata means you parse X from the chunk or …

2 Likes

I’ve been wondering here if the Key as you say @dirvine might need to be just a NetworkAddress with any/all that entails. Which I realise is largely what you’re describing but in byte form… but here I’m wondering if a NetworkAddress::Register should have a portion to describe the entry hash too.

That would keep things strict/simple in reg API terms, but allow for the situation @happybeing is referencing w/r/t reg versions eg.

just a thought…
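To sketch that thought: an address enum where the Register variant can optionally pin a specific entry hash. This is a simplified, hypothetical `Address` type for illustration only, not the real NetworkAddress; the tag bytes and byte layout are assumptions.

```rust
/// Stand-in 32-byte hash/address type.
pub type Bytes32 = [u8; 32];

/// Hypothetical, simplified address enum (not the real NetworkAddress).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Address {
    Chunk(Bytes32),
    /// A register, optionally pinned to one entry (i.e. one version).
    Register { address: Bytes32, entry_hash: Option<Bytes32> },
}

impl Address {
    /// Encode as: tag byte, address, then the entry hash if pinned.
    pub fn to_bytes(&self) -> Vec<u8> {
        let mut out = Vec::new();
        match self {
            Address::Chunk(address) => {
                out.push(0);
                out.extend_from_slice(address);
            }
            Address::Register { address, entry_hash } => {
                out.push(if entry_hash.is_some() { 2 } else { 1 });
                out.extend_from_slice(address);
                if let Some(hash) = entry_hash {
                    out.extend_from_slice(hash);
                }
            }
        }
        out
    }
}
```

In this layout a pinned register address takes 65 bytes against 33 for an unpinned one, so an Entry holding it already has to be larger than a bare 32-byte key.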

6 Likes

Can you explain how letting the app dev decide what is stored in an Entry (larger than a key) makes the Register API more complicated. And why adding an extra layer of indirection to hold any data about the subject of the Entry is not more complicated.

I can see Devs ignoring Registers with this approach and instead storing everything in chunks, and missing the whole versioned perpetual data opportunity, unless they can store useful data other than a pointer in an Entry. They will find it easier to think about, and far more efficient, to have a simple index in a single conceptual object (a Register). Devs who want to treat an Entry as a key can still do so.

Entries have been 1k for years for a reason. Suddenly they’re not, and the reasons for this change are not clear to me.

As @neo points out, the app must now load every Entry’s chunk before it can search the index of values. For a 32 byte key and a half full register that means (1MB / 32) * .5 ~= 16,000 additional GETS.
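The arithmetic behind that estimate, as a small sketch. It assumes the whole 1MB is available for entries (ignoring any register overhead) and takes the number of GETs per entry as a parameter; `extra_gets` is a name invented for this example.

```rust
/// Estimate the extra GETs needed to load the chunk behind every entry.
///
/// `register_bytes`: capacity of the register (assumed all usable for entries)
/// `entry_bytes`:    size of each entry
/// `fill`:           fraction of the register in use, 0.0..=1.0
/// `gets_per_entry`: GETs needed to resolve one entry's metadata
pub fn extra_gets(register_bytes: u64, entry_bytes: u64, fill: f64, gets_per_entry: u64) -> u64 {
    let max_entries = register_bytes / entry_bytes;
    let used_entries = (max_entries as f64 * fill) as u64;
    used_entries * gets_per_entry
}
```

Taking 1MB as 1024 * 1024 bytes, `extra_gets(1024 * 1024, 32, 0.5, 1)` gives 16,384, and three GETs per entry (the self_encryption route mentioned earlier) gives 49,152.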

That’s reduced if you use more than 32 bytes for an Entry to accommodate the version hash in the Reg address. But your seeing the need to do that makes my point, because you’re adding metadata to the address and enlarging the Entry to hold it.

Tricky, not tricky! Well some metadata is better than none. How much is ‘ideal’ will depend on the application but I’d suggest looking at a filesystem Entry as both a very useful application that it would be good to support, and something that uses a relatively large amount of metadata compared to most. If you can accommodate that I expect many or perhaps most apps would be covered. I haven’t looked carefully at the filesystem yet but have thought 1KB might be good for that. And a thousand Entries per Register seems a good balance. But as I say, any app specific metadata will IMO be much better than none.

As for structure, I’d leave that to the app. Why force it when you don’t know what the app developer will want or the best way to store that? So just bytes as now. You call it a key but I call it a value, which if the developer wants - can include a key. If an Entry was 1KB many apps could just use the register and not need it to point to chunks. They then get versioned data without any requirement to choose to use it or figure out how to make it work. That seems much simpler.

You could, if you want to force the dev to consider use of a key, reserve space for that, or provide an API that reserves a block of bytes for it. Then have the remainder available for app specific use.
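A minimal sketch of that last option, assuming a 1KB Entry with the first 32 bytes reserved for a key and the rest left to the app. The constants and function names are hypothetical, not part of any existing API.

```rust
/// Assumed Entry size limit (1KB) and reserved key length.
pub const ENTRY_MAX: usize = 1024;
pub const KEY_LEN: usize = 32;

/// Pack a key and app-defined metadata into one Entry's bytes.
pub fn pack_entry(key: [u8; KEY_LEN], metadata: &[u8]) -> Result<Vec<u8>, String> {
    if KEY_LEN + metadata.len() > ENTRY_MAX {
        return Err(format!(
            "metadata too large: {} bytes, max {}",
            metadata.len(),
            ENTRY_MAX - KEY_LEN
        ));
    }
    let mut entry = Vec::with_capacity(KEY_LEN + metadata.len());
    entry.extend_from_slice(&key);
    entry.extend_from_slice(metadata);
    Ok(entry)
}

/// Split an Entry back into its key and the app-specific remainder.
pub fn unpack_entry(entry: &[u8]) -> Option<([u8; KEY_LEN], &[u8])> {
    if entry.len() < KEY_LEN || entry.len() > ENTRY_MAX {
        return None;
    }
    let mut key = [0u8; KEY_LEN];
    key.copy_from_slice(&entry[..KEY_LEN]);
    Some((key, &entry[KEY_LEN..]))
}
```

An app that only wants a key passes empty metadata and gets a plain 32-byte Entry, so the pointer-only pattern is still available.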

I still don’t feel like I understand the reasons for wanting to force an Entry to just be a key. Saying it is simpler doesn’t seem true from the developer’s viewpoint, except perhaps for a design pattern I’m not that familiar with. So maybe you can elaborate on the reasons this is less complex, and justifies the downside I’m seeing for what I think of as typical app use cases.

5 Likes

I’m not sure it does, until we have to justify a size as @dirvine notes. So at some point what we have is insufficient for someone. I can buy the argument to keep it simple here, so that registers are just pointers to other network data. With a suitable API that’s not necessarily complex…

It also removes questions of register stuffing and pricing around that, which could be some complexity there.

What it does is force use of other data types… again, not complex, but more calls, sure. All of this exists in the network now, and APIs for registers could still be store(bytes:Vec<u8>), with all underlying layers hidden…
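A toy sketch of how such a store(bytes) API might hide the indirection: the register keeps only a pointer, the bytes go into a chunk, and reading an entry costs one GET. Everything here is an in-memory stand-in invented for the example; in particular the `u64` from `DefaultHasher` merely stands in for a real content address.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// In-memory stand-in for the network's chunk storage, counting GETs.
#[derive(Default)]
pub struct ToyNetwork {
    chunks: HashMap<u64, Vec<u8>>,
    pub gets: usize,
}

/// A register whose entries are only pointers to chunks.
#[derive(Default)]
pub struct PointerRegister {
    entries: Vec<u64>,
}

impl PointerRegister {
    /// Caller passes bytes; the chunk write and pointer entry are hidden.
    pub fn store(&mut self, net: &mut ToyNetwork, bytes: Vec<u8>) {
        let mut hasher = DefaultHasher::new();
        bytes.hash(&mut hasher);
        let address = hasher.finish(); // stand-in for hash(content) == name
        net.chunks.insert(address, bytes);
        self.entries.push(address);
    }

    /// Reading an entry follows the pointer, costing one GET.
    pub fn read(&self, net: &mut ToyNetwork, index: usize) -> Option<Vec<u8>> {
        let address = *self.entries.get(index)?;
        net.gets += 1;
        net.chunks.get(&address).cloned()
    }
}
```

The caller never sees the pointer, but the `gets` counter shows where the cost lands: one network call per entry read.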

So is there a potential network call there to get your data. Yes. Is that likely to be a big deal?.. I’m honestly not sure, but tending towards “not really”.

So it seems to me this is presuming bad register APIs. There’s no reason the API can’t stay as is, but content is shepherded by the API layer… So I don’t buy this argument myself here.

I don’t think it must. Only as/when you want that data. If we have linked registers… must we load all entries at once? Why?

This seems like app layer logic.

Indeed, but in a clearly scoped context of the NetworkAddress type. It’s not arbitrary metadata, it’s something relating to a concrete network type (the register).


If we suppose that registers can hold some amount of bytes. What is the correct amount? And how do we arrive at that?

Registers of size Y > X would be more useful than registers of X… So how to draw the line?


Honestly, I’m not sure of the right path here, but APIs that hide any complexity here seem straightforward… And it neatly solves the correct amount of bytes question, it seems.

The main drawback seems to be spreading data over more nodes here and any lag that might entail? (but that’s only for small byte data of size < X … )

1 Like

Maybe one solution that solves both the divine end of only a (32 byte?) key and the happy end of 1K is to have the API such that it can be asked to combine each entry into larger values. Thus something like 1, 2, 4, 8, 16, 32 keys as one value. Then the App with one call gets the size of value that suits it best and doesn’t have to worry about boundaries. I.e. if wanting 8 32-byte entries, the register is treated as 256-byte values. That way there is no worrying about how the register is split up.
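A rough sketch of that grouping idea, assuming the register is just a flat list of 32-byte slots and the app picks how many slots make up one value. `grouped_values` and `SLOT_LEN` are names invented here.

```rust
/// Assumed size of one underlying register slot.
pub const SLOT_LEN: usize = 32;

/// View a flat list of 32-byte slots as values of `slots_per_value` slots each.
/// If the slot count is not a multiple, the final value is shorter.
pub fn grouped_values(slots: &[[u8; SLOT_LEN]], slots_per_value: usize) -> Vec<Vec<u8>> {
    assert!(slots_per_value > 0, "slots_per_value must be at least 1");
    slots
        .chunks(slots_per_value)
        .map(|group| group.concat()) // join the group's slots into one value
        .collect()
}
```

For example, 16 slots read with `slots_per_value = 8` come back as two 256-byte values, while the same slots read with `slots_per_value = 1` are sixteen plain 32-byte keys.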

1 Like

I think we disagree on these points.

I don’t see how you can say it isn’t more complex to force an indirection. Why then has @bochaco avoided that in the first cut of the folders implementation? I think it’s evidently simpler to think about and implement.

We can’t choose the perfect amount of metadata for every app. But we can cover many and maybe most, and gain a lot for every single one of those without blocking use of indirection for those who want or need it. So for me that is not a strong reason to make life and performance much worse for so many.

I don’t see why restricting Entries as described makes anything significantly easier compared to the other consequences.

Some apps won’t need to load the whole index sure, but many will so again I don’t accept that all those extra GETS will not be an issue.

Putting those issues together, on top of learning about Registers and versioning (and CRDTs if needed) I see Devs shying away from using Registers which is not good for the goal of perpetual versioned data.

Registers and versioning will be a leap for many. (CRDTs another big leap if needed.) It’s taken me a while to get what I think is a basic understanding, and then I find I’m going to have to manage indirection to get the metadata for the things my app needs to manage. So then I’m wondering how to build my app logic on these unusual types, then if it will be performant to use them at all.

At the very least I’m going to have to test that out so I can design with some confidence that I can build a useful app using Registers this way.

I can only speak for myself, but I don’t think I’m atypical. For me it is an issue I could do without and I know that if I wasn’t committed to the idea of perpetual versioned data I’d probably turn away and implement something simpler for my first Safe app.

3 Likes

You may have something there @neo. :clap:

Now back to code!

Hmmm, this presupposes an aim for the PR which wasn’t there… It was never discussed to be changing register innards here. That’s a separate concern somewhat? (and as yet not a wholly decided one…)

Again, assuming the API is not actually changed here… there’s not necessarily a difference in what a dev needs to consider… And I think that could be achieved…

What is this optimum size though? How do we decide it?

How many entries should a register have?

At the moment, this has all been arbitrarily decided, so it’s good to try and dig in here and see what the answers to these questions might be…

(And again, I’m not convinced either way here, but say we’re keeping registers as arbitrary byte storage, there needs to be some more reasoning on this end)

You’re talking about the extra GETs here? Or imagining register APIs being more complex?