Pre-RFC: Linkable Data Structure

I started writing a proposal to improve support for dynamic data in the SAFE network and I would like some opinions on my idea. :wink:

The basic idea is to link the structured data as a parent element and implement some methods to retrieve the child identifiers/XorName using the root identifier. Something like get_child_sd(root_identifier, offset, length) -> Vec<(XorName, PubKey)>

@dirvine propose a RFC to append the data in the main data structure, but I think this design has a limitationā€¦ The appended data will always be transferred with the structured data. This can be potentially slow/inefficient.

Here is a early^2 draft of the RFC.

Thanks in advance (=

2 Likes

What about using 2 SDā€™s for each element. First is owned by the APP for the linking info and the other is owned by say the APP & user and contains the info.

This allows the structure to be flexible and defined by the APP and allows the user to modify the data. EG a database element.

Drawbacks is it is expensive requiring 2 SDā€™s per element

Pros: it can be done using the current system and allows different apps (databases) to implement their own ā€œlinkingā€ structure and not be tied to a ā€œnetwork definedā€ linking

1 Like

[quote=ā€œneo, post:2, topic:6598, full:trueā€]
What about using 2 SDā€™s for each element. First is owned by the APP for the linking info and the other is owned by say the APP & user and contains the info.[/quote]
The problem though, is security. If the ā€œlinkingā€ logic is in the client side (app or web)ā€¦ It will be very easily compromised/faked.

We donā€™t need a new data structure to handle hierarchical data. Current SD can handle this by using a name composed of parent name concatenated with child number. For example, a simple tree of names could be:

  • Name
    • Name-0
      • Name-0-0
      • Name-0-1
    • Name-1
      • Name-1-0
      • Name-1-1

Like all SDs the name that must be passed to the create function is the hash of this name. The unhashed name has to be stored in the data part so that we can retrieve the parent and the children of each node.

For example: if the unhashed name of current node is ā€œTopic1234-4-5ā€ then:

  • You hash ā€œTopic1234ā€ to retrieve the root node
  • You hash ā€œTopic1234-4ā€ to retrieve the parent node
  • To loop over the children, you increment a counter N and for each value you hash ā€œTopic1234-4-5-Nā€. You stop the loop when you get nothing.
3 Likes

To add new data the user must fetch ALL the data, just the get the number of elements in the nodeā€¦ Even so, we are talking about a huge concurrent systemā€¦ Imagine some system like ā€œSAFEredditā€, this link logic will be broken in seconds

Here is some design concepts/ideas.

Root Element

  • Iā€™m the root element, and I allow other ā€œusersā€ to relate data to me [here is blacklist/whitelist - optional].
  • The maximum of child levels is one. [optional]
  • The users can link only one reply/data to me.[optional]
  • The users canā€™t update/delete the linked data.[optional]

Child Element 1

  • Iā€™m linked to the root
  • The network can store me in the linking data because my signature is valid, Iā€™m aware (and respect) the rules from root.

Child Element 2

  • Iā€™m linked to the child element 1 (CE1)
  • I canā€™t be linked by the network to the element because the root element does not allow two levels of data.

ā€¦

You can store the number of children in the parent node if thatā€™s you want. I didnā€™t do it because:

  • most of the times you only need to loop over the children
  • adding a child doesnā€™t modify the parent

But if you need to get the number of children and a node can have many children then just add it in the data part.

Sorry to disagree with you, but number of elements cannot be stored in the parent node, because, the parent node is encrypted/signed by a user and the child node by another user.

The problem is not to store the data, is to retrieve the data in a robust way. For instance, if two users add the key in the same time ā€œnode-0-1ā€ (node zero, child 1), what happens to the data? How can we retrieve all of them?

To do this the parent has to be writable by anyone creating a child - this is the problem @knuppe highlights as making the whole structure vulnerable.

One of the aims here is to allow multiple (untrusted) people to be able to append data (eg add comments to a blog post, or make replies to a comment) without them being able to break any other part of the structure (accidentally or deliberately) .

This is the key challenge of this new decentralised approach - and itā€™s hard, so please keep trying!

2 Likes

You could then have the core network update the linkages SDs for the APP

Even if you have a core method of doing the linkage using whatever method, the APP still has to identify itself and prove its right to add/subtract to the linked structure. So if you use my initial idea, or a core implementation of any sort then it can be faked because the APP supplies the authentication and people could extract that from the APP, or fork the APP to do as they please.

What you really need is a method for an APP to authenticate itself without the authentication being revealed to the PC running the APP. Some sort of APP registration that the core network recognises and uses to lock/own the linkage SDs.

The APP developer would then use his keys to register the APP, and subsequent updates and the core network uses the APP registration to unlock the SDs. That way the keys to decode the SDs is never sent to the APP running or stored with the APP and the linkages SDs are owned by the developer whose keys were used to register the APP.

So as far as I can see, its not a special SD linkage you need but a method for the core network to register an APP and let it have keys that are never sent to/from the PC running the APP

In this case you donā€™t store the children count in the parent node and then parent node doesnā€™t need to be modified when a child is added. This was my initial proposal in my first post here.

Adding the children count was only for @knuppe who didnā€™t want to do this:

But it was clear that I prefer not storing the children count. This way there is no redudancy and the structure cannot be broken.

In my opinion, there is no point to create a APP registration. A well-designed APP, will store the data with some unique key (eg. XorName("{APP-GUID} + {content body} or {other 'unique' key logic}")), also the structured data has a tag property, that allows the APP developer define what ā€œtypeā€ the data is. To the network, this object contents is just a hashed key with some encrypted value and the network donā€™t care about itā€™s contentsā€¦

With this proposal, we are just saying that this piece of data ā€œAā€ is linked to piece of data ā€œBā€ by some rules defined by the root element.

Letā€™s suppose that we have the ā€œSAFEredditā€ scenario, and we have a topic with 8894 comments:

  1. Why we need to fetch all the 8894 comments to store a new comment?
  2. How the APP knows, what are the child keys related to the topic (without a link system)?
  3. How the APP shows the string ā€œ8894 commentsā€ in the post?

The first user will win. The second user will have to try ā€œnode-0-2ā€, and then ā€œnode-0-3ā€ ā€¦ until the put succeeds.

As I said in my first post to retrieve all the children of a node you loop over them by incrementing a counter.

To get a complete tree, you do this recursively from the root node or any other intermediate node depending on what your are looking for.

Iā€™m not sure it makes sense to add a Linked / Appendable type to routing given that there is a structured data type which is designed specifically for the purpose of defining custom data types. For example, the NFS example code uses a composite directory listing type which it defines in terms of the StructuredData type provided by routing, the same concept could be applied to lists.

The unified structure RFC also details that this is a way of creating data types which are secure by default in that they are verified at every modification to protect against various forms of attacks.

As for updating two structured data elements at the same time by two different users, I donā€™t see how this would be a problem given that only incremented versions can be ā€˜putā€™ in the network, so two items with the same version cannot both be added, one will fail. I am of the opinion that the client should be responsible for ensuring that the version which is ā€˜putā€™ gets updated appropriately rather than the routing. (as @tfa seems to be suggesting)

1 Like

@knuppe In your SAFEreddit example,

  1. It wouldnā€™t be possible to fetch the 8894 comments unless they were in combination smaller than 100kb given the design of StructuredData dictates that this is the size limit. So you fetch the data item which points to the comments but never ask for the chunks unless you want to do that (e.g when you are listing the comments).
  2. After closer inspection of the datamap data structure: http://maidsafe.net/safe_core/master/self_encryption/datamap/index.html, it seems that there is no concept of keys here so in reality you would have to fetch all of the chunks to reconstruct a keyed data structure. You could simply store the keys as a separate put if you didnā€™t want to reconstruct all the chunks to get to them.

Because he has to know the next child number to use. But the search can be optimized by not doing a linear search: the successive tries could be 0, 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384, 12288, 10240, 9216, 8704, 8960, 8832, 8911, 8886, 8898, 8892, 8895, 8893, 8894 (in bold the lookups that succeed)

Here he knows the next index to try is 8894. If the put fails (because another user added a comment in the meantime) then he tries 8895, 8896, ā€¦ until it works.

There is a link system, but it is implicit with the name (stored in the data part): Dummy-3-2 is the third child of Dummy-3, which itself is the fourth child of Dummy.

Weā€™re going round in circles here. Obviously the above doesnā€™t scale, and your solution to that was to store the count, which doesnā€™t solve the problem of scale for the reasons @knuppe and I pointed out above.

We need something that scales, and works, unless weā€™re only talking about a small subset of applications.

@tfa, this lookup system will be possible but not more efficient, imagine hundreds of users using this lookup technique just to discover a simple/common information like ā€œnumber of comments of a single postā€ā€¦ This is just a very simple sample, imagine if each comment has more information like ā€œnumber of likesā€, ā€œnumber of up/down votesā€ or the ā€œreputation of the commenterā€, etc.

This thing will consume the entire network just with lookups. I have the feeling that such logic to link the data should be in the core of the network, not in the application layer.

@et4te, about the datamap: In my thoughts yesterday, I designed a new map called LinkedData that has the root identifier and a vector of child elements + their public keys. When a ā€œuserā€ adds a child element, it references the parent identifier in the SD, so then a network manager (or other authority), reads this parent identifier during the Put event and append the child identifier to the LinkedData object (here we can design a consensus system or algorithm to handle this).

@happybeing, perfect you got the idea! :+1:

2 Likes

I canā€™t shake the feeling that this would be a better fit with a memory based resource rather than disk. There I would imagine it making sense to have lists, tuple spaces and all kinds of nice data structures which naturally map to memory.

Persisting things in memory would be a really cool addition to safe_vault too and could form the beginning of a compute layer, where a ā€˜transientā€™ memory use combined with CPU could be used to compute arbitrary computations. I think an argument could be made to keep the disk primitives as simple as possible, since ultimately this will be the slowest form of persistence and would fit better with files.

Iā€™m having a look at safe_vault at the moment in order to see how the chunk_store can be re-created for a memory persistence model. Then its a matter of weighting the value of memory relative to disk (clearly it is a more expensive resource).

1 Like

As soon as the APP running on the PC uses its keys then the user can extract (hack) those keys for ownership of the SD data and scramble your linkage.

That is the inherent problem with an APP accessing its SD data for writing, it has to expose its ownership keys for the SD data in order to instruct the network to write/rewrite SD data.

And I was under the impression that this is the problem you were trying to fix by getting the core network to implement a linkage system. But the problem is that if your APP uses its keys while running on a users PC then a USER has the opportunity to extract those security keys to scramble your linkage/database or whatever you are building in SD data.

It does not matter what you put into the core network if the APP which runs on a users PC uses its security keys to ask the network to do that work. That is why if you create a core network system that registers APPs then the core network can do the (re)writes without requiring the keys to the ownership of the SD data elements be sent by the APP. Any forks/mods to the APP then is a new APP and thus has different registration.

I am assuming you understand that APPs do not run on a protected server (there are none), but on the users PC.

2 Likes