I started writing a proposal to improve support for dynamic data in the SAFE network and I would like some opinions on my idea.
The basic idea is to link the structured data as a parent element and implement some methods to retrieve the child identifiers/XorNames using the root identifier. Something like `get_child_sd(root_identifier, offset, length) -> Vec<(XorName, PubKey)>`
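A minimal sketch of that accessor, using only the names from this proposal (`get_child_sd`, the offset/length paging and the `XorName`/`PubKey` pair are my own idea here, not an existing routing API):

```rust
// Stand-ins for the real routing types; widths chosen only for illustration.
type XorName = [u8; 32];
type PubKey = [u8; 32];

/// Hypothetical network-side lookup from the proposal: given the root SD
/// identifier, return a page of child identifiers and their owners' keys.
fn get_child_sd(root_identifier: XorName, offset: u64, length: u64) -> Vec<(XorName, PubKey)> {
    // The network (not the app) would resolve the children of `root_identifier`
    // and return the slice [offset, offset + length).
    unimplemented!()
}
```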
@dirvine proposed an RFC to append the data in the main data structure, but I think this design has a limitation… the appended data will always be transferred with the structured data. This can be potentially slow/inefficient.
What about using two SDs for each element? The first is owned by the APP for the linking info and the other is owned by, say, the APP & user and contains the info.
This allows the structure to be flexible and defined by the APP, and allows the user to modify the data. E.g. a database element.
Drawback: it is expensive, requiring two SDs per element.
Pros: it can be done using the current system and allows different apps (databases) to implement their own "linking" structure and not be tied to a "network defined" linking.
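Roughly, the two-SD layout I'm describing might look like this (the field names and the flattened SD shapes are simplified stand-ins for illustration, not the real routing definitions):

```rust
// Simplified stand-ins for the real routing types, for illustration only.
type XorName = [u8; 32];
type PubKey = [u8; 32];

/// SD #1: owned by the APP alone, holding only the linking info.
struct LinkSd {
    owners: Vec<PubKey>,    // just the app's key
    children: Vec<XorName>, // names of the content SDs below
    version: u64,
}

/// SD #2: owned by the APP and the user, holding the actual element data.
struct ContentSd {
    owners: Vec<PubKey>, // app key + user key
    data: Vec<u8>,       // e.g. a serialised database row
    version: u64,
}
```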
[quote="neo, post:2, topic:6598, full:true"]
What about using two SDs for each element? The first is owned by the APP for the linking info and the other is owned by, say, the APP & user and contains the info.[/quote]
The problem, though, is security. If the "linking" logic is on the client side (app or web)… it will be very easily compromised/faked.
We don't need a new data structure to handle hierarchical data. The current SD can handle this by using a name composed of the parent name concatenated with a child number. For example, a simple tree of names could be:
Name
Name-0
Name-0-0
Name-0-1
Name-1
Name-1-0
Name-1-1
Like all SDs, the name that must be passed to the create function is the hash of this name. The unhashed name has to be stored in the data part so that we can retrieve the parent and the children of each node.
For example: if the unhashed name of the current node is "Topic1234-4-5" then:
You hash "Topic1234" to retrieve the root node
You hash "Topic1234-4" to retrieve the parent node
To loop over the children, you increment a counter N and for each value you hash "Topic1234-4-5-N". You stop the loop when you get nothing.
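A sketch of that naming convention (SHA-256 from the `sha2` crate stands in for whatever hash the network actually uses, and `fetch` stands in for a GET against the network):

```rust
use sha2::{Digest, Sha256};

/// The network name of an SD is the hash of its human-readable name.
fn sd_name(unhashed: &str) -> [u8; 32] {
    Sha256::digest(unhashed.as_bytes()).into()
}

/// Parent of "Topic1234-4-5" is "Topic1234-4"; the root has no parent.
fn parent_name(unhashed: &str) -> Option<&str> {
    unhashed.rfind('-').map(|i| &unhashed[..i])
}

/// Enumerate children by hashing "<name>-0", "<name>-1", ... until a fetch misses.
fn children(unhashed: &str, fetch: impl Fn(&[u8; 32]) -> bool) -> Vec<String> {
    (0u64..)
        .map(|n| format!("{}-{}", unhashed, n))
        .take_while(|child| fetch(&sd_name(child)))
        .collect()
}
```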
To add new data the user must fetch ALL the data, just to get the number of elements in the node… And even so, we are talking about a huge concurrent system… Imagine something like "SAFEreddit"; this link logic will be broken in seconds.
Sorry to disagree with you, but the number of elements cannot be stored in the parent node, because the parent node is encrypted/signed by one user and the child node by another user.
The problem is not how to store the data, it is how to retrieve the data in a robust way. For instance, if two users add the key "node-0-1" (node zero, child 1) at the same time, what happens to the data? How can we retrieve all of it?
To do this the parent has to be writable by anyone creating a child - this is the problem @knuppe highlights as making the whole structure vulnerable.
One of the aims here is to allow multiple (untrusted) people to be able to append data (e.g. add comments to a blog post, or make replies to a comment) without them being able to break any other part of the structure (accidentally or deliberately).
This is the key challenge of this new decentralised approach - and it's hard, so please keep trying!
You could then have the core network update the linkage SDs for the APP.
Even if you have a core method of doing the linkage, whatever method that is, the APP still has to identify itself and prove its right to add to/subtract from the linked structure. So if you use my initial idea, or a core implementation of any sort, then it can be faked because the APP supplies the authentication, and people could extract that from the APP, or fork the APP to do as they please.
What you really need is a method for an APP to authenticate itself without the authentication being revealed to the PC running the APP. Some sort of APP registration that the core network recognises and uses to lock/own the linkage SDs.
The APP developer would then use his keys to register the APP (and subsequent updates), and the core network uses the APP registration to unlock the SDs. That way the keys to decode the SDs are never sent to the running APP or stored with it, and the linkage SDs are owned by the developer whose keys were used to register the APP.
So as far as I can see, it's not a special SD linkage you need, but a method for the core network to register an APP and let it have keys that are never sent to/from the PC running the APP.
In this case you don't store the children count in the parent node, and then the parent node doesn't need to be modified when a child is added. This was my initial proposal in my first post here.
Adding the children count was only for @knuppe, who didn't want to do this:
But it was clear that I prefer not storing the children count. This way there is no redundancy and the structure cannot be broken.
In my opinion, there is no point in creating an APP registration. A well-designed APP will store the data with some unique key (e.g. XorName("{APP-GUID} + {content body} or {other 'unique' key logic}")); also, the structured data has a tag property that lets the APP developer define what "type" the data is. To the network, this object's contents are just a hashed key with some encrypted value, and the network doesn't care about its contents…
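For example, that "unique key" could be as simple as hashing the app identifier together with the content body (SHA-256 via the `sha2` crate is used here purely as a stand-in for the network's hash; the GUID and the exact mixing are hypothetical):

```rust
use sha2::{Digest, Sha256};

/// Derive an SD name from the app's GUID plus the content body (or any other
/// app-specific "unique key logic" mixed in the same way).
fn derive_name(app_guid: &str, content_body: &[u8]) -> [u8; 32] {
    let mut hasher = Sha256::new();
    hasher.update(app_guid.as_bytes());
    hasher.update(content_body);
    hasher.finalize().into()
}
```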
With this proposal, we are just saying that this piece of data "A" is linked to piece of data "B" by some rules defined by the root element.
Let's suppose that we have the "SAFEreddit" scenario, and we have a topic with 8894 comments:
Why do we need to fetch all 8894 comments to store a new comment?
How does the APP know which child keys relate to the topic (without a link system)?
How does the APP show the string "8894 comments" on the post?
I'm not sure it makes sense to add a Linked / Appendable type to routing, given that there is a structured data type which is designed specifically for the purpose of defining custom data types. For example, the NFS example code uses a composite directory listing type which it defines in terms of the StructuredData type provided by routing; the same concept could be applied to lists.
The unified structure RFC also details that this is a way of creating data types which are secure by default in that they are verified at every modification to protect against various forms of attacks.
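As a rough illustration of that pattern, an app could define a list type on top of StructuredData the same way the NFS code defines its directory listing (the `StructuredData` fields below are simplified, and `CommentList` is a made-up example type):

```rust
// Simplified shape of a routing StructuredData, for illustration only.
struct StructuredData {
    type_tag: u64,        // lets the app declare what "type" the data is
    identifier: [u8; 32], // the SD's network name
    data: Vec<u8>,        // serialised payload, opaque to the network
    version: u64,
}

/// A made-up composite type, analogous to the NFS directory listing: the app
/// serialises this into `StructuredData::data` under its own type_tag.
struct CommentList {
    topic: String,
    comments: Vec<[u8; 32]>, // names of the SDs holding the individual comments
}
```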
As for two different users updating two structured data elements at the same time, I don't see how this would be a problem given that only incremented versions can be "put" on the network, so two items with the same version cannot both be added; one will fail. I am of the opinion that the client should be responsible for ensuring that the version which is "put" gets updated appropriately, rather than routing (as @tfa seems to be suggesting).
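A sketch of that client-side responsibility (the `put_sd` call and its error type are placeholders, not the real routing API):

```rust
/// Placeholder for a versioned put against the network; assume it fails when
/// another client has already put this version of the SD.
fn put_sd(identifier: [u8; 32], version: u64, data: &[u8]) -> Result<(), ()> {
    unimplemented!()
}

/// The client retries with an incremented version until the put is accepted.
fn put_with_retry(identifier: [u8; 32], mut version: u64, data: &[u8]) {
    while put_sd(identifier, version, data).is_err() {
        // Someone else won the race for this version; re-fetch/merge if needed,
        // then try the next version number.
        version += 1;
    }
}
```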
It wouldn't be possible to fetch the 8894 comments unless they were, in combination, smaller than 100 kB, given that the design of StructuredData dictates that this is the size limit. So you fetch the data item which points to the comments but never ask for the chunks unless you want to (e.g. when you are listing the comments).
After closer inspection of the datamap data structure (http://maidsafe.net/safe_core/master/self_encryption/datamap/index.html), it seems that there is no concept of keys here, so in reality you would have to fetch all of the chunks to reconstruct a keyed data structure. You could simply store the keys as a separate put if you didn't want to reconstruct all the chunks to get to them.
Because he has to know the next child number to use. But the search can be optimized by not doing a linear search: the successive tries could be 0, 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384, 12288, 10240, 9216, 8704, 8960, 8832, 8911, 8886, 8898, 8892, 8895, 8893, 8894 (the lookups that succeed are those at indices that already exist, i.e. up to 8893).
Here he knows the next index to try is 8894. If the put fails (because another user added a comment in the meantime) then he tries 8895, 8896, … until it works.
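In code, that lookup strategy is essentially an exponential (galloping) search followed by a binary search; here `exists(n)` stands in for a GET on the hashed name of child number n:

```rust
/// Find the next unused child index: double until a miss, then binary-search
/// the gap between the last hit and the first miss.
fn next_free_index(exists: impl Fn(u64) -> bool) -> u64 {
    if !exists(0) {
        return 0;
    }
    // Gallop: 1, 2, 4, 8, ... until we overshoot the last existing child.
    let mut hi = 1;
    while exists(hi) {
        hi *= 2;
    }
    // Invariant: `lo` exists, `hi` does not.
    let mut lo = hi / 2;
    while hi - lo > 1 {
        let mid = lo + (hi - lo) / 2;
        if exists(mid) {
            lo = mid;
        } else {
            hi = mid;
        }
    }
    hi
}
```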
There is a link system, but it is implicit with the name (stored in the data part): Dummy-3-2 is the third child of Dummy-3, which itself is the fourth child of Dummy.
We're going round in circles here. Obviously the above doesn't scale, and your solution to that was to store the count, which doesn't solve the problem of scale for the reasons @knuppe and I pointed out above.
We need something that scales, and works, unless we're only talking about a small subset of applications.
@tfa, this lookup system would be possible, but not more efficient: imagine hundreds of users running this lookup technique just to discover a simple, common piece of information like "the number of comments on a single post"… And this is just a very simple example; imagine if each comment has more information like "number of likes", "number of up/down votes" or the "reputation of the commenter", etc.
This thing will consume the entire network just with lookups. I have the feeling that such logic to link the data should be in the core of the network, not in the application layer.
@et4te, about the datamap: in my thoughts yesterday, I designed a new map called LinkedData that has the root identifier and a vector of child elements plus their public keys. When a "user" adds a child element, it references the parent identifier in the SD; a network manager (or other authority) then reads this parent identifier during the Put event and appends the child identifier to the LinkedData object (here we can design a consensus system or algorithm to handle this).
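A rough sketch of that LinkedData idea (the field names and types are invented for illustration, and the consensus handling around the Put is left out):

```rust
// Stand-ins for the routing types, for illustration only.
type XorName = [u8; 32];
type PubKey = [u8; 32];

/// Network-managed map from a root identifier to its children.
struct LinkedData {
    root: XorName,
    children: Vec<(XorName, PubKey)>, // child identifier + owner's key
}

impl LinkedData {
    /// Called by the managing authority when a Put references `self.root` as
    /// its parent: append the new child's identifier and owner key.
    fn append_child(&mut self, child: XorName, owner: PubKey) {
        self.children.push((child, owner));
    }
}
```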
I can't shake the feeling that this would be a better fit with a memory-based resource rather than disk. There I would imagine it making sense to have lists, tuple spaces and all kinds of nice data structures which naturally map to memory.
Persisting things in memory would be a really cool addition to safe_vault too, and could form the beginning of a compute layer, where "transient" memory use combined with CPU could be used to perform arbitrary computations. I think an argument could be made to keep the disk primitives as simple as possible, since ultimately this will be the slowest form of persistence and would fit better with files.
I'm having a look at safe_vault at the moment in order to see how the chunk_store can be re-created for a memory persistence model. Then it's a matter of weighing the value of memory relative to disk (clearly it is a more expensive resource).
As soon as the APP running on the PC uses its keys, the user can extract (hack) those keys, take ownership of the SD data and scramble your linkage.
That is the inherent problem with an APP accessing its SD data for writing: it has to expose its ownership keys for the SD data in order to instruct the network to write/rewrite that SD data.
And I was under the impression that this is the problem you were trying to fix by getting the core network to implement a linkage system. But the problem is that if your APP uses its keys while running on a user's PC, then a USER has the opportunity to extract those security keys and scramble your linkage/database or whatever you are building in SD data.
It does not matter what you put into the core network if the APP running on a user's PC uses its security keys to ask the network to do that work. That is why, if you create a core network system that registers APPs, the core network can do the (re)writes without requiring the ownership keys for the SD data elements to be sent by the APP. Any fork/mod of the APP is then a new APP and thus has a different registration.
I am assuming you understand that APPs do not run on a protected server (there are none), but on the user's PC.