[Pre-RFC] Labelled Data

joshuef · November 26, 2019, 4:40pm

Here we have an idea for improving data storage and access, moving away from the ‘container’ approach to something more flexible…

Right now it’s a semi-fleshed out idea. So please fire in all questions/thoughts/criticisms and hopefully we can work this towards a more complete RFC if that still makes sense!

(Also: excuse my handwriting!)

To clarify something about Folders: This proposal does not in any way change or remove the ability to make folder-like structures on the network. FilesContainers which we use to this end would still exist and could well be labelled. (clarified below the OP)

Flat Data Indexes and Labels

Summary

Remove containers as a concept and add labels, of which any piece of data can have many.

This prevents data siloing in containers, without losing functionality.

It also gives more flexibility for querying/displaying data owned by an account.

This can be worked upon in a limited fashion now (in place of fleshing out container APIs). It will require tweaks to permissions, and has room for other enhancements down the line.

Motivation

Right now we have the idea that apps can have ‘containers’. An app can store what it likes in there, and another app has no idea of that data’s existence. This could lead to data-siloing even with the best intentions of RDF, etc.

We need to implement data indexing on PUT for apps by default (with an opt-out). That way, whenever an app PUTs data to the network, your account has a record of it.

Proposal:

Remove ‘containers’ and use ‘labels’.
Any piece of data can have many labels.
Each label has its own index.
Apps can request permissions to work with specific labels.

This allows for a more flat and flexible data structure, without losing the ability for apps to organise their own data:

Labels such as folder or photo could be applied automatically. Labels for the app(<appId>) would also be applied automatically.

Other labels can be chosen/applied (me and awesome) above.

Thus an app can request permission to read/write data with the photo label, and even if it’s not the safe-cli app, which originally put the data, if it has the photo permission, then it can read the data (and indeed the whole Photos index).
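As a rough illustration of the proposal (not the actual safe-api design), here is a minimal in-memory sketch in Python. The Account class, its method names and the example XOR-URL are all hypothetical; a real implementation would sit on top of network data types and real permission checks.

```python
from collections import defaultdict


class Account:
    """Hypothetical in-memory stand-in for an account's label indexes."""

    def __init__(self):
        # One index per label: label -> {data name -> XOR-URL}
        self.indexes = defaultdict(dict)

    def put(self, app_id, name, xor_url, labels):
        # The app's own label is always applied alongside the chosen labels.
        applied = set(labels) | {f"app({app_id})"}
        for label in applied:
            self.indexes[label][name] = xor_url
        return applied

    def list_index(self, label, granted_labels):
        # An app may read an index only if it holds that label's permission,
        # regardless of which app originally put the data.
        if label not in granted_labels:
            raise PermissionError(f"no permission for label {label!r}")
        return dict(self.indexes[label])
```

Here a photos app holding only the photo permission could call list_index("photo", {"photo"}) and see files that safe-cli uploaded, while a request for a label it was never granted is refused.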

Assumptions

This document doesn’t cover how data is represented on the network. RDF is assumed later for describing the various labels, and MutableData is the most likely initial choice for the index of a given label.

An index could well contain metadata for a file (e.g. the type, modification/creation date).

It also assumes the short term goal of a client-side implementation. Though this could (maybe should?) be handled network side down the line.

Automatic mapping is assumed for some labels (photo, app(<appId>), document). These automatic labels could well be modified per account.

Detailed Design

$ safe files put ShibeInJapan.jpg --label japan

File was uploaded to safe://gsda87632rgdsaihdaiuadis8adsada

The label “japan” was applied and “safe-cli” and “photo” were applied automatically.

After such a command, our account root could look like:

Data Storage Hooks

Upon any data PUT, there will be a hook in the relevant high-level API (that of safe-api) to:

Determine the correct labels needed for this data (which could be image and photo for a .raw file, or mutable for a MutableData). This will always update or create the relevant Label Index.
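A minimal sketch of what such a hook might compute, assuming a simple extension-to-label map. The AUTO_LABELS table and the labels_for_put name are hypothetical; only the .raw mapping (image and photo) and the automatic app label come from the text above, and the .jpg and .txt rows are illustrative guesses.

```python
import os

# Hypothetical extension -> automatic labels map. The post suggests such a
# mapping could be modified per account.
AUTO_LABELS = {
    ".raw": ["image", "photo"],  # from the example in the post
    ".jpg": ["photo"],           # assumption for illustration
    ".txt": ["document"],        # assumption for illustration
}


def labels_for_put(filename, app_id, user_labels=()):
    """Return every label to apply to a file being PUT, sorted alphabetically."""
    extension = os.path.splitext(filename)[1].lower()
    labels = set(user_labels)
    labels.update(AUTO_LABELS.get(extension, []))
    labels.add(f"app({app_id})")  # the app label is always applied
    return sorted(labels)
```

For the CLI example earlier, labels_for_put("ShibeInJapan.jpg", "safe-cli", ["japan"]) would yield the user-chosen japan label plus the automatic photo and app(safe-cli) labels.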

Indices

Label indices can be readily implemented in the same fashion as Named Containers, i.e., as key-value MutableData stores. The key is the name of the data (a filename, a name given to data structs which don’t normally have one, or the XOR-URL). The value could be as simple as a XOR-URL, though more information may be of use there.

Combination of labels will initially be handled via concatenation (alphabetically) of the labels (e.g. apple/<appId>/food/fruit). (Though there is ample scope to improve this account side down the line)

The indexes only store a XOR-URL link to the relevant data, in a key-value fashion with a name being provided or derived from the data put.
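The alphabetical concatenation described above could be sketched like this. The multi_label_key name and the choice of "/" as separator are assumptions for illustration, and the sort here is plain code-point order, so capitalisation and punctuation would affect the result.

```python
def multi_label_key(labels, separator="/"):
    """Build one canonical index name from a set of labels.

    Sorting means the same labels always give the same name, whatever
    order they were supplied in.
    """
    unique = sorted(set(labels))
    for label in unique:
        if separator in label:
            raise ValueError(f"label {label!r} contains the separator")
    return separator.join(unique)
```

With this, josh/maidsafe and maidsafe/josh can never exist as two separate indexes, since both orderings produce the same key.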

Permissions

Permissions are managed on a label basis. An application will initially have permissions to access its own label.

An application with permissions to read the photos label can read data put by any application.

For example, our PhotosApp with permissions to access the Photos index, and its own app(PhotosApp) index could access the following indexes:

Whereas safe-cli with permissions to access the Me index and Folders, as well as its own app(safe-cli) index could access the following indexes:

Multiple labels

This proposal will involve a change to key retrieval in the client libs to enable accessing multiple-label indexes’ data. Having permission to read/decrypt a given label allows read/decrypt of any index containing that same label.

Each label/ label-combination will have its own access/encryption keys.
Multiple labels MUST be accessible by any application which has permission to access any one of the labels. I.e., an app can access data apple/<appId>/food/fruit if it has permissions to access apple. This does NOT imply that something with permission for fruit has access to apple however.
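One way to read the rule above, as a small sketch: an index is accessible if the app holds permission for at least one of the labels in its name. The can_access function is hypothetical, and it assumes index names are labels joined with "/".

```python
def can_access(index_key, granted_labels, separator="/"):
    """True if the app holds permission for any label making up the index."""
    return any(label in granted_labels for label in index_key.split(separator))
```

Under this reading, permission for fruit opens the combined index apple/food/fruit, but not the standalone apple index.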

An initial thought on modifying labels. This can only be done by an application:

which is first creating data OR has permission to manage a label on that data
which has permissions for the label to be added
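The two conditions above could be combined as in this sketch. The may_add_label function and its parameters are hypothetical, and "permission to manage a label on that data" is simplified here to holding permission for any label already on the data.

```python
def may_add_label(new_label, existing_labels, granted_labels, is_creator=False):
    """Check both conditions for adding a label to a piece of data."""
    # Condition 1: the app is creating the data, or already manages one of
    # the labels on it (simplified to "holds permission for it").
    manages_data = is_creator or any(
        label in granted_labels for label in existing_labels
    )
    # Condition 2: the app holds permission for the label being added.
    return manages_data and new_label in granted_labels
```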

Data discovery

This use of indexes of XOR-URLs could actually allow another layer of permissions in which applications could request to discover data, i.e., read a certain index, but not necessarily read the data within it…
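A sketch of how a separate discovery permission might be modelled, assuming permissions are flags held per label. The LabelPermission enum and both helper functions are hypothetical and not part of any existing API.

```python
from enum import Flag, auto


class LabelPermission(Flag):
    NONE = 0
    DISCOVER = auto()  # may list an index (names and XOR-URLs)
    READ = auto()      # may fetch/decrypt the data the index points to


def can_list_index(granted):
    return bool(granted & LabelPermission.DISCOVER)


def can_read_data(granted):
    return bool(granted & LabelPermission.READ)
```

An app granted only DISCOVER could see which entries exist under a label without being able to read the data itself.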

Implementation

An initial version of this could be developed using the same setup as Named Containers, using those same MDs/permissions for our Indices. Though extra changes will be needed to enable multiple-label key handling.

Synopsis

Running safe files put meInJapan.jpg --labels josh japan will automatically add the file to the images, josh and japan indexes, as well as app:safe-cli. The multiple-label index app(safe-cli)/images/japan/josh will be used to store the keys for signing requests/encryption.

app:safe-cli/images/japan/josh : {
	meInJapan.jpg : <xor url>
}

An application wanting to access this data could simply run:

safe index get app:safe-cli meInJapan.jpg

Or alternatively

safe index get photos meInJapan.jpg

Questions

Are there limits on label characters? / length?
Other things?

Drawbacks

Marginally increases PUT cost, though this is necessary for most data, so should be priced in effectively. An opt-out will be available (perhaps requiring extra permissions?)
Needs tweaks to the permission setup

bochaco · November 26, 2019, 6:34pm

I’m not fully clear on this. Do the labels have a hierarchy? If so, how is that defined? From the text above I understood that the labels were simply concatenated alphabetically.

Another aspect I haven’t thought through yet is how we can have URLs to access these labels. Would the same approach as for named containers still work and coexist fine? e.g. safe:///japan/myFaceInJapan.jpg
Why do I care about URLs? Because I imagine linking to these files from other places, either public or private.

joshuef · November 26, 2019, 7:58pm

No hierarchy. Just trying to state that there should be some predefined order to label naming (if/while labels are effectively just named containers… this definitely can be improved upon.) Basically to avoid josh/maidsafe and maidsafe/josh being two different labels

I think so aye

marcin · November 27, 2019, 2:19pm

Nice proposal! I’m not sure I totally understand it, so I have a couple questions and suggestions. Also, I think it would be really nice to include a visual such as a diagram with this RFC. Our users have frequently mentioned that our technical documentation is hard to understand and that diagrams would be helpful. More examples would also be great.

The part where you mention a Photos label is confusing me because the rest of the documentation uses the example of a photo label. I’m also not exactly clear on what “auto managed” means.

I’m not sure if this is what you meant, but I would like the idea of having a mapping from the photo label to a Photos view which can be accessed by users. I think this mapping should be maintained on the application layer (the labels being managed client-side seems fine).

Also, this may be outside the scope of this document, but it would be neat if this mapping could be a user-configurable query, something like (in SQL terms):

WHERE label=photo

And we could have more complex mappings with booleans, other variables like last_modified_timestamp, file extension etc. But again, this seems better suited to the application layer (think Finder on OSX which has similar functionality).
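Such a user-configurable mapping could be sketched at the application layer as a predicate over index entries. Everything here is hypothetical: the IndexEntry fields, the view function and the timestamp used in the second query are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class IndexEntry:
    name: str
    labels: frozenset
    extension: str = ""
    last_modified_timestamp: int = 0


def view(entries, predicate):
    """Return the names of entries matching a user-configurable predicate."""
    return [entry.name for entry in entries if predicate(entry)]


# Roughly "WHERE label = photo"
def photos(entry):
    return "photo" in entry.labels


# Roughly "WHERE label = photo AND last_modified_timestamp >= 1574800000"
def recent_photos(entry):
    return photos(entry) and entry.last_modified_timestamp >= 1574800000
```

A Photos view is then just view(entries, photos), and more complex mappings are further predicates combined with and, or and not.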

safe index get photos myFaceInJapan.jpg

I am a little confused here. How does this file get associated with photos if it only has the josh and japan labels? Is it done by the hook, mentioned in the “Storage” section, which checks for .jpg files and auto-labels them? Also, did you mean photo or Photos instead of photos?

Should labels be plural? photo vs photos

I think photo makes more sense as a label on an individual piece of data and any view into this data could be Photos as it would be plural.

joshuef · November 27, 2019, 2:24pm

Yeh good points. I will get diagramming now.

Basically automatic labelling of data. We could have a map that says any file ending in .jpg gets a photos label. (And yeh, I need to be more consistent on that in the doc. I’m leaning towards pluralised labels as standard. Capitalisation is another question; I’m not sure what’s best there.)

I think any files app could easily set up some flexible organisation using labels, yeh. I’m not sure it’s needed in this layer.

Something I don’t touch on is just what metadata we’d put in an index. I guess modified time and extensions could easily go in there, e.g.

See above re: extensions / auto labels (and my photos inconsistency).

Thanks @marcin!

bochaco · November 27, 2019, 3:18pm

…or how about $ safe cat safe:///photos/myFaceInJapan.jpg instead? which makes me think we may need something else for the app ids labels, maybe app(safe-cli) so we can do $ safe cat safe:///app(safe-cli)/myFaceInJapan.jpg ?

Jean-Philippe · November 28, 2019, 4:56pm

Looks really interesting. I have limited knowledge in this area, so my comments may be off.

Sounds like the difference between how Android and iOS feel, i.e. in iOS each app has its own data, and it is a bit difficult to share things between apps, while on Android a lot of apps are able to store things in the same folder. Is there any implication for security, i.e. does it allow encryption to be used in such a way that apps without a label permission cannot decrypt its info?

What is the model to share data from an app to another? (add a new label with other app id?)

Should it be “labels”?

Other questions:

what/who can remove/add labels? (an app, the user only?)
Are labels free-form strings? Are there limitations? Would it impact performance (e.g. if someone uses a 2MB label)?

joshuef · November 28, 2019, 5:29pm

I imagine right now we could achieve this with our current permission system for data. And each index has its own keys for encrypting and signing GETs.

If I want to modify/add a label to existing data I could do this. A new index could be created with the new label-combo, which would now own the data. As long as the original app’s labels were still intact, it would be able to access this new label-combo and the data too.

I think the user. Permission would be required for adding labels to data, I think (ie, this app needs that new label permission).

Good points. I’d imagine strings without much limitation (maybe some chars reserved for separators). Though size should probably be limited indeed. I imagine we could naively limit this for now, and as things progress there will be room to improve how this is handled.
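A naive limit of the kind mentioned could look like the sketch below. The 64-byte cap and the reserved "/" separator are made-up values for illustration; nothing in this thread fixes actual limits.

```python
MAX_LABEL_BYTES = 64       # hypothetical size cap
RESERVED_CHARS = {"/"}     # hypothetical: reserved as the label separator


def validate_label(label):
    """Return the label unchanged if acceptable, else raise ValueError."""
    if not label:
        raise ValueError("label must not be empty")
    if len(label.encode("utf-8")) > MAX_LABEL_BYTES:
        raise ValueError("label is too long")
    reserved_used = RESERVED_CHARS & set(label)
    if reserved_used:
        raise ValueError(f"label uses reserved characters: {sorted(reserved_used)}")
    return label
```

A 2MB label would be rejected before it ever reached an index.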

Thanks @Jean-Philippe!

joshuef · November 28, 2019, 5:38pm

FYI OP has been updated with some diags / rewording. Hopefully clarifying some points

JimCollinson · November 29, 2019, 11:13am

I wonder if there’d need to be a difference between creating new labels and adding existing labels?

I guess this is where the discovery permission might be key. I’d want an app to be able to look through an index, and use my pre-existing label scheme when it’s saving its data. Even though it only ever has the permissions to read and write data labeled with its <appId>.

joshuef · November 29, 2019, 11:17am

I’d want an app to be able to look through an index, and use my pre-existing label scheme when it’s saving its data

Aye. This is where some RDF would come into play too. You should be able to grab a label’s intent, regardless of the label name itself (Photo vs Foto, e.g.).

JimCollinson · November 29, 2019, 11:43am

I think we could tie ourselves in knots a bit here with the plurals.

I reckon it best to think of labels enabling grouping of data (they really only become useful that way) and as such the user would predominantly be viewing a collection, tagged with that label. So probably go with the plural as the best guess, and then let the user customise as they see fit.

joshuef · December 2, 2019, 9:59am

OP has been updated w/ some more general info and a couple typo fixes.

krnelson · December 2, 2019, 10:14am

It sounds like a job for an extremely simple graph structure which will have more versatility when moving on to solving other problems in the future.

Here is a quick one-page explanation of why it may be a better-suited data structure for this job than a roll-our-own label db structure.

Now we can structure our data in whatever pattern we want without hitting complicated nesting issues, because we’re just keeping a reference to that object, not the object itself.

Here is a quick description of how you would use it which covers the example given in the pre-RFC.

Note I am not suggesting this particular library (there are a few Rust libs that could serve, or we could roll our own for the simple use case of labels). I just selected it because it had a concise description/example.

joshuef · December 2, 2019, 10:50am

Thanks @krnelson,

There’s definitely plenty of scope for improving the index data structures. You’re right though, a simple graph may well be the answer there. (Such a thing built atop the key:value store of mutable data may be feasible eg).

Though that’s getting deeper into data structures than I attempt in the OP as right now I’m thinking in terms of usability (can this idea work for app devs? how?), and this is implementation details (which granted, need to be sorted out).

I’m trying/hoping to find an implementation which might be feasible to get going soon, using our current data structures on the network, such that (if it’s deemed desirable) we could build this out in place of the container structs we’ve had so far (and have yet to build the API for). This way we’re not again changing some underlying app APIs in X amount of time.

That may or may not be possible with a graph in the near-term (if that’s the best struct for such indices), if not I’d hope we build this out in a way that the APIs hold, and the underlying index structs etc can be improved over time

oetyng · December 2, 2019, 11:04am

First, I generally think it’s a superb idea. Labels will give a lot more freedom in how we organize and access data.

Also, I’ve got a couple of initial questions.

With regards to viewing data in a folder hierarchy, how is this supposed to work?
Let’s use index and multi-index app(safe-cli)/images/japan/josh with meInJapan.jpg as example.

Will these labels correspond to folders, or would we explicitly state which label corresponds to a folder?
Since the labels are concatenated alphabetically, how is the hierarchy determined?

Or… will we simply by default resolve all combinations and find meInJapan.jpg there?

root/images/japan/josh
root/images/josh/japan
root/josh/japan/images
root/josh/images/japan
root/japan/josh/images
root/japan/images/josh

I think the convention that access to images gives access to any multi-index with the images label in it would indicate that all combinations automatically resolve, as that specific label is then always a top-level folder in the hierarchy.

JimCollinson · December 2, 2019, 12:17pm

The thinking here (from a UX POV) would be that a folder structure wouldn’t be automatically created/viewable from a set of labels, but would be created or determined by the user.

In my opinion, metaphors such as folders/containers, work around the premise of a piece of data being only in one location in that structure. So the folder is a way for the user to opt in to viewing and structuring their data in a deliberate way.

The labeling sits alongside all that, and allows a lot more flexibility.

There are times when you can mix these metaphors a little, e.g. a ‘Smart Folder’ which the user can use to curate data that stays in its original location, but is still within a virtual ‘container’ of sorts. Labels would enable all this too.

joshuef · December 2, 2019, 12:18pm

Viewing/managing data within a folder hierarchy is a separate construct to labelling data or indices. That would be a ‘folder’ in terms of the pseudo filesystem, which is what we’re calling a FilesContainer in the APIs. This is the struct that allows websites to have relative contents etc. This data can be managed and created outwith any indexing. But the FilesContainer itself could be labelled (e.g., with Folder).

The labels don’t correspond to folders in the above ‘FilesContainer’ sense. They are more indicative of the index in which a link can be found, and which is used as a method to determine permissions.

These are all the same label index. So only one of these would be valid (images/japan/josh, going off labels in alphabetical order). And within that index, you’d be able to retrieve meInJapan.jpg.
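To illustrate that every ordering names the same index, here is a small sketch that normalises each of the six paths from the question. The canonical_index function is hypothetical, and treating a leading root segment as something to strip is an assumption made only to match the paths as written.

```python
from itertools import permutations


def canonical_index(path, root="root", separator="/"):
    """Drop the leading root segment and sort the remaining labels."""
    parts = [part for part in path.split(separator) if part and part != root]
    return separator.join(sorted(parts))


labels = ["images", "japan", "josh"]
paths = ["root/" + "/".join(order) for order in permutations(labels)]
canonical = {canonical_index(path) for path in paths}
```

All six orderings collapse to the single name images/japan/josh, so a lookup for meInJapan.jpg only ever needs to consult that one index.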

Does that help clarify @oetyng? Let me know if I misunderstood what you were asking

happybeing · December 2, 2019, 12:23pm

Interesting and coming late, nice diagrams!

Mainly just following but one question. I’m thinking of this as more a way of indexing data alongside use of the containers/content structures people are used to. Otherwise, where is labelling used instead, rather than alongside - any current/past application examples?

My experience is that people are willing to make reasonable use of organising into folders, not perfect, but people are used to this and do organise things like this, so I think containers/objects is useful.

Whereas where labels or tagging are used it tends to require more effort than most people can or are willing to put in. I recall Evernote had a good mix of tree and tagging, but it was a lot of work adding tags, and I never felt I got the value back from that so I’m dubious that:

users will label things (especially if containers are there)
that it is worth the work

Automated labelling might be good, but if this is just based on file extension, well we could just search for those anyway.

Another thought is that this is parallel to RDF semantic description, so the two would best be mirrors of each other or we end up with a mess and it might be hard to build UIs or apps that handle both without causing confusion to users and developers.

So I’m inclined to see this more as behind the scenes indexing rather than as a useful alternative to containers and/or semantic web. Which could be very useful! One of the issues with RDF as it sprawls about from one resource to the next is going to be how can that be explored, searched and accessed, and I think indexes like this would be useful. But in that case it would be derived from the RDF rather than explicitly. And any labels applied by apps/users would end up mirrored in the RDF, rather than only in label index itself.

Just first thoughts! It is an interesting idea. Now I’m wondering how this will look with/without containers and RDF from an API and a UI perspective. Seems that is almost more important to think about than how it would work under the hood.

JimCollinson · December 2, 2019, 12:33pm

Yes, the idea is that these would co-exist. One is not a replacement for the other. Folders and containers are a very useful metaphor that people are used to and we’re not trying to do away with that. But the user could choose to flip between various ways to view the structure of their data, depending on their needs.

Yes, labelling would predominantly be an automated, indexing layer, that would enable all sorts of UIs to be built on top. And it’s a way to stop the siloing of data that could make for a very clunky experience if we bake in a container-based structure, and fail to embrace the possibilities of flat structures + RDF goodness.
