Here we have an idea for improving data storage and access, moving away from the ‘container’ approach to something more flexible…
Right now it’s a semi-fleshed out idea. So please fire in all questions/thoughts/criticisms and hopefully we can work this towards a more complete RFC if that still makes sense!
(Also: excuse my handwriting!)
To clarify something about Folders: This proposal does not in any way change or remove the ability to make folder-like structures on the network. FilesContainers, which we use to this end, would still exist and could well be labelled (clarified below the OP).
Flat Data Indexes and Labels
Summary
Remove containers as a concept and add labels, of which any piece of data can have many.
This prevents data siloing in containers, without losing functionality.
It also gives more flexibility for querying/displaying data owned by an account.
This can be worked upon in a limited fashion now (in place of fleshing out container APIs). It will require tweaks to permissions, and has room for other enhancements down the line.
Motivation
Right now we have the idea that apps can have ‘containers’. An app can store what it likes in there, and another app has no idea of that data’s existence. This could lead to data-siloing even with the best intentions of RDF, etc.
To avoid this, we need to implement data indexing on PUT for apps by default (apps can opt out). This way, whenever an app PUTs data to the network, your account has a record of it.
Proposal:
- Remove ‘containers’ and use ‘labels’.
- Any piece of data can have many labels.
- Each label has its own index.
- Apps can request permissions to work with specific labels.
This allows for a more flat and flexible data structure, without losing the ability for apps to organise their own data:
Labels such as folder or photo could be applied automatically. Labels for the app(<appId>) would also be applied automatically.
Other labels can be chosen/applied (me and awesome, above).
Thus an app can request permission to read/write data with the photo label. Even if it is not the safe-cli app, which originally put the data, it can read that data (and indeed the whole Photos index) so long as it has the photo permission.
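To make the idea concrete, here is a minimal in-memory sketch of per-label indexes and label-based read permission. Everything in it (the LabelIndexes class, its method names, the example XOR-URL) is hypothetical and only stands in for whatever the client libs would actually do; it ignores encryption and the network entirely.

```python
class LabelIndexes:
    """Hypothetical stand-in for an account's label indexes."""

    def __init__(self):
        # label -> {data name -> XOR-URL}
        self._indexes = {}

    def put(self, name, xor_url, labels):
        # Record the data under every label it carries.
        for label in labels:
            self._indexes.setdefault(label, {})[name] = xor_url

    def read_index(self, label, granted_labels):
        # An app may read an index only if it holds that label's permission.
        if label not in granted_labels:
            raise PermissionError(f"no permission for label {label!r}")
        return dict(self._indexes.get(label, {}))


account = LabelIndexes()
# safe-cli puts a photo; the URL below is a placeholder, not a real XOR-URL.
account.put("ShibeInJapan.jpg", "safe://example-xor-url",
            ["photo", "app(safe-cli)", "japan"])

# A different app that holds the photo permission can still see the entry.
photos_app_grants = {"photo", "app(PhotosApp)"}
photos = account.read_index("photo", photos_app_grants)
```

The point of the sketch is only that access follows the label, not the app that did the PUT: PhotosApp sees the file through the photo index, but would be refused on app(safe-cli).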
Assumptions
This document doesn’t cover how data is represented on the network. RDF is assumed later for describing the various labels, and MutableData is most likely the initial index for a given label.
An index could well contain metadata for a file (e.g. the type, modification/creation date).
It also assumes the short term goal of a client-side implementation. Though this could (maybe should?) be handled network side down the line.
Automatic mapping is assumed for some labels (photo, app(<appId>), document). These automatic labels could well be modified per account.
Detailed Design
$ safe files put ShibeInJapan.jpg --label japan
File was uploaded to safe://gsda87632rgdsaihdaiuadis8adsada
The label “japan” was applied and “safe-cli” and “photo” were applied automatically.
After such a command, our account root could look like:
Data Storage Hooks
Upon any data PUT, there will be a hook in the relevant high-level API (that of safe-api) to:
- Determine the correct labels needed for this data (which could be image and photo for a .raw file, or mutable for a MutableData). This will always update or create the relevant Label Index.
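As a rough illustration of such a hook, the sketch below derives labels from a filename and then updates (or creates) each label's index. The function names, the extension table and the plain-dict indexes are all assumptions made for the example; only the .raw to image/photo mapping and the app(<appId>) label come from the text above, and the .jpg entry is a guess at what an account's defaults might include.

```python
from pathlib import PurePath

# Hypothetical per-account table of automatic labels, keyed by file extension.
AUTO_LABELS_BY_EXTENSION = {
    ".raw": ["image", "photo"],
    ".jpg": ["image", "photo"],  # assumed; not stated in the proposal
}


def determine_labels(filename, app_id, user_labels=()):
    """Combine automatic, app and user-chosen labels for one piece of data."""
    extension = PurePath(filename).suffix.lower()
    labels = set(AUTO_LABELS_BY_EXTENSION.get(extension, []))
    labels.add(f"app({app_id})")  # the putting app's own label
    labels.update(user_labels)
    return sorted(labels)


def on_put(indexes, filename, xor_url, app_id, user_labels=()):
    """Hook run after a PUT: update or create every relevant label index."""
    labels = determine_labels(filename, app_id, user_labels)
    for label in labels:
        indexes.setdefault(label, {})[filename] = xor_url
    return labels


indexes = {}
applied = on_put(indexes, "holiday.raw", "safe://example-xor-url",
                 "safe-cli", user_labels=["japan"])
```

An opt-out would presumably just skip the call to on_put, which is why it might want its own permission.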
Indices
Label indices can readily be implemented in the same fashion as Named Containers, i.e., as MutableData key: value stores. The key is the name of the data (a filename, a name given to data structs which don’t normally have one, or the XOR-URL). The value could be as simple as a XOR-URL, though more information may be of use there.
Combinations of labels will initially be handled by concatenating the labels in alphabetical order (e.g. apple/<appId>/food/fruit), though there is ample scope to improve this account-side down the line.
The indexes only store a XOR-URL link to the relevant data, in a key-value fashion, with a name being provided or derived from the data put.
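The alphabetical concatenation could be as simple as the sketch below. The function name is made up, and two details are assumptions rather than part of the proposal: sorting is plain code-point order (so it is case-sensitive), and a label containing "/" is rejected because it would make the combined key ambiguous.

```python
def combined_index_key(labels):
    """Build the key of a multiple-label index from its labels."""
    unique = set(labels)
    for label in unique:
        if "/" in label:
            # Assumed rule: the separator cannot appear inside a label.
            raise ValueError(f"label {label!r} contains the separator '/'")
    return "/".join(sorted(unique))


key = combined_index_key(["josh", "japan", "images", "app(safe-cli)"])
```

Because the labels are sorted first, the same set of labels always maps to the same index, whatever order they were given in.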
Permissions
Permissions are managed on a per-label basis. An application will initially have permissions to access its own label.
An application with permission to read the photos label can read any data carrying that label, whichever application put it.
For example, our PhotosApp, with permissions to access the Photos index and its own app(PhotosApp) index, could access the following indexes:
Whereas safe-cli, with permissions to access the Me index and Folders, as well as its own app(safe-cli) index, could access the following indexes:
Multiple labels
This proposal will involve a change to key retrieval in the client libs to enable accessing multiple-label indexes’ data. Having permission to read/decrypt a given label allows read/decrypt of any index containing that same label.
- Each label / label-combination will have its own access/encryption keys.
- Multiple-label indexes MUST be accessible by any application which has permission to access any one of the labels. I.e., an app can access data in apple/<appId>/food/fruit if it has permissions to access apple. This does NOT imply that something with permission for fruit has access to apple, however.
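Read literally, that rule is an "any one label is enough" check on the combined index key. A minimal sketch, assuming the "/"-joined key format from the Indices section and a hypothetical can_access helper (key handling and encryption are left out):

```python
def labels_in(index_key):
    """Split an index key such as 'apple/<appId>/food/fruit' into labels."""
    return set(index_key.split("/"))


def can_access(index_key, granted_labels):
    """True if the app holds permission for at least one label in the key."""
    return not labels_in(index_key).isdisjoint(granted_labels)


combined = "apple/<appId>/food/fruit"
apple_only = {"apple"}
fruit_only = {"fruit"}

apple_sees_combined = can_access(combined, apple_only)
fruit_sees_combined = can_access(combined, fruit_only)
fruit_sees_apple = can_access("apple", fruit_only)
```

So fruit gets an app into the combined index, but not into the standalone apple index, which matches the caveat in the last bullet.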
An initial thought on modifying labels. This can only be done by an application:
- which is first creating data OR has permission to manage a label on that data
- which has permissions for the label to be added
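One possible reading of these two conditions, sketched below, is that both must hold. That reading is an assumption, as is modelling "permission to manage a label" as simple membership in a flat set of granted labels; the function and parameter names are invented for the example.

```python
def may_add_label(new_label, data_labels, granted_labels, is_creating):
    """Check whether an app may attach new_label to a piece of data."""
    # Condition 1: creating the data, or managing a label already on it.
    manages_existing = any(label in granted_labels for label in data_labels)
    # Condition 2: holding permission for the label being added.
    holds_new_label = new_label in granted_labels
    return (is_creating or manages_existing) and holds_new_label


existing = ["photo", "app(safe-cli)"]
photos_app = {"photo", "japan", "app(PhotosApp)"}

tag_allowed = may_add_label("japan", existing, photos_app, is_creating=False)
tag_refused = may_add_label("me", existing, photos_app, is_creating=False)
```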
Data discovery
This use of indexes of XOR-URLs could actually allow another layer of permissions in which applications could request to discover data, i.e., read a certain index, but not necessarily read the data within it…
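A sketch of how that split might look, with "discover" and "read" as separate flags. The Access enum, the two helper functions and the dict-based store are all hypothetical; nothing like this exists in the current APIs as far as this post is concerned.

```python
from enum import Flag, auto


class Access(Flag):
    NONE = 0
    DISCOVER = auto()  # may list what is in an index
    READ = auto()      # may fetch the data an index entry points to


def list_index(index, access):
    """Return the index entries (name -> XOR-URL) if discovery is allowed."""
    if Access.DISCOVER not in access:
        raise PermissionError("discover permission required")
    return dict(index)


def fetch(store, xor_url, access):
    """Return the data behind a XOR-URL if reading is allowed."""
    if Access.READ not in access:
        raise PermissionError("read permission required")
    return store[xor_url]


photos_index = {"meInJapan.jpg": "safe://example-xor-url"}
store = {"safe://example-xor-url": b"...image bytes..."}

# An app granted discovery only can see that the photo exists.
entries = list_index(photos_index, Access.DISCOVER)
```

With only DISCOVER, a call to fetch raises PermissionError, so the app learns the name and location of the data without being able to read it.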
Implementation
An initial version of this could be developed using the same setup as Named Containers, using those same MDs/permissions for our Indices, though extra changes will be needed to enable multiple-label key handling.
Synopsis
safe files put meInJapan.jpg --labels josh japan
will automatically add the file to the images, josh and japan indexes, as well as app:safe-cli. The multiple-label index app(safe-cli)/images/japan/josh will be used to store the keys for signing requests/encryption.
app:safe-cli/images/japan/josh : {
meInJapan.jpg : <xor url>
}
An application wanting to access this data can simply run:
safe index get app:safe-cli meInJapan.jpg
Or alternatively
safe index get photos meInJapan.jpg
Questions
- Are there limits on label characters or length?
- Other things?
Drawbacks
- Marginally increases PUT cost, though this is necessary for most data, so should be priced in effectively. An opt-out will be available (perhaps requiring extra permissions?)
- Needs tweaks to the permission setup