We had planned to go deeper into the governance issues we discussed last week, but unfortunately we haven't been able to, because a couple of key team members have had to take unplanned time off. All being well, we'll come back to this next week.
In the meantime, please read @jimcollinson's post on our strategic aims, and be aware that our objectives and vision have not changed one bit. And please remember to keep discussions respectful, however passionate and heated they get. We have a forum code of conduct that all members are expected to uphold.
This week we’ll look at data on the network, and what it means for files to be public or private.
General progress
Yogesh has been looking at databases to replace sled, which is buggy and doesn't seem to be actively maintained. So far the prime candidates appear to be Persy, a transactional database that optimises for consistency, and Cacache, which @yogesh says "seems to offer the best speed out of the lot with built-in metadata creation and handling". Neither is perfect, but both would probably do the job. Testing continues.
Thanks to @josh for organising the DBC comnet last week. As @Chriso mentioned, depositing owned DBCs isn't working yet, but that's what he's been working on this week. Meanwhile @Qi_ma is looking into a DBC reissue bug and also working on spentbook integration.
@davidrusu continues to work on getting membership information to adults, so that membership and network knowledge (via the signed Section Authority Provider) stay in sync across the section.
Public and private data on the Safe Network
What is a file on Safe Network? It's a simple enough question, but the answer is a bit more involved. The basic answer is "content + metadata + datamap", but what does that mean?
Content
Content is the raw material of the file, the basic binary information. Once it is larger than 1 MB it is automatically self-encrypted to produce chunks and a datamap. Because of the way self-encryption works, this is deterministic: self-encrypt the same content any number of times and you'll get the same chunks. Its security is also largely independent of the encryption algorithm (we use AES256), meaning that even if the algorithm were cracked, the chunks would still be secure.
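To illustrate why self-encryption is deterministic, here is a minimal sketch loosely modelled on the idea behind it: each chunk's key is derived only from the plaintext hashes of that chunk and its neighbours, so the same content always yields the same keys and therefore the same encrypted chunks. Everything here is an assumption for illustration: the tiny `CHUNK_SIZE`, the exact key derivation, and especially the cipher, which is a SHA-256 keystream standing in for AES256 (the real scheme also adds further obfuscation). It is not secure and not the network's actual code.

```python
import hashlib

CHUNK_SIZE = 4  # hypothetical: tiny chunks keep the demo readable; real chunks are ~1 MB

def split(content: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    return [content[i:i + size] for i in range(0, len(content), size)]

def keystream(key: bytes, length: int) -> bytes:
    # Stand-in cipher: SHA-256 in counter mode. NOT AES256 and not for real use.
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def chunk_key(i: int, src_hashes: list[bytes]) -> bytes:
    # Assumed derivation: key = plaintext hashes of chunk i and the two chunks
    # before it (wrapping around). It depends only on content, so it is deterministic.
    n = len(src_hashes)
    return b"".join(src_hashes[(i - k) % n] for k in range(3))

def xor(data: bytes, key: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

def self_encrypt(content: bytes) -> tuple[list[bytes], list[bytes]]:
    """Return (encrypted chunks, plaintext chunk hashes); the hashes go in the datamap."""
    chunks = split(content)
    src_hashes = [hashlib.sha256(c).digest() for c in chunks]
    return [xor(c, chunk_key(i, src_hashes)) for i, c in enumerate(chunks)], src_hashes

def decrypt(encrypted: list[bytes], src_hashes: list[bytes]) -> bytes:
    # Without src_hashes (i.e. without the datamap) the keys cannot be rebuilt.
    return b"".join(xor(c, chunk_key(i, src_hashes)) for i, c in enumerate(encrypted))

song = b"God save the queen"
first, hashes = self_encrypt(song)
second, _ = self_encrypt(song)
print(first == second)                 # True: same content, same chunks
print(decrypt(first, hashes) == song)  # True
```

Note that the encrypted chunks are useless on their own: recovering the content needs the per-chunk hashes, which is exactly the information the datamap holds.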
OK, so what is a chunk? Unless you have the datamap, a chunk is a meaningless blob of bits, mostly around 1 MB in size, whose name is also its hash. This means we can check whether it's valid (does the name match the hash?) but we can't tell anything else about it. We can see it, but we can't read it or know where it came from.
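The validity check described above is simply "hash the blob and compare it with its name". A minimal sketch, assuming SHA3-256 as the hash (the function names `chunk_name` and `is_valid` are hypothetical, not the network's API):

```python
import hashlib

def chunk_name(blob: bytes) -> str:
    # Assumption: a chunk's name is the SHA3-256 hash of its (encrypted) bytes.
    return hashlib.sha3_256(blob).hexdigest()

def is_valid(name: str, blob: bytes) -> bool:
    # The only check possible without the datamap: does the name match the hash?
    return chunk_name(blob) == name

blob = bytes([0x8F, 0x12, 0xA0, 0x33])  # an opaque, already-encrypted chunk
name = chunk_name(blob)
print(is_valid(name, blob))            # True
print(is_valid(name, blob + b"\x00"))  # False: any change breaks the match
```

Passing this check says nothing about what the blob contains or who uploaded it; it only proves the blob hasn't been altered.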
Datamap
Right, so what's a datamap? The datamap is a simple file that contains the unencrypted name of the content and the names of all the encrypted chunks that make it up, so we know where to find them (a chunk's name is its XOR address). If it's stored unencrypted on the network, anyone can use it to recreate the content. If it's encrypted or kept on our own client, only we can do that. We'll come back to encrypting the datamap in a second.
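Here is a minimal sketch of how a datamap lets a client find and reassemble chunks. It assumes a heavily simplified datamap that is just the ordered list of chunk names, uses a plain dict as a stand-in for the network, and skips encryption for brevity (a real datamap also carries what's needed to decrypt each chunk). `DataMap`, `store` and `fetch` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass

def xor_name(blob: bytes) -> str:
    # Assumption: a chunk's XOR address is the SHA3-256 hash of its stored bytes.
    return hashlib.sha3_256(blob).hexdigest()

@dataclass
class DataMap:
    # Hypothetical, simplified datamap: just the ordered chunk names.
    chunk_names: list[str]

def store(content: bytes, network: dict[str, bytes], size: int = 4) -> DataMap:
    names = []
    for i in range(0, len(content), size):
        blob = content[i:i + size]
        name = xor_name(blob)
        network[name] = blob  # identical chunks land at the same address
        names.append(name)
    return DataMap(names)

def fetch(datamap: DataMap, network: dict[str, bytes]) -> bytes:
    out = bytearray()
    for name in datamap.chunk_names:
        blob = network[name]  # look the chunk up by its address
        if xor_name(blob) != name:
            raise ValueError(f"corrupt chunk at {name}")
        out += blob
    return bytes(out)

network: dict[str, bytes] = {}
dm = store(b"Anarchy in the UK", network)
print(fetch(dm, network))  # b'Anarchy in the UK'
```

Whoever holds `dm` can rebuild the content; whoever holds only `network` sees a pile of named blobs.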
Metadata
The last thing to mention is the metadata: information about the content. This can optionally include its size, name and file type, and potentially the dates it was created, accessed, etc. But wait a second, Safe doesn't do time! True, but that needn't be a limitation.
We don't include metadata with the content because it would ruin deduplication. Let's say someone uploaded the Sex Pistols song GodSaveTheQueen.mp3, and someone else uploaded exactly the same MP3 but called it GSTQ.mp3. If the name were part of the content, the chunks would be completely different, so there would be no deduplication. That's why we store the metadata separately from the chunks, either in a datamap on the network or on our client. This lets us arrange these apparently meaningless blobs to our heart's content: we can name and label them as we wish (including time created and time accessed) and organise them into our own directory structures.
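The deduplication argument can be sketched in a few lines: two uploads of the same bytes under different names produce identical chunk lists, and the file name lives only in a separate metadata record. This assumes SHA3-256 chunk names, skips self-encryption, and uses a module-level dict as a stand-in for the network; `Metadata` and `upload` are hypothetical.

```python
import hashlib
from dataclasses import dataclass

def chunk_names(content: bytes, size: int = 4) -> list[str]:
    # Assumption: name = SHA3-256 of each chunk (real chunks are self-encrypted first).
    return [hashlib.sha3_256(content[i:i + size]).hexdigest()
            for i in range(0, len(content), size)]

@dataclass
class Metadata:
    # Hypothetical client-side record, kept apart from the chunks.
    file_name: str
    size: int
    datamap: list[str]

network: dict[str, bytes] = {}  # stand-in for the network: address -> chunk

def upload(file_name: str, content: bytes, size: int = 4) -> Metadata:
    names = chunk_names(content, size)
    for i, name in enumerate(names):
        # setdefault: a chunk that is already stored is not stored again
        network.setdefault(name, content[i * size:(i + 1) * size])
    return Metadata(file_name, len(content), names)

mp3 = b"same mp3 bytes for both uploads"
a = upload("GodSaveTheQueen.mp3", mp3)
b = upload("GSTQ.mp3", mp3)
print(a.datamap == b.datamap)  # True: the names differ, the chunks don't
```

Had the file name been prepended to the content before chunking, at least the first chunk (and, with real self-encryption, every chunk) would differ between the two uploads.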
Directories can also be content: encrypted, chunked and stored as files with their own datamap (which is also why small files that don't go through self-encryption are unreadable: all content is stored in a directory, but that's one for another day).
Public and private data
Safe works on the principle that any valid data must be stored. This means we can't delete chunks. But remember that a file is content plus a datamap.
Without a datamap, content is just meaningless blobs, and those blobs are as secure and unknowable as is possible with current technology. To make GodSaveTheQueen.mp3 publicly available, we upload it, publish its datamap on the network unencrypted and link to it. Chances are, with a well-known song like that, the chunks will already be there, uploaded by the person who named their copy GSTQ.mp3 and chose to encrypt its datamap or keep it on their client, making their copy private.
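The public/private difference comes down to what happens to the datamap, not the chunks. A minimal sketch: the public copy publishes the serialized datamap as-is, while the private copy encrypts it with a key that never leaves the owner's client. The cipher is a SHA-256 keystream stand-in (not what Safe uses), and the chunk names are hypothetical placeholders.

```python
import hashlib
import json
import secrets

def keystream(key: bytes, length: int) -> bytes:
    # Stand-in cipher (SHA-256 counter mode), NOT the scheme Safe uses.
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def xor(data: bytes, key: bytes) -> bytes:
    # XOR with the keystream; applying it twice with the same key undoes it.
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

datamap = {"chunks": ["chunk-a", "chunk-b", "chunk-c"]}  # hypothetical chunk names
serialized = json.dumps(datamap).encode()

public_blob = serialized               # published unencrypted: anyone can follow it
owner_key = secrets.token_bytes(32)    # stays on the owner's client
private_blob = xor(serialized, owner_key)

print(json.loads(public_blob)["chunks"][0])                 # chunk-a
print(json.loads(xor(private_blob, owner_key)) == datamap)  # True
```

Both datamaps point at the same chunks; only the holder of `owner_key` can read the private one.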
So that is the basic difference between public and private data.
Encrypting the datamap with a BLS key also allows us to create key shares that we can then send to other people, giving us shared private data. BLS gives us this magic for free. This means public, private and shared data are all client-side actions. The network stores data forever, and clients use the (root) datamap and encryption to make data public, private or shared private.
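BLS key shares rest on threshold secret sharing. The sketch below shows plain Shamir secret sharing over a prime field, which is the polynomial idea underneath threshold BLS. It is not BLS itself and not the network's API. A key is split into three shares so that any two recover it. `split_secret` and `recover` are hypothetical names; real threshold BLS libraries typically let holders combine decryption shares without ever reassembling the secret key.

```python
import secrets

PRIME = 2**127 - 1  # a Mersenne prime; the field for this toy example

def split_secret(secret: int, threshold: int, shares: int) -> list[tuple[int, int]]:
    # Random polynomial of degree threshold-1 with f(0) = secret (Shamir, not BLS).
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def f(x: int) -> int:
        return sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME

    return [(x, f(x)) for x in range(1, shares + 1)]

def recover(shares: list[tuple[int, int]]) -> int:
    # Lagrange interpolation at x = 0 gives back f(0), i.e. the secret.
    total = 0
    for j, (xj, yj) in enumerate(shares):
        num, den = 1, 1
        for m, (xm, _) in enumerate(shares):
            if m != j:
                num = num * (-xm) % PRIME
                den = den * (xj - xm) % PRIME
        total = (total + yj * num * pow(den, -1, PRIME)) % PRIME
    return total

key = secrets.randbelow(PRIME)  # stands in for the datamap's encryption key
parts = split_secret(key, threshold=2, shares=3)
print(recover(parts[:2]) == key)  # True
print(recover(parts[1:]) == key)  # True
```

A single share reveals nothing useful about the key, which is what makes handing shares to other people safe.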
Useful Links
Feel free to reply below with links to translations of this dev update, and moderators will add them here:
Russian; German; Spanish; French; Bulgarian
As an open source project, we’re always looking for feedback, comments and community contributions - so don’t be shy, join in and let’s create the Safe Network together!