Discussion topic for RFC 55.
I think unpublished immutable data is a fantastic compromise between delete/no-delete.
The Network SHALL enforce that the GETs are only allowed by the owner(s). For this we SHALL use the special OwnerGet RPC
Where on the route does this validation happen?
Seems like it's covered in RFC-0054. MaidManagers do the validation within this route:
Client <-> MaidManagers <-> DataManagers
Might be worth including in this RFC a reference to the RFC-0054 Unresolved Questions, since the issue relates equally to both.
I thought immutable data was not meant to have owner data.
ImmutableData will only have an owner if it's Unpublished.
From the RFC: "The published ImmutableData is the normal ImmutableData we have just now. There are no changes to that."
That introduces all the issues of identifying the owner, doesn't it? What if I find the first chunk and look up the owner? Just search addresses for the chunk.
What about dedup, @maidsafe? If I upload privately and then someone else uploads the same data publicly, it's not the same chunk, is it? And do I become discoverable, since the owner info is still there?
I guess I better read the RFC
In these cases the validation is at the elder group of the holding section (DataManagers), the reason being the request goes all the way through signed. So on receipt the signature is checked, and if it is the owner's then the reply is sent to that owner ID. If we validated earlier then we could have replay attacks, or weird things like the signed Get request being handed out to folk. So the Get is signed by the owner and delivered back to that owner, if that makes sense?
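To make that flow concrete, here is a minimal sketch of what the check at the holding section could look like. It is only an illustration, not the actual safe_vault code; it assumes ed25519-dalek 1.x style keys, and the struct and field names are hypothetical:

```rust
use ed25519_dalek::{PublicKey, Signature, Verifier};

/// A signed OwnerGet request as described above: the client signs the chunk
/// address with the same key that was recorded as the owner at PUT time.
struct OwnerGetRequest {
    chunk_address: [u8; 32],
    requester: PublicKey,
    signature: Signature,
}

/// Validation at the elder group of the holding section (DataManagers):
/// the signature travels with the request end-to-end, is checked on receipt,
/// and the chunk is only returned if the signer is the recorded owner.
fn validate_owner_get(req: &OwnerGetRequest, recorded_owner: &PublicKey) -> bool {
    req.requester == *recorded_owner
        && req
            .requester
            .verify(&req.chunk_address, &req.signature)
            .is_ok()
}
```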
An aside
Something we need to get clear is that unpublished != totally private, as vaults holding the data can read it etc., so it really means only owners can get it. So effectively private (you can encrypt etc. on top of this, though). Similar with identities (public keys): they are all created anonymously, and if folk share the ID then they are no longer anonymous. So there is confusion about public/private/anonymous/publishable/non-publishable.
You are pretty good at disambiguation, Ian, so as we go through RFCs any help there would be magnificent, as I know I had issues trying to convey this originally, and we have had things like publicId and privateId in code that did lead to confusion. The other confusion is accounts versus IDs: accounts can hold secret keys etc., but IDs are public keys and you can have loads of them, and whether they are public or not is the user's decision. This part is also confusing for many, I think.
Unpublished data will not have de-duplication. The address of the chunks is calculated as sha3_hash(hash_of_data + owner), so no two similar pieces of data will have chunks in the same location. Unless, of course, the same person uploads the same data twice as unpublished ImmutableData; in that case a conflict error will be returned.
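For illustration, a rough sketch of that addressing scheme, assuming the Rust sha3 crate; the exact byte encoding of hash_of_data + owner in the real vault code may well differ:

```rust
use sha3::{Digest, Sha3_256};

/// Published ImmutableData (per the formula quoted in this thread):
/// address = sha3(hash_of_data), so identical content de-duplicates.
fn published_address(data: &[u8]) -> Vec<u8> {
    Sha3_256::digest(&Sha3_256::digest(data)).to_vec()
}

/// Unpublished ImmutableData: address = sha3(hash_of_data + owner), so the
/// same content stored by different owners lands at different addresses,
/// i.e. no de-duplication across owners.
fn unpublished_address(data: &[u8], owner_public_key: &[u8]) -> Vec<u8> {
    let mut hasher = Sha3_256::new();
    hasher.update(&Sha3_256::digest(data));
    hasher.update(owner_public_key);
    hasher.finalize().to_vec()
}
```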
This sounds like some data held by vaults is not encrypted - or do you mean something else?
There will be an RFC for obfuscation at vaults. Also if you use self-encrypt it is all good, but people could upload owned immutable data that is not encrypted if you see what I mean?
Hmm, this is news to me and I've been saying everything is encrypted by default for a long time now. See also (emphasis mine):
Fundamental #19: The SAFE Network will only ever allow encrypted traffic and encrypted services.
Put simply, everything, including web traffic, is encrypted by default. Everything. This is non-negotiable for a Network that demands privacy for every one of its users. You can of course choose to make information public - but this has to be your choice alone. So this means you can be safe in the knowledge your data will always be secure
I think it's a fundamental [cough] problem if unencrypted information can end up on vaults unless I write code to do extra stuff to ensure that doesn't happen, and that is not what people will expect.
Personally I don't think unencrypted data should ever end up on a vault unless it is explicitly public/published.
My understanding is that non-public, non-published info can end up exposed on a vault unless the developer explicitly adds their own encryption layer. That's non-trivial (except for the MD case) - not that hard, just outside most dev experience, so a learning hurdle. So in many cases I expect it won't happen, and if I'm representative, many devs won't even realise it's necessary (and will end up misleading users).
That is still true. It has to be uploaded without the (default) automatic self encryption being done
Ah OK, I misunderstood that.
But I'm not clear now if this is what David means, so I hope for confirmation wrt Fundamental #19. Have to say I'm surprised this didn't make it into the top five! Seems an important point to get across to those who will read the Fundamentals.
You will see an RFC next week or the week after on this, well obfuscation. Vaults will not be able to decrypt even clear text that is uploaded, as elders will encrypt it. We strongly advise against anyone uploading unencrypted data by bypassing the APIs, but we cannot 100% stop a bad app doing that. We can ensure Adults (vaults) hold no unencrypted data though.
Thanks David. If using the APIs ensures an app conforms to Fundamental #19, I'm happy.
I assume by bypassing the API you mean using some of the lower-level APIs, in which case I suggest the documentation for each of those specific API calls makes it clear that they do not encrypt by default, and points to what needs to be done to ensure encryption (a ref link to an explainer should do).
Yes we will doc these and hope any app doing it would be classed bad. The issue the RFC will prevent though is a bad actor using those APIs to upload bad stuff to vaults on purpose to try and break the network.
I've read RFC 54 and 55 a couple of times and also looked over a fair amount of MaidSafe code on GitHub. I figured I'd share some thoughts / feedback.
It sounds like elders have to get involved every time a user wants to PUT deletable data ("Unpublished ImmutableData") on the network. That seems to imply:
- Assuming a lot of people want to use this feature (i.e. for backups), the network could require a high ratio of elder nodes to vault nodes.
- The network could have scalability problems with elders having to receive, encrypt, and broker messages for so many chunks.
I'm hoping next week's RFC will shed some light when it comes out. On a related note, some of what I don't yet understand is:
- How come unpublished ImmutableData can be unencrypted but published ImmutableData cannot?
- Why should network access (GET operations) be restricted? If the chunk was encrypted by the elder, it can only be read by the owner anyway, right? My assumption is that the elder uses the owner's public key to encrypt the chunk, e.g. using an operation such as Sodium's crypto_box_seal().
I had thought the same thing when I read the RFC yesterday and became concerned about the concept of controlling GET access at the network level.
I saw a similar point/question made in RFC54 - adversaries could collude outside of the SAFE network to share chunks stored in their vaults. I feel that part wasn't fully addressed by @dirvine's response. On the other hand, the question assumes that the data stored in the vaults is potentially unencrypted and readable by the vaults (the upcoming RFC is supposed to make vaults unable to read their stored unpublished chunks).
The following comment was helpful for me to better understand the motivation behind this RFC:
If all this is largely just to guard against vaults storing risky unencrypted data (I'm guessing illegal content uploaded by adversaries), it seems to me that even a vault could simply encrypt the chunk with the owner's public key prior to saving it to disk (i.e. not require an elder's resources). The owner would have to undo another layer of encryption once they GET their data back, but I don't think that should be a blocker (if anything, I think it would be easier and less overhead than developing an Owner-Get messaging protocol).
This could be developed into a deterministic process (e.g. the owner generates a second keypair and includes the generated "private" key as part of the chunk). Multiple vaults could have the identical chunk saved (and know it's valid). The owner's software would be able to calculate the encrypted result stored on a vault's drive and send a signed hash. This would enable the vault to replicate data to other vaults (along with proof from the owner that the encrypted file is correct). Also, in the future, a proof-of-storage feature could conceivably make use of this.
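To sketch the basic idea (purely illustrative, assuming sodiumoxide's sealed boxes, i.e. libsodium's crypto_box_seal; this is not anything specified in the RFC):

```rust
use sodiumoxide::crypto::{box_, sealedbox};

fn main() {
    sodiumoxide::init().expect("libsodium init failed");

    // The owner's curve25519 keypair; in practice this would be derived from
    // the owner's account keys rather than freshly generated.
    let (owner_pk, owner_sk) = box_::gen_keypair();

    let chunk = b"unpublished immutable data chunk".to_vec();

    // A vault (or elder) holding only the public key can seal the chunk
    // before writing it to disk...
    let stored = sealedbox::seal(&chunk, &owner_pk);

    // ...but only the owner, holding the secret key, can open it again
    // after fetching it back.
    let recovered = sealedbox::open(&stored, &owner_pk, &owner_sk)
        .expect("only the owner can decrypt");
    assert_eq!(recovered, chunk);
}
```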
Obfuscation is not enough. Vault owners need all data to be encrypted to avoid liability for storing (fragments) of data that any government might consider illegal.
Elders don't encrypt or break chunks. These tasks are performed on the client side.
The main differences with the current Immutable data are:
- The header contains the set of owners, and reading and deleting this data will be restricted to them.
- Data will not be deduplicated. The address of the chunks is calculated as sha3_hash(hash_of_data + owner) instead of sha3_hash(hash_of_data).
From what I deduce, Unpublished ImmutableData is not only non-transferable, but also not sharable with someone who doesn't own it. Neither can you remove or add an owner.
What is the benefit of using such a restrictive data type?
Wouldn't it be more useful to handle reading and deleting differently and allow read sharing? (One could use, for example, a nonce instead of the owner to calculate the address.)
Does deletion require everyone's signatures or only a majority?
As usual, the limit of 1MB per chunk is maintained and the client will manage it. Right?
Hmhmm - if the owner is an alias, it is transferable and sharable with a selected audience
Just removing an owner is still not possible - but it's immutable, so not really surprising then, maybe