Self-encryption - design decisions

Are the design decisions for self-encryption documented anywhere?

I’ve not searched the code or RFCs yet but am curious why even small files are now at least three chunks in size.

I remember that early on, if a file was small enough, it was stored directly in the data map. That still seems sensible, but it was abandoned at some point.

Does anyone recall why? @dirvine @joshuef

It would be desirable to do this, especially if chunk sizes are to be increased. But even at 1 MB, so many files would fit in a single chunk that I assume there must be a very good reason for not doing this.

I just can’t think of one! :rofl:

Most websites, most user-created documents etc. would fit in a single chunk. This would reduce storage costs and network load by around 60%.

7 Likes

One good reason I can think of is ‘no exceptions’ and all files are treated the same :innocent:

2 Likes

From my understanding of discussions around that time, the change was made because:

  • small files at the time were uploaded unencrypted
  • the solution was to allow self-encryption to work on files as small as possible. Is the minimum 3 bytes, or 15/18?
  • part of the increased “quantum” protection is to split the file up so that no one chunk contains the whole file

You could perhaps, for websites, have an object that reads a file and splits it up into separate files. For example, if your site has many images, some images are packed into one file (zip?) and the object unpacks the file when retrieved.

You could have a custom client that creates one chunk only, for files less than max chunk size, and just uploads that.
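
As a rough illustration of the packing idea, here is a minimal std-only sketch, assuming a hypothetical length-prefixed format (a real client would more likely use zip or tar):

```rust
// Pack many small site assets into one blob that fits a single chunk.
// The format here is made up purely for illustration.

fn pack(files: &[(&str, &[u8])]) -> Vec<u8> {
    let mut blob = Vec::new();
    for (name, data) in files {
        blob.extend_from_slice(&(name.len() as u32).to_le_bytes());
        blob.extend_from_slice(name.as_bytes());
        blob.extend_from_slice(&(data.len() as u32).to_le_bytes());
        blob.extend_from_slice(data);
    }
    blob
}

fn unpack(blob: &[u8]) -> Vec<(String, Vec<u8>)> {
    // Read a little-endian u32 length field at position `p`.
    let read_len = |p: usize| u32::from_le_bytes(blob[p..p + 4].try_into().unwrap()) as usize;
    let (mut files, mut pos) = (Vec::new(), 0);
    while pos < blob.len() {
        let name_len = read_len(pos);
        let name = String::from_utf8(blob[pos + 4..pos + 4 + name_len].to_vec()).unwrap();
        pos += 4 + name_len;
        let data_len = read_len(pos);
        files.push((name, blob[pos + 4..pos + 4 + data_len].to_vec()));
        pos += 4 + data_len;
    }
    files
}

fn main() {
    let files: [(&str, &[u8]); 2] = [("index.html", b"<html></html>"), ("logo.png", b"...png bytes...")];
    let packed = pack(&files);
    for (name, data) in unpack(&packed) {
        println!("{name}: {} bytes", data.len());
    }
}
```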

2 Likes

This was because data was being uploaded in the clear. Now it’s all encrypted, essentially (unless < 3 bytes).

As a default this seems sensible to me.

If you were to manually encrypt the data, e.g., you could have that happen. That could probably be another client-side option (this is all client side), to allow for such optimisations down the line :+1:

12 Likes

Thanks, but I don’t understand!

Are you saying it would be hard for SE to encrypt/decrypt the data automatically when storing a small file in the datamap? I’m not clear how SE works exactly, so I’m not sure whether it just isn’t feasible.

If it is feasible, it seems like a very desirable feature to include as a seamless feature of SE. But maybe you’re saying that’s not possible?

2 Likes

Well - I guess the 3 encrypted parts can always be glued together and put into 1 chunk - so it is possible for sure

I think he is saying it’s not on the roadmap as of now but can easily be done later on, because it’s client side and doesn’t affect network operation at all

… But then again he can just speak for himself =D…

1 Like

You can do the encryption yourself into just one chunk if that is what you want.

SE needs 3 chunks because it uses the previous and next chunks to encrypt each chunk. Thus 3 is the minimum.

Requiring 3 makes it close to impossible to reconstruct a file from just one chunk, and gives quantum-resistant encryption.
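
To picture that, here is a minimal sketch of the neighbour-keyed idea, assuming the `sha3` crate for hashing and a toy XOR keystream standing in for the real cipher; this is not the actual `self_encryption` algorithm, just the shape of it:

```rust
use sha3::{Digest, Sha3_256};

fn hash(data: &[u8]) -> [u8; 32] {
    Sha3_256::digest(data).into()
}

// Derive a per-chunk key from the plaintext hashes of the previous and
// next chunks (wrapping around). With fewer than 3 chunks a chunk would
// neighbour itself, which is one way to see why 3 is the minimum.
fn chunk_key(hashes: &[[u8; 32]], i: usize) -> [u8; 32] {
    let n = hashes.len();
    let mut h = Sha3_256::new();
    h.update(hashes[(i + n - 1) % n]); // previous chunk's hash
    h.update(hashes[(i + 1) % n]); // next chunk's hash
    h.finalize().into()
}

// Stand-in cipher: XOR against a hash-derived keystream. The real crate
// uses a proper symmetric cipher; this only shows where the key comes from.
fn xor_encrypt(key: &[u8; 32], data: &[u8]) -> Vec<u8> {
    data.iter()
        .enumerate()
        .map(|(i, &b)| {
            let block = hash(&[&key[..], &(i / 32).to_le_bytes()[..]].concat());
            b ^ block[i % 32]
        })
        .collect()
}

fn main() {
    let chunks: [&[u8]; 3] = [b"first chunk", b"second chunk", b"third chunk"];
    let hashes: Vec<[u8; 32]> = chunks.iter().map(|c| hash(c)).collect();
    for (i, chunk) in chunks.iter().enumerate() {
        let key = chunk_key(&hashes, i);
        let encrypted = xor_encrypt(&key, chunk);
        println!("chunk {i}: {} encrypted bytes", encrypted.len());
    }
}
```

With a toy XOR cipher like this, decryption is the same operation given the same neighbour hashes; the point is that one chunk in isolation carries none of its own key material.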

3 Likes

Do we have any data on uploading large zip files that then get broken up into chunks, with real test results to ensure the downloaded chunked zip file actually stitches back together and works? Double encryption gives me the willies. :wink:

Same question for validating large unencrypted compressed files?

Any takers?

People have uploaded Linux ISO files.

The downloading client validates each chunk: the chunk’s hash validates its content. That’s better checking than disk-drive sector checks.
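
A sketch of that validation step, assuming chunk addresses are the SHA3-256 hash of the stored (encrypted) content, which is a simplification:

```rust
use sha3::{Digest, Sha3_256};

// Verify a fetched chunk against its content address. `expected_addr`
// would come from the datamap; any corruption or substitution changes
// the hash and the chunk is rejected.
fn validate_chunk(expected_addr: &[u8; 32], encrypted_chunk: &[u8]) -> bool {
    let actual: [u8; 32] = Sha3_256::digest(encrypted_chunk).into();
    actual == *expected_addr
}
```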

4 Likes

Hmm, it’s where/what you decrypt with that’s the thing.

I think essentially it’s out of scope for self encryption (it’s no longer self encrypting here).

So you want some other means of encrypting and managing keys there. The vault / store of your own data could well do this, and this is what’s imagined to happen down the line.

And just to underline, there’s nothing inherent in the network blocking us here, it’s client side data organisation alone that needs to be standardised really.

yup!

6 Likes

Even though he’s said that’s true, I’m not sure.

My point is that it would be good to have this in the first API, to keep the file/datamap format standard. The client API would then just generate the datamap as needed according to file size.

Apps then don’t need to handle special cases, do their own encryption, worry about keys, know about datamap format etc.

@joshuef suggests it’s about security, so there could be an issue there idk.

Security could, though, be handled by a flag: either always self-encrypt, or for small files, store the data inside the data map.
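
As a hypothetical sketch of what such a flag could look like (none of these names are from the real self_encryption or autonomi APIs):

```rust
// Illustrative only: a store call that inlines small files unless the
// caller insists on full self-encryption.

enum DataMap {
    Inline(Vec<u8>),       // small file stored inside the datamap itself
    Chunks(Vec<[u8; 32]>), // addresses of self-encrypted chunks
}

enum EncryptMode {
    AlwaysSelfEncrypt,                     // for callers who need chunking regardless
    InlineSmallFiles { threshold: usize }, // inline anything at or below the threshold
}

fn store(data: Vec<u8>, mode: EncryptMode) -> DataMap {
    match mode {
        EncryptMode::InlineSmallFiles { threshold } if data.len() <= threshold => {
            DataMap::Inline(data)
        }
        _ => DataMap::Chunks(self_encrypt(&data)),
    }
}

fn self_encrypt(_data: &[u8]) -> Vec<[u8; 32]> {
    unimplemented!("run real self-encryption; return the chunk addresses")
}

fn main() {
    let map = store(b"<html></html>".to_vec(), EncryptMode::InlineSmallFiles { threshold: 1024 });
    match map {
        DataMap::Inline(bytes) => println!("stored inline: {} bytes", bytes.len()),
        DataMap::Chunks(addrs) => println!("stored as {} chunks", addrs.len()),
    }
}
```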

That’s cleaner, gives massive early benefits, and also makes for simpler, more efficient apps.

Doing this in the API later is tricky because you end up with two file formats and no clean way to know which you have.

What happens in practice will then depend on app devs, but most likely they will use SE as-is, so we lose the benefits for some time and make it hard to change later. And any apps built up to that point will probably continue to do things the old way, because reasons.

My take here is:

  • the API could be implemented in a way that dramatically reduces storage costs and network load
  • there may be use cases that require the option (default?) to always self encrypt
  • this won’t be implemented now because it is not seen as high priority :man_shrugging:
  • implementing it later may fall by the wayside because by then it can’t be done cleanly

I always knew that a client could do this itself, but that wasn’t my question, because ad hoc doesn’t deliver the benefits - to do that it needs to be in the API, and probably from the start.

3 Likes

The API would be much improved by:

  • providing vastly cheaper and more efficient storage options in the API (as in the previous reply) using a common datamap format. (I agree this is out of scope for SE, but that’s not the issue. The idea is to have this in the storage API.)
  • versioning in the datamap would allow this to be added later (see the sketch after this list). @joshuef does the datamap format cater for this? If not, I think that at least should be a priority.
  • support for storing data via the datamap-style API from memory. Currently the API for this requires a file, which is inefficient and will lead app devs into bad practice (for example, writing data to the local device and leaving it there). There’s an old issue open for this but no replies yet, AFAIK.
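
On the versioning point, a hypothetical sketch of a version-tagged datamap (not the real format; the `src_hash`/`dst_hash` roles are assumptions):

```rust
// Illustrative only: an explicit version tag would let new layouts, such
// as inline small files, be added later without ambiguity about which
// format a given datamap uses.

enum VersionedDataMap {
    V1 { chunks: Vec<ChunkRef> },  // today's chunk-reference layout
    V2Inline { content: Vec<u8> }, // a later addition: small file inline
}

struct ChunkRef {
    src_hash: [u8; 32], // hash of the plaintext chunk (key material)
    dst_hash: [u8; 32], // hash of the encrypted chunk (network address)
}
```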

My second point was wrong, as this would break the XOR address of the datamap, which is a proxy for the file address.

This also means that if this doesn’t go into the API from the start it can’t be added later, or else the same file will have different addresses between one version of the API and a later one.

That pushes this enhancement into the app layer, which would be very bad: it kills universality, meaning that to read some files you will need a particular app rather than just any app using the API. It also means the same file will have different addresses according to the app which created it, and whether that app used the API as-is or a more efficient version built on top.

@joshuef For these reasons I don’t see that this can be done later as you suggest. It either needs to be in the API from the start, or it creates a host of problems that put the very large benefits out of reach forever, or makes them app-specific, which reduces them to marginal and only specific use cases.

6 Likes

Agreed. This was implemented recently with the new autonomi API!

6 Likes

This IS the reason.

5 Likes

but the 3 chunks could just be glued together and put into the data map instead of writing 3 addresses there - right? (for chunks < max_chunk_size)

or what’s the advantage of writing 3 additional chunks + executing 3 additional fetches …? :innocent:

Since the encryption/decryption key for each chunk is derived from the hashes of its two neighbouring chunks, putting all the chunks together gives away the key.
Separating those 3 chunks and sending them to the far edges of the network, with no apparent link between them, is what keeps the key secret.

4 Likes

hmmm - the data map is encrypted? then why not encrypt the 3 chunks as the data map would be?

I guess there’s more pressing issues atm - forget me and my annoying questions :smiley:

No worries! I’m on my free time here, scrolling the forum at 1am for fun and no profit :sweat_smile:
The datamaps are not encrypted. Which is why we have the vault to keep them safe. Another rabbit hole!!

14 Likes

And with the content of the data map, people can query the chunks, so it’s basically the same as putting the chunks in it in the first place, isn’t it? (Or am I missing something?)

Yes exactly. Putting the datamap on the network makes the data public.

For small data that doesn’t need to be private, the use of a datamap is not necessary; the data could just be put as-is in a chunk.

For bigger data, we have the one size fits all system with datamap and encrypted chunks.
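
A sketch of why a public datamap makes the data public: anyone holding it can fetch and decrypt every chunk. Here `get_chunk` and `decrypt_chunk` are hypothetical stand-ins for the real network and self_encryption calls, and `ChunkRef` is an assumed datamap entry shape:

```rust
struct ChunkRef {
    src_hash: [u8; 32], // plaintext hash: the key material
    dst_hash: [u8; 32], // encrypted-chunk hash: the network address
}

fn get_chunk(_addr: &[u8; 32]) -> Vec<u8> {
    unimplemented!("fetch the chunk stored at this address")
}

fn decrypt_chunk(_encrypted: &[u8], _datamap: &[ChunkRef], _index: usize) -> Vec<u8> {
    unimplemented!("derive the key from neighbouring src hashes and decrypt")
}

// Reassemble a file from its datamap alone: no secret is needed beyond
// the datamap itself, which is why publishing it publishes the data.
fn fetch_file(datamap: &[ChunkRef]) -> Vec<u8> {
    let mut plaintext = Vec::new();
    for (i, entry) in datamap.iter().enumerate() {
        let encrypted = get_chunk(&entry.dst_hash);
        plaintext.extend(decrypt_chunk(&encrypted, datamap, i));
    }
    plaintext
}
```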

7 Likes