API enhancement to deliver 70% cost reduction and 70% network load reduction

I believe that there is a one-time opportunity for a relatively small API change that will deliver:

  • storage cost reductions of around 70%
  • network load reductions by a similar amount
    [EDIT: thanks to @riddim for pointing out I forgot to count the datamap in my calculation, so savings are about 4x rather than 3x compared to the current design]

I think this should be seriously considered as a high priority for the first Autonomi API because:

  • storage is a core feature used by almost all Autonomi apps
  • the benefits would be enormous
  • the implementation is relatively simple
  • much or all of the benefits may be lost if this is done later

The benefits depend on very early implementation because this would, I think, be a breaking change to content addressing and the storage APIs. Apps using the earlier API might have to re-upload stored data in order to take advantage of the new API, or maintain a backward-compatible ‘back end’ interface to access older data. Newer apps might also be unable to access data uploaded by apps built with earlier APIs, leading to usability and interoperability issues between similar apps, along with confusion for developers and app users alike.

There’s a forum topic here, in which I erroneously suggested this could be an enhancement to Self-encryption. In fact it would be part of the higher level file/data API which would in turn either use Self-encryption (for larger files) or store encrypted data within the datamap for small files (< 1 chunk).

Currently the APIs default to Self-encryption, which generates a minimum of three chunks plus a datamap per file, regardless of whether the file is small enough to fit in the datamap. In earlier implementations small files were in fact stored in the datamap, but this feature was removed at some point. For small files, storing them in the datamap represents a reduction of around 75% in the number of records per file (one record instead of a datamap plus three chunks) and a similar reduction in the storage cost of those files. The number of chunks that nodes have to handle would also drop dramatically, making the network considerably more efficient.
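To make the proposal concrete, here is a minimal sketch of the unified behaviour being asked for. All names here are illustrative, not the real Autonomi API, and the crypto functions are placeholder stubs standing in for real encryption and for the `self_encryption` crate:

```rust
const MAX_CHUNK_SIZE: usize = 1024 * 1024; // assumed 1 MiB chunk limit

enum DataMap {
    Inline(Vec<u8>),       // small file: ciphertext held directly in the datamap record
    Chunks(Vec<[u8; 32]>), // large file: addresses of the self-encrypted chunks
}

// Placeholder for single-record encryption. NOT real crypto.
fn encrypt(content: &[u8]) -> Vec<u8> {
    content.iter().map(|b| b ^ 0xAA).collect()
}

// Placeholder for self-encryption: one address per chunk-sized piece.
fn self_encrypt(content: &[u8]) -> Vec<[u8; 32]> {
    content.chunks(MAX_CHUNK_SIZE).map(|_| [0u8; 32]).collect()
}

// The unified store: small files become one record (the datamap itself),
// larger files fall back to self-encryption as today.
fn store(content: &[u8]) -> DataMap {
    if content.len() <= MAX_CHUNK_SIZE {
        DataMap::Inline(encrypt(content))
    } else {
        DataMap::Chunks(self_encrypt(content))
    }
}

// Count the records a file puts on the network.
fn records_stored(map: &DataMap) -> usize {
    match map {
        DataMap::Inline(_) => 1,           // just the datamap
        DataMap::Chunks(c) => 1 + c.len(), // datamap plus chunks
    }
}

fn main() {
    let small = vec![0u8; 100 * 1024]; // 100 KiB file
    assert_eq!(records_stored(&store(&small)), 1);
    let large = vec![0u8; 5 * 1024 * 1024]; // 5 MiB file
    assert_eq!(records_stored(&store(&large)), 6);
    println!("ok");
}
```

The point of the sketch is only the branch in `store`: a small file costs one record instead of four, while larger files are unchanged.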

I’ve been unable to locate documentation of the reasons for reverting the earlier implementation to use only Self-encryption (though it may well exist), but from the discussion it appears to be feasible and the points above seem to be valid.

I note that applications can choose to upload just a single chunk for a file small enough to fit in one, but this will not achieve the benefits mentioned because it would be app dependent rather than universal, and few apps are likely to implement it. Even those that do will do so in different ways (e.g. encrypting or not, or encrypting differently).

So this really is a now-or-probably-never choice: a simple API change now, with enormous benefits to the network and its cost effectiveness for storing data. That would benefit almost all users, especially those storing personal data or using everyday apps, which is after all the priority required by the fundamentals of the network (Secure Access For Everyone).

The above is a full copy of feature(api): single API to handle small and larger (> 1 chunk) files in a unified manner. · Issue #2271 · maidsafe/safe_network · GitHub

13 Likes

Is there a risk that an app for storing data would be developed that simply chops large files into parts just under the maximum small-file size to get this cost saving?

But I think this idea needs discussion!

5 Likes

Iirc, it was to allow even small files to be self encrypted, as a security feature.

5 Likes

No, the saving I’m talking about is really removing an additional - unnecessary - cost for small files.

So larger files don’t get cheaper by breaking them up into lots of small files.

You could do that anyway, by ignoring self-encryption and storing the chunks yourself, but you won’t get any saving that way. You just lose compatibility with apps using the API.

This has been suggested in the discussion (if you look), but comments also indicate that it’s not necessary, so it remains unclear.

I’ve opened an issue and summarised it here so we can establish the facts and if possible, obtain what could be an enormous improvement.

3 Likes

I haven’t even looked for it in documentation. There was a reasonable amount of discussion on the forum, and the change to the minimum size for self-encryption was decided during those discussions. One of the major reasons for forcing SE at any size was also to reduce the opportunity for a malicious user to upload unencrypted bad data. This helped, and then at-rest encryption was to be introduced (since removed due to OOM issues). Even single-chunk encryption is not enough, since its encryption can be broken without too much trouble unless hidden keys are somehow implemented (hidden as in never on the network).

Yes, old apps that only have SE as their encryption method could not read the one-record files, as the code would not know about them. The API should, as you say, be written from the start to allow reading of both single-record small files and SE small files.

It would not be good to make all small files into a one-chunk record, since for general files people will want the more secure SE over single-record encryption. Apps for file upload could offer the option, which would be good, as long as the user understands the difference. Along these lines, maybe public data could default to one record, since it’s readable by anyone anyhow. But the choice needs to be there, since some do not want chunks at rest on a disk to be readable.
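The defaulting rule suggested here could be sketched roughly as below. These enum and function names are hypothetical, not the real Autonomi API:

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
enum SmallFilePolicy {
    SelfEncrypt,  // current behaviour: datamap + 3 chunks, even for tiny files
    InlineRecord, // single record: cheaper, but weaker at-rest protection
}

#[derive(Clone, Copy, PartialEq, Debug)]
enum Visibility {
    Public,
    Private,
}

// Default suggested in the post: public data may as well be one record
// (it is readable by anyone anyway); private data keeps full SE unless
// the user explicitly opts in to the cheaper mode.
fn default_policy(visibility: Visibility, user_opted_in: bool) -> SmallFilePolicy {
    match visibility {
        Visibility::Public => SmallFilePolicy::InlineRecord,
        Visibility::Private if user_opted_in => SmallFilePolicy::InlineRecord,
        Visibility::Private => SmallFilePolicy::SelfEncrypt,
    }
}

fn main() {
    assert_eq!(default_policy(Visibility::Private, false), SmallFilePolicy::SelfEncrypt);
    assert_eq!(default_policy(Visibility::Public, false), SmallFilePolicy::InlineRecord);
    println!("defaults ok");
}
```

The key design point is that the safer behaviour stays the default for private data, so the user has to understand and choose the trade-off.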

In general more choice in this area is good and I hope they implement it in the API for storing a file or other lot of data.

I agree with you that it is a worthwhile addition, and it only requires simple encryption, but I’d strongly suggest it is not used for private data unless the user specifically asks for it.

3 Likes

For public data, the security is in the data at rest sitting on nodes. For private data, the security is there both for chunks at rest and for chunks being read.

SE pretty much prevents any node from knowing the contents of a record on its disk. Single-record files (encrypted or not) are an open book to both the node operator and anyone reading the chunk.

One of the concerns was around text documents containing sensitive data being easily read, since such a file is one chunk and unencrypted. Reducing the minimum size of file that can be SE’d allowed these files to be encrypted and also chopped up, meaning that decryption was pretty much impossible.

That was essential if we want people to be able to use Autonomi for all their files and increase adoption.

3 Likes

hmhmmm - and if we’re honest with ourselves, the issue is not the additional 3x cost for storing my private blog on Autonomi … it’s the 1000x cost that comes with the blockchain fees

…so I don’t think this being done or not changes anything for the possible use cases … it’s more an optimisation that would be cool indeed …

4 Likes

Sure Rob, I understand there’s an impact on the security of encryption, but if encrypting a single chunk of 1MB or 4MB isn’t regarded as secure then I think encryption in general is broken, and I don’t buy that.

So without a proper justification I don’t buy the idea that we make the network about 3x more expensive and 3x less efficient because encryption of a 1MB piece of data can be broken somehow, one day etc.

That needs justification IMO and that’s why I asked for documentation on the reason for the current design. If that’s not there this remains an opportunity, but one whose window is closing rapidly.

5 Likes

actually I think it’s 4x more expensive / less efficient xD … (at least for public data)

  • 1x Datamap + 3x encrypted chunks vs.
  • 1 blob (possibly encrypted)
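To pin down the 3x-vs-4x question, counting records stored on the network for one small file, the saving works out as:

```rust
fn main() {
    // Current design for a small file: 1 datamap + 3 self-encrypted chunks.
    let current_records = 1 + 3;
    // Proposed design: one blob (possibly encrypted) as a single record.
    let proposed_records = 1;
    let saving_pct = 100 * (current_records - proposed_records) / current_records;
    assert_eq!(current_records / proposed_records, 4); // the "4x" figure
    assert_eq!(saving_pct, 75);
    println!("{current_records} records -> {proposed_records}: {saving_pct}% fewer");
}
```

So counting the datamap gives 4x (a 75% reduction); ignoring it gives the earlier 3x figure.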
3 Likes