DataStore over AppendableData design

Additional thoughts

Clarification

Here I have been investigating how the data storage implementation I am currently working on would need to change in order to work with AppendableData (AD) instead of MutableData (MD), given a specific solution for append-only data storage: LSM (log-structured merge) trees.

I.e.:

  • How could an LSM-tree-backed storage be designed?
  • How would it work if we had AD instead of MD?
  • What would I need to assume about AD functionality for it to hold?
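As a rough illustration of the first two questions, here is a minimal LSM-style sketch in Python. It assumes AD can be modeled as a log that only supports appending entries and reading them back; `AppendOnlyLog` and `LsmStore` are hypothetical names, not a SAFENetwork API. Writes go to an in-memory memtable. When the memtable is full, it is flushed as a sorted, immutable segment appended to the log. Reads check the memtable first and then the segments from newest to oldest. Deletes are written as tombstones (`None`), because nothing can be removed.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical segment format: sorted (key, value) pairs; None is a tombstone.
Segment = Tuple[Tuple[str, Optional[str]], ...]


class AppendOnlyLog:
    """Hypothetical model of AppendableData: entries can be appended and read, never changed."""

    def __init__(self) -> None:
        self._entries: List[Segment] = []

    def append(self, segment: Segment) -> None:
        self._entries.append(segment)

    def __len__(self) -> int:
        return len(self._entries)

    def __getitem__(self, index: int) -> Segment:
        return self._entries[index]


class LsmStore:
    """Minimal LSM-style key-value store on top of an append-only log."""

    def __init__(self, log: AppendOnlyLog, memtable_limit: int = 4) -> None:
        self.log = log
        self.memtable: Dict[str, Optional[str]] = {}
        self.memtable_limit = memtable_limit

    def put(self, key: str, value: Optional[str]) -> None:
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def delete(self, key: str) -> None:
        # Deletion is itself an append: write a tombstone.
        self.put(key, None)

    def flush(self) -> None:
        # Persist the memtable as one sorted, immutable segment.
        if self.memtable:
            self.log.append(tuple(sorted(self.memtable.items())))
            self.memtable = {}

    def get(self, key: str) -> Optional[str]:
        if key in self.memtable:
            return self.memtable[key]
        # Newest segment first, so the latest write (or tombstone) wins.
        # A real SSTable would binary-search the sorted segment; a linear scan keeps this short.
        for i in range(len(self.log) - 1, -1, -1):
            for k, v in self.log[i]:
                if k == key:
                    return v
        return None


log = AppendOnlyLog()
db = LsmStore(log, memtable_limit=2)
db.put("a", "1")
db.put("b", "2")   # memtable full: flushed as segment 0
db.put("a", "3")
db.delete("b")     # flushed as segment 1; segment 0 is untouched
print(db.get("a"), db.get("b"), len(log))  # 3 None 2
```

Overwriting `"a"` and deleting `"b"` never touch the older segment. The stale entries simply stay in the log.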

Mostly I thought about this because I wanted to know how to continue developing it even after such a change, and, naturally, because solving this problem is just as fun as trying to solve it with MD was the first time.

What this is NOT (in case anyone thought so) is a proposal for AD, or really an opinion about AD at all. It is NOT a proposal to MaidSafe about something they (or anyone else) should implement. It's not a proposal at all. This is just me sharing my work on the data store, the possible solutions I see for it, and ways to adapt my code and design to potential changes in SAFENetwork.

Implications of the design

Of course, doing that work also leads my thoughts further, to what the change means in a scope larger than my own implementation.

The data store I am designing does not assume any limitations on what SAFENetwork is suitable for. That is, it assumes SAFENetwork can be used for anything people use, for example, MongoDB or Cassandra for. (LSM trees are used in data stores such as Bigtable, HBase, LevelDB, MongoDB, SQLite4, Tarantool, RocksDB, WiredTiger, Apache Cassandra, InfluxDB and VictoriaMetrics.)

One thing that came up when I was thinking about problems with this data store design in the larger scope is exactly this fundamental difference: whether garbage collection exists or not.
LSM trees are implemented as append-only (existing data is never overwritten in place), but, as far as I know, they are not implemented on media that cannot be garbage collected: their compaction merges old segments and then deletes them to reclaim space.
That is quite a fundamental difference.
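To make the difference concrete: LSM compaction merges segments and keeps only the newest value per key, and on a mutable medium the merged-away segments are then deleted. Below is a minimal sketch comparing full compaction with and without garbage collection. It assumes the same hypothetical segment format as a sorted tuple of `(key, value)` pairs with `None` as a tombstone, and the function names are illustrative only.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical segment format: sorted (key, value) pairs; None is a tombstone.
Segment = Tuple[Tuple[str, Optional[str]], ...]


def compact(segments: List[Segment]) -> Segment:
    """Full compaction: keep only the newest value per key.

    Segments are given oldest first, so later ones overwrite earlier ones.
    Tombstones can be dropped because all older segments are merged here too.
    """
    merged: Dict[str, Optional[str]] = {}
    for seg in segments:
        for key, value in seg:
            merged[key] = value
    return tuple(sorted((k, v) for k, v in merged.items() if v is not None))


def stored_pairs(log: List[Segment]) -> int:
    """Number of (key, value) pairs physically stored."""
    return sum(len(seg) for seg in log)


def compact_with_gc(log: List[Segment]) -> List[Segment]:
    # Mutable medium: the old segments are deleted after the merge.
    return [compact(log)]


def compact_append_only(log: List[Segment]) -> Tuple[List[Segment], int]:
    # Append-only medium (AD assumption): the merged segment is appended and the
    # old segments stay; the returned index marks where live data now starts.
    return log + [compact(log)], len(log)


log: List[Segment] = [(("a", "1"), ("b", "2")), (("a", "3"), ("b", None))]
print(stored_pairs(compact_with_gc(log)))        # 1
appended, live_start = compact_append_only(log)
print(stored_pairs(appended), live_start)        # 5 2
```

Even in this tiny case, compaction on the append-only medium increases the stored data from 4 to 5 pairs, whereas with garbage collection it shrinks from 4 to 1.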

I am assuming here that growth in data storage capacity and the need for garbage collection operate at different levels. What keeps individual hard drives running log-structured filesystems from filling up on a day-to-day basis is not growth in storage capacity (which obviously also happens in parallel out in the world, but is not what in practice prevents them from filling up); it is garbage collection. On a macro level, could the constant upgrading (replacing) of storage media by vaults in the network have the same effect? My initial feeling, without having checked the numbers, is that the orders of magnitude differ, so it wouldn't work. (Anyone up for some back-of-the-envelope estimates is welcome to come forth!)
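One possible framing for such an estimate rests on simplifying assumptions that are not part of the original analysis. Assume the live data set stays roughly constant at $L$ bytes, and applications issue logical updates at a steady rate $u$ (bytes per day). Let $W$ be the write amplification of the LSM design, meaning the physical bytes appended per logical byte across flushes and compactions. Without garbage collection, everything ever appended stays stored after $T$ days:

$$
S(T) \approx W\,u\,T, \qquad \mathrm{waste}(T) \approx W\,u\,T - L, \qquad \frac{\mathrm{waste}(T)}{S(T)} \approx 1 - \frac{L}{W\,u\,T} \xrightarrow{\;T \to \infty\;} 1
$$

Let $c$ be the capacity added to the network per day through vault upgrades. Under this model, upgrades keep pace only if $c \ge W\,u$ for this kind of workload alone, on top of the growth of genuinely new data. Plugging real estimates into $u$, $W$ and $c$ would be the back-of-the-envelope exercise itself.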

The design described above, LSM trees over AppendableData, would produce a very large amount of waste data.
Of course, it is only one technical approach to emulating mutability in a 100% immutable setting.
It assumes that our applications do need to emulate mutability, and that this is a fundamental necessity for producing software we can use for everything we need.
It seems to me that this touches on a fundamental issue: the waste is not a property of this specific implementation, but a property of any system trying to emulate mutability in a 100% immutable setting.
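The argument does not depend on LSM trees. The following minimal sketch emulates a single mutable value using nothing but an append-only sequence. The `AppendOnlyRegister` class is hypothetical, not an AD API. Every write appends a new version, and a read returns the latest one. Because nothing can ever be removed, stored data grows with the number of writes while the live data stays at one value.

```python
from typing import Generic, List, Optional, TypeVar

T = TypeVar("T")


class AppendOnlyRegister(Generic[T]):
    """Hypothetical: one mutable value emulated on an append-only sequence."""

    def __init__(self) -> None:
        self._versions: List[T] = []

    def write(self, value: T) -> None:
        # An "update" can only append a new version.
        self._versions.append(value)

    def read(self) -> Optional[T]:
        # The current value is simply the latest version.
        return self._versions[-1] if self._versions else None

    def stored(self) -> int:
        return len(self._versions)

    def waste(self) -> int:
        # Every version except the current one is no longer part of the
        # current state, but it cannot be removed.
        return max(self.stored() - 1, 0)


reg: AppendOnlyRegister[int] = AppendOnlyRegister()
for i in range(1000):
    reg.write(i)
print(reg.read(), reg.stored(), reg.waste())  # 999 1000 999
```

Smarter encodings, such as storing diffs instead of full values, can shrink each version. In this model, though, as long as every write appends something, the stored total still grows with the number of writes.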
I would be interested in hearing other approaches, ideas and suggestions, whether they show my suspicions to be wrong or confirm them; whatever brings us a step closer to knowing how best to deal with this.

Dynamic, static or something else?

There's also the possibility of viewing SAFENetwork as not meant for that kind of data handling, i.e. large amounts of dynamic data, but rather as more of a static data store.
Technical boundaries would lead to some limitations in use cases, and there’s nothing wrong with that.
A space rocket is very good for taking things out to space, but not very good for plowing or harvesting fields in agriculture. (Well, maybe you could do some controlled shock-wave harvesting with a SpaceX rocket and a joystick, piling up crops in nice heaps at the edges of the fields. But I guess there's a big risk of just a very expensive sea of fire and chaos :smile:)
The point being: maybe this advanced rocket shouldn't be used for everything, like a versatile multi-tool, but to solve specific problems. Such insights would better guide our efforts in what we try to do. I still hope that the full spectrum of possibilities will be there, but I don't demand that flowers be trees, or that the sun be the sea. Everything is what it is :slight_smile: and the best way to use something is to know it for what it really is. I guess that way of seeing things is a big part of what makes me like exploration and problem solving.
