WebApp Databases

It seems to me that an immutable RDF-style database like Datomic would work fine on SAFE… All data is append-only, with an entity, attribute, value, transaction, and truth field… Rather than replacing a value, you retract the old value via the truth flag and create a new entry for the new value. This allows time travel, caching, what-if scenarios, and all kinds of very powerful features… with massive read scalability traded off against rather limited write scalability…
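To make that concrete, here is a rough sketch of the append-only model in TypeScript. It mirrors the shape of Datomic’s datoms, but the names (Datom, update, valueAsOf) are my own illustration, not Datomic’s actual API:

    // One fact in the append-only log, shaped like a Datomic datom.
    type Datom = {
        entity: string;
        attribute: string;
        value: unknown;
        tx: number;      // monotonically increasing transaction id
        added: boolean;  // the "truth" flag: true = assert, false = retract
    };

    const log: Datom[] = [];

    // "Updating" never overwrites: retract the old value, assert the new one.
    function update(entity: string, attribute: string, oldValue: unknown, newValue: unknown, tx: number): void {
        log.push({ entity, attribute, value: oldValue, tx, added: false });
        log.push({ entity, attribute, value: newValue, tx, added: true });
    }

    // Time travel: replay the log up to `asOf` to see the value at that point.
    // (Simplified: a retraction here clears the attribute outright.)
    function valueAsOf(entity: string, attribute: string, asOf: number): unknown {
        let current: unknown;
        for (const d of log) {
            if (d.tx > asOf || d.entity !== entity || d.attribute !== attribute) continue;
            current = d.added ? d.value : undefined;
        }
        return current;
    }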

With Datomic, the writes are separated from the reads, and all the querying can be delegated to the client. It seems to me that the transactor would have to be centralized in some fashion, though perhaps you could invent a blockchain-based transactor or similar contraption.

1 Like

That’s CQRS.
I’ve always considered MaidSafe the best and largest event store in the world.
I would implement an event store right on top of the network. Every stream has its own file with a version, plus a map of version number to event address.

It could give the added benefit of deduplication if the timestamp was not stored within the event.

The read model can (just like the write model) be generated client side. It would not be very performant, however, for complex data, large streams, and clients that constantly leave and return.
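As a rough illustration of that stream layout in TypeScript; putImmutable and updateStreamMD are made-up stand-ins for whatever the real network calls would be:

    // A stream is a version counter plus a map of version number to event address.
    type EventAddress = string; // address of the immutably stored event

    type Stream = {
        version: number;
        events: Record<number, EventAddress>;
    };

    // Hypothetical network calls, not the actual SAFE API.
    declare function putImmutable(data: Uint8Array): Promise<EventAddress>;
    declare function updateStreamMD(streamId: string, stream: Stream): Promise<void>;

    // Store the event immutably (no timestamp inside, so identical events
    // deduplicate), then bump the stream's version map.
    async function appendEvent(streamId: string, stream: Stream, event: object): Promise<Stream> {
        const address = await putImmutable(new TextEncoder().encode(JSON.stringify(event)));
        const next: Stream = {
            version: stream.version + 1,
            events: { ...stream.events, [stream.version + 1]: address },
        };
        await updateStreamMD(streamId, next);
        return next;
    }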

4 Likes

I love this thread.

I’ve been wondering for a while what it would look like to manage a complex data model on the network.

My first real client project as an intern was to build a health care management application for a youth counseling services provider. At the time I was instructed by my manager to build the database with MongoDB, which quickly became challenging when I needed to update documents with complex relationships, especially since accuracy had to be ensured for medical records and State billing. I didn’t know any better at the time, but I quickly became familiar with modeling complex relationships in key-value documents.

I’ve been thinking about what it would look like to migrate that MongoDB database to the SAFE network.
I’ve also wondered what it would be like to migrate a MySQL database to the SAFE network.

What I’m currently imagining is that everything is simply composed of MutableData structures and MD name references.

In the healthcare application, for example, I have the following models:

Client, Assessment, BillingInvoice, Casenote, Group, Site, Counselor, Treatment, Session

Taking Client as an example, the entries could look like this:

    created_on: {type: Date, default: Date.now},
    active: {type: Boolean, default: true},
    intake_date: {type: Date},
    asi_date: String,
    continue_treatment_date: String,
    discharge_date: String,
    deadline: {
        date: Date,
        form: String
    },
    first_name: String,
    last_name: String,
    age: Number,
    ssn: Number,
    birth_date: {type: Date},
    address: {
        street: String,
        city: String,
        zip: Number
    },
    phone: Number,
    gender: String,
    ethnicity: String,
    marital_status: String,
    disability: String,
    primary_drug: String,
    enrolled_school: Boolean,
    highest_grade: Number,
    secondary_drug: String,
    treatment_service: String,
    modality: String,
    employment_status: String,
    site_groups: String,
    site_location: String, // MD name of a Site instance
    cin: String,
    aid: String,
    aid_update: {type: Date},
    dsm: {type: Schema.Types.Mixed},
    icd10: {type: Schema.Types.Mixed},
    assessments: [String], // array of MD names of Assessment instances
    case_notes: [String], // array of MD names of Casenote instances
    intake: String, // MD name of an Intake instance
    treatments: [String], // array of MD names of Treatment instances
    counselor: String, // MD name of a Counselor instance
    billable: {type: Boolean}

For retrieving an instance of Client, my first approach would be to fetch the MD, search its entries for references to other MDs, fetch those and search again, recursively…
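Something like this naive resolver; fetchMD is a made-up stand-in for the real SAFE client call, and the safe-md: prefix is just an assumed convention for recognizing references (the depth limit guards against cycles):

    // Hypothetical stand-in for fetching a MutableData by name.
    declare function fetchMD(name: string): Promise<Record<string, unknown>>;

    // Assumed convention for spotting entries that reference other MDs.
    const isMDName = (v: unknown): v is string =>
        typeof v === "string" && v.startsWith("safe-md:");

    // Fetch an MD and recursively inline everything it references.
    async function resolve(name: string, depth = 3): Promise<Record<string, unknown>> {
        const md = await fetchMD(name);
        if (depth === 0) return md;
        for (const [key, value] of Object.entries(md)) {
            if (isMDName(value)) {
                md[key] = await resolve(value, depth - 1);
            } else if (Array.isArray(value) && value.every(isMDName)) {
                md[key] = await Promise.all(value.map((v) => resolve(v, depth - 1)));
            }
        }
        return md;
    }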

I’ve been playing around with managing deeply nested data on the client with cleanest (on npm).

I might use a similar recursive solution to migrate the current MongoDB database over to the SAFE network. It may not be bad for migrating a SQL DBMS either, if we could get the data in JSON format.

Probably much better ideas and implementations out there but this is where I’ll start experimenting.

11 Likes

I think an interesting further step could be to add schemas. Schemas could be useful for apps to validate the data and to prevent apps from adding MDs with invalid data for the chosen schema. The network itself doesn’t know about these schemas though, so it would be up to the apps to follow them.

One way to make that easy would be to create a library for persisting data entities and have this library support schemas, then your app could use this library to ensure valid data.

To validate the data you could make a validator, basically a function, for each data type. Validators could be regular expressions or more complex functions that do things like check whether an entity that another entity refers to actually exists. In your Client example you have ssn as a property with Number as the data type. A social security number has a specific length and probably some way to calculate a checksum. If you wanted to ensure that the social security number was at least seemingly valid, you could make an ssn type. The ssn type would basically be some JSON like { name: ssn, validator: ssnvalidator }. You would then create an ssnvalidator function in your application and register it with the persistence library, so that each time data with the type ssn was added, the library would call the ssnvalidator function to check the data before persisting and return an error if it’s not valid.
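In TypeScript, the registration part could look something like this sketch (registerValidator and persist are made-up names, and I’m treating ssn as a formatted string here rather than a Number, since the format check is easier that way):

    type Validator = (value: unknown) => boolean;

    const validators = new Map<string, Validator>();

    function registerValidator(typeName: string, fn: Validator): void {
        validators.set(typeName, fn);
    }

    // A seemingly-valid US SSN: three digits, two digits, four digits.
    registerValidator("ssn", (v) =>
        typeof v === "string" && /^\d{3}-\d{2}-\d{4}$/.test(v)
    );

    // Check every property against its declared type before persisting.
    function persist(schema: Record<string, string>, entity: Record<string, unknown>): void {
        for (const [prop, typeName] of Object.entries(schema)) {
            const validate = validators.get(typeName);
            if (validate && !validate(entity[prop])) {
                throw new Error(`invalid ${prop}: fails the ${typeName} validator`);
            }
        }
        // ...here the validated entity would be written to the network as an MD
    }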

Another thing you can do with schemas is to specify which properties should be indexed. You’d for example add some metadata saying that the field first_name should have a full-text search index and an alphabetically ordered index, age should have a numerically ordered index, and address should have an alphabetically ordered index. Then you would have an indexer app that you would run to create indexes for querying the data. The indexer could be connected to the persistence library so that it runs each time you insert new valid data. For example, when you insert a new name, you’d fetch the index (an MD) that contains an alphabetical list of names and insert the new name in the right place. The index should also be split into multiple parts, so you’d have for example a shard ra-re covering everything starting with ra up to everything starting with re, etc.
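An insertion into such a sharded alphabetical index could look roughly like this; loadIndexShard and saveIndexShard are hypothetical MD reads and writes:

    // Rough sketch of inserting into a sharded, alphabetically ordered index.
    // Each shard covers a prefix range such as "ra" to "re".
    type Shard = { range: [string, string]; names: string[] };

    declare function loadIndexShard(prefix: string): Promise<Shard>;
    declare function saveIndexShard(shard: Shard): Promise<void>;

    async function indexName(firstName: string): Promise<void> {
        const shard = await loadIndexShard(firstName.slice(0, 2).toLowerCase());
        // Binary search for the insertion point keeps the shard sorted.
        let lo = 0, hi = shard.names.length;
        while (lo < hi) {
            const mid = (lo + hi) >> 1;
            if (shard.names[mid].localeCompare(firstName) < 0) lo = mid + 1;
            else hi = mid;
        }
        shard.names.splice(lo, 0, firstName);
        // A real version would split a shard as it nears the MD size limits.
        await saveIndexShard(shard);
    }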

The JSON you posted could more or less form the basis for a schema. For attaching a schema to an instance you could either have a property “schema” that contains the id of the schema, or you do it in a typeclass-ish way and create an MD that contains a list of MDs that implement some specific schema, i.e. something like { schema: someId, instances: [id1, id2, id3…] }. That is useful if you want to fully decouple the schema from an instance.

A way that I actually like better is to have the properties be the data type. In your example you would then have { … ssn: someNumber } and then ssn would refer to the actual data type. In this case it works better to use an id for the property name, but that also adds some extra complexity, since as a user you want to add or view the property called ssn, not the property with the data type with id b69ba323-f6ce-4dc8-a49f-e9386acad053.

Anyways, if we assume you use names instead of ids to uniquely specify a property, you would get JSON that looks pretty much like the JSON in your example, but the property names themselves would be the names of the data types, and thus the schema would be contained in each object itself.

The problem with using names instead of ids for property names, though, is that once you create a data type called phone, every property named phone refers to that specific data type. If you want to use the property name phone in another entity whose phone number isn’t valid for the phone data type you first created, you have to name that property something other than phone. If you instead use ids, then you could have several data types like { id: b69ba323-f6ce-4dc8-a49f-e9386acad053, name: phone, validator: myphonevalidator } and { id: bda56193-4a26-43af-a440-ad62e81fee3d, name: phone, validator: newphonevalidator }. You would then make an index of the data types, and when you want to add a phone property to some entity, you would look up all properties named phone from the index and see which one, if any, validates data the way you want; if none of them does, you’d create a new one.
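A small sketch of that id-keyed registry with a name index (all names hypothetical):

    // Data types are keyed by id; a name index lets several types share a name.
    type DataType = {
        id: string;
        name: string;
        validator: (v: unknown) => boolean;
    };

    const typesById = new Map<string, DataType>();
    const idsByName = new Map<string, string[]>();

    function registerType(t: DataType): void {
        typesById.set(t.id, t);
        idsByName.set(t.name, [...(idsByName.get(t.name) ?? []), t.id]);
    }

    // Look up all types named e.g. "phone" and pick the first whose validator
    // accepts a sample of your data; if none matches, you'd create a new type.
    function findMatchingType(name: string, sample: unknown): DataType | undefined {
        return (idsByName.get(name) ?? [])
            .map((id) => typesById.get(id)!)
            .find((t) => t.validator(sample));
    }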

2 Likes

It would be reinventing the distributed database, which the market is already abandoning. Correct me if I am wrong.

Cassandra, MongoDB, Redis, HBase, CouchDB are all distributed databases designed to work across multiple nodes and are growing in popularity for big data, IoT etc. Or did I misunderstand what you mean?

Yes I know, that was my point; I was replying to @intrz about creating a new one. Instead I believe SAFE would be streaming data (mutable/immutable) among nodes. I’ve always felt streaming events is what the SAFE Network is all about. Maybe I’m wrong, I am new here.

1 Like

The best resemblance is LinkedIn’s use of Apache Kafka: pipelines, events, and streams. I would dig more into the documentation to learn more about SAFE.

What about storing and querying static data then? LinkedIn also uses databases, not just event streams by themselves.

What I described are some ideas from a system I’ve worked on that I think might be useful for storing and querying data on SAFE (graphs, actually). Anyways, while my post is quite long, it’s still very short on details, so there’s a lot missing and many things I’m not sure how to do efficiently on SAFE. I’m thinking, though, that it could be an interesting exercise to make some open source libraries for this; then it would also be much clearer what works and what doesn’t.

5 Likes

Yes I concur, non-transactional data needs storage, but consider this from the SAFE website…

“The SAFE Network is fully decentralised, with files distributed and stored all over the world, on different devices. This allows the network to be robust to attacks, with no central point of weakness.”

Is it not itself a big data store? You may tap into it and maintain your own copy if you want it to be private. I am not an expert though.

Yes, it’s basically a key/value store, so you then need to decide which data models you want to use and find ways to do more complex queries than just getting a value by its key.

2 Likes

I’ve been thinking a bit about the design for an app, and for using MDs as a database it seems the tricky part would be dealing with the size limits (1 MB, 100 entries) of an MD. A library to abstract away the MD so that you could just have a key/value table of arbitrary size seems like it would be useful. It doesn’t exist yet, right?
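One way such a library could work is to hash each key to one of a fixed number of MD shards, something like this sketch (getMD and putEntry are made-up network calls, not the actual SAFE API):

    // A logical key/value table sharded over many MDs to dodge the
    // 1 MB / 100-entry limits.
    declare function getMD(name: string): Promise<Map<string, string>>;
    declare function putEntry(mdName: string, key: string, value: string): Promise<void>;

    const SHARDS = 256;

    // Simple string hash to pick a shard deterministically from the key.
    function shardFor(table: string, key: string): string {
        let h = 0;
        for (const c of key) h = (h * 31 + c.charCodeAt(0)) >>> 0;
        return `${table}-shard-${h % SHARDS}`;
    }

    async function set(table: string, key: string, value: string): Promise<void> {
        await putEntry(shardFor(table, key), key, value);
    }

    async function get(table: string, key: string): Promise<string | undefined> {
        const shard = await getMD(shardFor(table, key));
        return shard.get(key);
    }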

edit: rereading thread, looks like basically what @hunterlester was working on. looking forward to tracking the progress.

2 Likes