There is a very large amount of data on the web. Most of it, however, is locked in the databases of individual websites. On a website you only get to see a single view of that website’s data. Some websites, mainly larger ones, allow access to their backend database via APIs, but each API works differently, so querying data from multiple websites is difficult and time consuming, often requiring lots of custom code.
What if the web could be one big database instead, where data could be seamlessly combined and queried everywhere? This is the vision of the semantic web. The current approach to making the data more accessible is to add annotations to the HTML so that the data can be read from the front end. This is done to a limited extent today, for example by adding schema.org annotations that are used by search engines to understand the content better.
What if, instead of annotating the HTML in the frontend, the backend database could be directly accessible? That’s what’s possible with SAFE. When making a website on SAFE, the data can be made public and available for any other website or app to use. The semantic web, the web as one big shared database with its countless new possibilities, is something SAFE is pretty much made for.
I think it would be good if app creators try to keep this in mind from the beginning, and try to think of how to make the data from their apps as standardized and easily shareable as possible. We can then not just get a new decentralized web, but a semantic one, to start making the sum of human knowledge available for man and machine alike.
Great point. I suppose that is sort of the aim of some of the NoSQL-type databases. Search engines such as Google use these types of storage mechanisms, e.g. Bigtable, to store that vast amount of data with scalability in mind. Ultimately, all databases persist data on some file system somewhere. Since this is the exact purpose that SAFE is being made for, why not decentralize database persistence as you state? Though I’m thinking that this type of abstraction would not necessarily exist in the base protocol, but at the application layer. Running specific queries across disparate sources might be something huge on the SAFE network.
SAFE is basically a NoSQL database, but with rather limited built-in query capabilities: it’s just a key/value store. That doesn’t mean you can’t do more advanced queries though. You just have to make indexes, and this can be done by apps.
For example, if you have a database of music albums, you could store each album as a JSON object in a mutable data. Then you could make two indexes: one that contains the artist and album names sorted alphabetically, and one that contains the release date and the album/artist names sorted by release date. These indexes could also be JSON objects stored as mutable data.
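As a rough sketch of the two indexes described above: the snippet below uses a plain Python dict as a stand-in for the key/value store, so `put`, `get`, the key names and the sample albums are all made up for illustration and are not the SAFE API. It only shows how an app could build and query the indexes on the client side.

```python
import json

# A plain dict stands in for the network's key/value store.
# This is an assumption for illustration, not the SAFE API.
store = {}

def put(key, obj):
    """Store an object as a JSON string under the given key."""
    store[key] = json.dumps(obj)

def get(key):
    """Fetch and decode the JSON object stored under the given key."""
    return json.loads(store[key])

# Made-up sample albums; dates are ISO strings so they sort chronologically.
albums = [
    {"id": "album/1", "artist": "Artist B", "title": "First Album", "released": "1999-03-01"},
    {"id": "album/2", "artist": "Artist A", "title": "Second Album", "released": "2005-11-15"},
    {"id": "album/3", "artist": "Artist A", "title": "Third Album", "released": "1987-06-30"},
]

# Each album is its own JSON object.
for album in albums:
    put(album["id"], album)

# Index 1: artist and album name, sorted alphabetically.
put("index/by-name", sorted([a["artist"], a["title"], a["id"]] for a in albums))

# Index 2: release date first, so the list is sorted by release date.
put("index/by-date", sorted([a["released"], a["artist"], a["title"], a["id"]] for a in albums))

def albums_by_artist(artist):
    """Download the name index and fetch every album by the given artist."""
    return [get(key) for name, _title, key in get("index/by-name") if name == artist]

def albums_released_between(start, end):
    """Download the date index and fetch albums released in [start, end]."""
    return [get(entry[-1]) for entry in get("index/by-date") if start <= entry[0] <= end]
```

The indexes only hold the sort fields plus the album key, so a client downloads one small object to answer a query and then fetches just the albums it needs.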
If a database like MusicBrainz was used as the source, there would be millions of albums, so the index would have to be split up, since it needs to be downloaded and queried on the client side. The release date index could be split into smaller indexes containing 1,000 or 10,000 albums each, and then there would be another index for these indexes, where you would see which ones you need to download to query a specific date range.
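Here is a minimal sketch of that two-level date index, again with a plain dict standing in for the key/value store. The key names, `build_date_index`, `query_date_range` and the sample entries are hypothetical and only meant to show the idea, not how SAFE itself works.

```python
import json

# A plain dict stands in for the key/value store (an assumption, not the SAFE API).
store = {}

def put(key, obj):
    store[key] = json.dumps(obj)

def get(key):
    return json.loads(store[key])

def build_date_index(entries, chunk_size=1000):
    """Split sorted [release_date, album_key] pairs into chunks.

    Also writes a top-level index that records the first and last
    date covered by each chunk.
    """
    entries = sorted(entries)
    top = []
    for n, start in enumerate(range(0, len(entries), chunk_size)):
        chunk = entries[start:start + chunk_size]
        key = f"index/by-date/{n}"
        put(key, chunk)
        top.append({"key": key, "first": chunk[0][0], "last": chunk[-1][0]})
    put("index/by-date", top)

def query_date_range(start, end):
    """Return album keys released in [start, end], fetching only relevant chunks."""
    results = []
    for meta in get("index/by-date"):
        # Skip chunks whose date span does not overlap the requested range.
        if meta["last"] < start or meta["first"] > end:
            continue
        for released, album_key in get(meta["key"]):
            if start <= released <= end:
                results.append(album_key)
    return results

# Usage with made-up entries: one album per day in January 2000.
sample = [[f"2000-01-{day:02d}", f"album/{day}"] for day in range(1, 32)]
build_date_index(sample, chunk_size=10)
january_first_week = query_date_range("2000-01-01", "2000-01-07")
```

The client first downloads the small top-level index, then only the chunks whose date span overlaps the query, so a narrow date range touches one or two chunks instead of the whole index.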