To use SAFE as a database, you need a way to query the data. A way this can be done today is to set the XOR name of a mutable data to something that can be used for querying and storing an index as the value of that MD.
As an example, you may have a list of persons that you want to lookup, so you make an app that indexes your person collection an add some namespacing the keys. The keys of two of the indexes might be hash(“/mydb/persons/firstname/john”) and hash(“/mydb/persons/lastname/doe”). The values of these two indexes would be a list of the keys of all persons in your database with the first name John and all persons with the last name Doe. Maybe you have the persons “John Doe”, “John Gregor” and “Carl Doe” in your database. To run the query SELECT * FROM persons WHERE firstname = 'john' AND lastname = 'doe'
the app would download two lists of persons, the list of persons with the first name John and the list of persons with the last name Doe, then compare these two lists and see which person(s) can be found in both lists.
This basic pattern can be used to make indexes for pretty much any query. Spatial queries, range queries, etc can all be made by making indexes using various data structures. In many cases this is likely to be a very good solution, but sometimes the lists or other data structure your app needs for the query could get huge and the data you would suddenly need to download and process lots of data and maybe partition the indexes so you’d have to download many MDs, that might not give a good experience, especially on mobiles.
Range queries, e.g. find all persons between 25 and 50 years old and spatial queries, e.g. find all restaurants in this area are typical examples of queries that might be run more efficiently by utilizing the SAFE network itself.
The way to do this would be to implement two new functions. There should probably be a range of type tags reserved for these two functions and they should only return type tags from within that range. The type tag should refer to one specific way or algorithm used for creating the XorName and also what data to expect from the value.
The two functions would be something like this:
GetClosestByName(XorName, n, typeTag)
Get the n closest MDs to an XOR name
GetByRange(fromXorName, toXorName, typeTag)
Get all MDs within a range of keys
Nodes should know the XorName of what they store already and they should know other nodes with ids close to themselves, so it would seem to me that doing these queries shouldn’t be all that demanding on the network.
These two functions are useful because there are many algorithms and ways of transforming some value from a document, that could be useful for querying, into the key (XorName). An album with a release date could for example have the date stored in the key and a query could then be used to find all albums within a certain date range. Semantic hashing as discussed in the threads semantic search and indexed structured data is another example.
The indexing, calculating hashes etc would be done on the client side by the apps. The are several reasons not to run it on the vaults. One is that there are many different algorithms that could be useful, some of which would be application specific, another is that the computations might be a bit heavy and it’s better not to strain the vaults with this, last is that it’s not really needed as the one running the app would generally be the one who wants the data to be queryable anyway.
I found some papers on related topics that might be useful.
http://www.cslab.ece.ntua.gr/~dtsouma/index_files/comcom_chaz.pdf