It would be useful to have a Structured Data type that is automatically indexed and searchable.
EDIT: Here’s a GIST with a bunch of questions and ideas about a possible implementation.
It could support things like semantic search and graph databases. Semantic hashing fits right into the existing structure of SAFE, which already uses an XOR metric to express closeness, and much of what we care about can be expressed as a graph database. @dirvine mentioned both deep learning (one of the tools for implementing semantic tagging) and graphs here when talking about possible approaches to search on the SAFE network.
The mandatory encoding for this type could be a binary variant of JSON. Maybe the existing BSON format could be extended with a dedicated data type for 256-bit hashes (or whatever the final block address size turns out to be); values of that type would then be indexed automatically.
Index entries would be stored/cached by the vaults closest to the indexed key in XOR space (just as data blocks are), but instead of the block's contents they would store only the address of the block that contains the key.
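To make the placement rule concrete, here is a minimal sketch of how an index entry might end up on the vaults nearest to its key. Everything named here (`Vault`, `publish`, the replication factor `k`) is made up for illustration and is not part of any existing SAFE API; it also assumes every vault knows the full list of vault IDs, which a real network would not.

```python
from typing import Dict, List, Set


def xor_distance(a: bytes, b: bytes) -> int:
    """XOR distance between two equal-length IDs, read as big-endian integers."""
    return int.from_bytes(a, "big") ^ int.from_bytes(b, "big")


def closest_vaults(key: bytes, vault_ids: List[bytes], k: int) -> List[bytes]:
    """Return the k vault IDs nearest to `key` in XOR space, nearest first."""
    return sorted(vault_ids, key=lambda v: xor_distance(v, key))[:k]


class Vault:
    """Hypothetical vault holding index entries only, never block contents."""

    def __init__(self, vault_id: bytes) -> None:
        self.vault_id = vault_id
        # indexed hash -> addresses of the blocks that contain that hash
        self.index: Dict[bytes, Set[bytes]] = {}

    def put_entry(self, key: bytes, block_address: bytes) -> None:
        self.index.setdefault(key, set()).add(block_address)


def publish(key: bytes, block_address: bytes, vaults: List[Vault], k: int = 4) -> List[bytes]:
    """Store (key -> block_address) on the k vaults closest to `key` (k is an assumption)."""
    by_id = {v.vault_id: v for v in vaults}
    chosen = closest_vaults(key, list(by_id), k)
    for vault_id in chosen:
        by_id[vault_id].put_entry(key, block_address)
    return chosen
```

The point is only that the entry is keyed by the indexed hash, not by the block's own address, so the same block can show up in several places in the index.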
Searches would be of two types:
- exact, e.g. for references,
- inexact, for semantic tags, where all records within the Hamming ball of the specified radius around the search key would be returned.
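The two search types could be sketched like this against a single local index (a plain dict from indexed hash to block addresses, which is my assumption, not a proposed storage format). The inexact search is a brute-force scan here; note that Hamming distance is the number of set bits in the XOR of two keys, which does not order keys the same way as XOR distance read as an integer, so how such a query would be routed across vaults is left open.

```python
from typing import Dict, Set


def hamming_distance(a: bytes, b: bytes) -> int:
    """Number of bit positions in which two equal-length keys differ."""
    return bin(int.from_bytes(a, "big") ^ int.from_bytes(b, "big")).count("1")


def search_exact(index: Dict[bytes, Set[bytes]], key: bytes) -> Set[bytes]:
    """Exact search: block addresses indexed under precisely this key."""
    return set(index.get(key, ()))


def search_hamming_ball(index: Dict[bytes, Set[bytes]], key: bytes, radius: int) -> Set[bytes]:
    """Inexact search: block addresses whose tag is within `radius` bits of `key`."""
    hits: Set[bytes] = set()
    for tag, addresses in index.items():
        if hamming_distance(tag, key) <= radius:
            hits |= addresses
    return hits
```

With radius 0 the inexact search returns the same result as the exact one.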
The potential for abuse (e.g. flooding the network with random tags) would require countermeasures, such as one or more of these:
- indexed blocks could be more expensive depending on the number of indexed hashes they contained,
- only the first few hashes would be indexed from a block, period,
- searches could be restricted to records coming from blocks owned by a given user (e.g. a well-known search provider).
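A rough sketch of how the three countermeasures could look side by side. The constants (`MAX_INDEXED`, `BASE_COST`, `COST_PER_HASH`) and the `IndexEntry` record are placeholders I picked for illustration; actual values and the pricing model would need to be decided.

```python
from typing import List, NamedTuple

MAX_INDEXED = 3     # hypothetical cap on indexed hashes per block
BASE_COST = 1       # hypothetical cost of storing a block with no indexed hashes
COST_PER_HASH = 2   # hypothetical surcharge per indexed hash


class IndexEntry(NamedTuple):
    key: bytes            # the indexed hash
    block_address: bytes  # the block that contains it
    owner: str            # owner of that block


def hashes_to_index(block_hashes: List[bytes]) -> List[bytes]:
    """Only the first few hashes of a block get indexed; the rest are ignored."""
    return block_hashes[:MAX_INDEXED]


def storage_cost(block_hashes: List[bytes]) -> int:
    """Cost grows with the number of hashes that will actually be indexed."""
    return BASE_COST + COST_PER_HASH * len(hashes_to_index(block_hashes))


def filter_by_owner(entries: List[IndexEntry], owner: str) -> List[IndexEntry]:
    """Restrict search results to blocks owned by one user, e.g. a search provider."""
    return [entry for entry in entries if entry.owner == owner]
```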
EDIT: for clarity’s sake: “references” would not have to be block addresses (though I suggest basing the key size on that, for convenience); the idea is just that their exact value matters, so they could be a hash of anything:
- a URL
- an (attribute, block address) tuple, like (spouse, 8ae8223f9…992a3)
- an (attribute, value) tuple, like (author, Mark Twain)
- an (attribute, language, value) tuple, like (author, Japanese, マーク・トウェイン)
- you name it
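For example, a reference key could be derived from any of the tuples above roughly like this. SHA-256 and the JSON-array encoding are my assumptions here: the only real requirements are a fixed key size and an encoding that everyone agrees on, so that the same tuple always hashes to the same key. The URL in the usage lines is a placeholder.

```python
import hashlib
import json


def reference_key(*parts: str) -> bytes:
    """Hash a tuple of strings into a 256-bit reference key.

    The parts are serialised as a compact JSON array first, so that
    ("author", "Mark Twain") and ("authorMark Twain",) cannot collide
    just because their concatenations are equal.
    """
    canonical = json.dumps(list(parts), ensure_ascii=False, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).digest()


url_key = reference_key("https://example.com/")
author_key = reference_key("author", "Mark Twain")
author_ja_key = reference_key("author", "Japanese", "マーク・トウェイン")
```

Each key is 32 bytes, the same size as the 256-bit hashes suggested above, and can only be found by exact search.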