It makes me wonder about the new models we'll see. To build on your example, maybe search doesn't need a central index. Maybe the best search engine becomes the one that is best at finding the "person who maintains links about X", based on the context held in each user's storage.
Ruben Verborgh prototyped something along these lines, which I posted here some time ago, where the search uses knowledge gathered about the user and held client-side. He's not the only one interested in this approach, so I think @mav is right that we'll see innovation in using context provided by the user to a tool, as well as in how this is applied to search.
FWIW, I don't accept that centralised search indexes are the best technical solution to this problem. They are easier to understand, but getting them to scale was, and remains, a big problem.
We centralised search because the incentives were there, whereas now we are creating new incentives to solve these problems in a decentralised manner. I expect interesting and possibly surprising innovations and discoveries - including many "Doh, why didn't I think of that" solutions. I always think of WinZip.
There are lots of opportunities here for those willing to think long and hard about these problems. They are the people who will solve them, so it is brilliant to be here at this stage of the process. Later, many will kick themselves that they didn't have a go, because anyone can innovate and succeed if they're willing to work at it.
This is important for any solution or service… keen to see an answer to this.
The idea of "ask a friend" might work after a fashion, but it needs to address inertia… how does new content become listed? In theory the creator would have lists and a reason to list, so perhaps it sucks people in… if done well. Also, volume… the trouble here is that natural language is diffuse… a lot of lists might simply not exist.
Hash the search terms, then look up the data object holding the indices for those terms at the corresponding XOR address.
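A minimal sketch of that lookup derivation in Python. SHA3-256 and the normalisation step (lower-casing, sorting the terms) are assumptions for illustration; the network's actual address derivation may differ:

```python
import hashlib

def search_xor_address(terms: str) -> str:
    """Derive a 256-bit XOR address (hex) from a search phrase.

    Normalisation is assumed here: lower-casing and sorting the words
    makes "Cat Videos" and "videos cat" resolve to the same address.
    """
    canonical = " ".join(sorted(terms.lower().split()))
    return hashlib.sha3_256(canonical.encode("utf-8")).hexdigest()

addr = search_xor_address("cat videos")
# Word order and case do not change the address:
assert addr == search_xor_address("Videos  Cat")
```

The normalisation choice matters: without it, every permutation of the same terms would land at a different address and fragment the index.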
The ordering within a single search index would seem to be covered by a PageRank sort. Maybe other sort orders could be selected, such as publication date, author, organisation, etc.
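Those selectable sort orders could be a purely client-side concern. A sketch with hypothetical index entries (the field names are illustrative, not a real Safe Network schema):

```python
# Hypothetical entries as they might be stored at a search-term XOR address.
entries = [
    {"url": "safe://a", "pagerank": 0.42, "published": "2019-06-01", "author": "alice"},
    {"url": "safe://b", "pagerank": 0.87, "published": "2018-11-20", "author": "bob"},
    {"url": "safe://c", "pagerank": 0.55, "published": "2020-02-14", "author": "carol"},
]

SORT_KEYS = {
    "pagerank": lambda e: -e["pagerank"],  # highest rank first
    "date":     lambda e: e["published"],  # oldest first (ISO dates sort lexically)
    "author":   lambda e: e["author"],     # alphabetical
}

def sorted_results(entries, order="pagerank"):
    """Return a copy of the index sorted by the chosen key."""
    return sorted(entries, key=SORT_KEYS[order])

top = sorted_results(entries)[0]["url"]  # safe://b has the highest pagerank
```

Because the full entry is stored once, any number of orderings can be offered without duplicating the index.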
I wonder whether what we're asking for is search as a service… so, for any centralised off-network service, is there, or could there be, an option to receive a user's request and reply to it? Before the network can handle certain services itself, the option to provide a service to Safe Network might be very useful… without it we might lack basic functionality that users will want.
I'm certain that there will be people who will provide a centralized service and interface (safe site/app) for search. There could be some decent profit incentive to do so. But that is off-topic.
Do you think this is practical? It seems like a lot of data would need to be uploaded to achieve this - who pays for that? Would it be possible to deal with typos in search terms? What about varying orderings of search terms? When there are many results (e.g. "buy boat"), does the person have to download the entire result set and then apply their personal ordering?
Do you think this is desirable? Should we be permanently storing the search results? Who decides which pages actually belong at the XOR address for that search? How often are the search results updated? Can entries in the search results be spammed?
I like the concept, but I would love to hear more about its practicality and desirability, because there seem to be some very difficult barriers (as compared to the current "ask a friend" style Google option).
If this is a public algorithm then the rank will be gamed. I think this is a big part of why Google introduced additional context into their search results; PageRank alone wasn't giving good enough results. But I like the idea of being able to publicly store and rank pages at a search-term XOR address.
I really don't understand why you think this is necessary.
A separate namespace to hold indices, where the hash of the search terms is the XOR address, is a rather elegant solution. Isn't this what you originally proposed above?
Example:
User types the following into the Safe Browser address bar:
search: cat videos
Next, the client browser computes the hash of "cat videos" and treats it as a XOR-URL, then navigates to that XOR address.
Case 1) The XOR address exists. A human-readable list for these terms is presented, with sorting options (PageRank, date, publisher, subject, alphabetical, etc.).
Case 2) The XOR address of the complex phrase does not exist. The user is presented with a banner page suggesting they reduce the number of search terms, or is given the option to retrieve the indices for the individual terms and do the cross-referencing manually. They could then also be given the option to publish the fruit of their labour as an initial index for the complex phrase.
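The two cases can be sketched as follows. This assumes a SHA3-256-based address derivation and uses an in-memory dict as a stand-in for network GET/PUT; all names and the stored-set format are hypothetical:

```python
import hashlib

def xor_address(phrase: str) -> str:
    """Hash a normalised phrase to a hex XOR address (SHA3-256 assumed)."""
    canonical = " ".join(sorted(phrase.lower().split()))
    return hashlib.sha3_256(canonical.encode("utf-8")).hexdigest()

# Stand-in for the network: XOR address -> set of result URLs.
# Only the single-term indices exist to begin with.
network = {
    xor_address("cat"):    {"safe://a", "safe://b", "safe://c"},
    xor_address("videos"): {"safe://b", "safe://c", "safe://d"},
}

def search(phrase: str):
    addr = xor_address(phrase)
    if addr in network:                       # Case 1: phrase index exists
        return sorted(network[addr])
    # Case 2: fall back to the individual-term indices and intersect them.
    per_term = [network.get(xor_address(t), set()) for t in phrase.split()]
    result = set.intersection(*per_term) if per_term else set()
    network[addr] = result                    # publish the fruit of the labour
    return sorted(result)

print(search("cat videos"))  # first call takes Case 2, then publishes
print(search("cat videos"))  # second call hits Case 1 directly
```

Publishing the intersected result means each "manual" search seeds the phrase index for everyone who searches it later, which is the suction effect described above.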
Other details might include how safe-site owners could submit their site descriptions to index XOR addresses, or how safe crawlers could update the indices in a secure and truthful manner while being compensated with PtP for every GET request to the index address.
Built-in browser search, no central service provider required other than the Safe Network itself, distributed content crawling and verification, a versioned history of index evolution thanks to the permaweb… what's not to love? Wasn't this your idea?
P.S. The fun part begins once these indices are built and knowledge graphs can be constructed.
Because without the option to receive and reply to users, many services to Safe Network are hobbled.
One case would be OpenTimestamps, which becomes a whole lot more complex otherwise. Services could be marked "off network" to make clear what is in-network, if that's a concern.
I'm just looking to maximize utility, so that as many real-world use cases as possible become viable. Sometimes that is at odds with pure use of one true solution, but most solutions are a mix until they improve.
So, search as a service would be trivial off-network until the network can adopt a decentralised equivalent… or better.