Google-like searches on Safe Network

VaCrunch · June 15, 2017, 11:49am

Right now, if I want to get some information about a subject I can type search terms into Google and quickly retrieve references. Will it work the same way in the Safe Network? Could a search app be available that would scan the population of vaults and return numerous citations or examples related to my search term?

Zoki · June 15, 2017, 12:00pm

Interestingly… I just decided to look at the search interest for “maidsafe”.
https://trends.google.com/trends/explore?q=maidsafe

VaCrunch · June 15, 2017, 12:04pm

That’s funny, my Google search returned 159,000 results for maidsafe. Again, will the results of a search on the Safe Network be similar once it launches?

About 159,000 results (0.32 seconds)
Search Results
MaidSafe - The New Decentralized Internet

MaidSafe’s distributed platform enables the creation of fast and secure applications that help ensure digital privacy, security and freedom for all.
‎Safecoin · ‎Features · ‎MaidSafe · ‎Alpha Release
MaidSafe (@maidsafe) · Twitter
https://twitter.com/maidsafe
via @ThomasClaburn @WolfieChristl Cracked Labs corporate #surveillance #smartphones = major source #data collection bit.ly/2rdHZ3E
1 day ago · Twitter
via @MIT_CSAIL MIT explained in 2015 why “exceptional access” to #encryption would be a bad idea bit.ly/2tgtk8D #infosec
1 day ago · Twitter
SAFENetwork meetup London 5th July - RSVP at: www.meetup.com/Project-… #SAFENetwork
1 day ago · Twitter
via @TeklaPerry #SiliconValleyHBO is raising the profile of the #decentralized #internet bit.ly/2rR76fy #Safenetwork
3 days ago · Twitter
MaidSafeCoin - Coin Market Cap
MaidSafeCoin price today, MAID to USD live, marketcap and chart | CoinMarketCap
MaidSafeCoin price, charts, market cap, and other metrics.
SAFE Network Forum
https://forum.autonomi.community/
MaidSafe Asia Forums ( 2 ) [Community] (35). How is Farming Centralization Disincentivized? ( 2 ) [Safecoin] (34). SAFE-FS – Decentralizing Online Files ( 3 4 5 …

Traktion · June 15, 2017, 12:49pm

You can crawl safe net links, just as with the clear net. I see no reason why search couldn’t be just as good on safe net as clear net.

anon40790172 · June 15, 2017, 1:37pm

This is not how Google works. Google scans all the words out there beforehand, so when you google something they just serve it from a database. We could have search websites work the same way on SAFE. No problem.

Stark · June 16, 2017, 2:16am

Though initial crawling will take longer due to hop latencies, no? Meaning that discovery of new sites and information will take much longer. Search engines would lag hard in providing fresh data. I think @neo had an idea to speed up the process by integrating a form of link aggregation data type that would feed a native database for engines to query, IIRC.

intrz · June 16, 2017, 3:29pm

How fast you can crawl is an interesting question.

If you had a list of a million websites or ten million websites you’d get a bunch of computers to crawl them in parallel. Anyone have any kind of guess for how long that might take?
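
A back-of-the-envelope way to frame that guess, as a sketch only: the helper name `crawl_time_hours` and every input below (pages per site, seconds per page, number of workers) are hypothetical placeholders, not measurements of the SAFE Network.

```python
import math

def crawl_time_hours(num_sites, pages_per_site, seconds_per_page, workers):
    """Rough wall-clock estimate for a parallel crawl.

    All inputs are hypothetical; real SAFE fetch latency is not known here.
    """
    total_pages = num_sites * pages_per_site
    # Each worker fetches its share of pages one after another;
    # the workers themselves run side by side.
    pages_per_worker = math.ceil(total_pages / workers)
    return pages_per_worker * seconds_per_page / 3600

# Hypothetical inputs: 1,000,000 sites, 10 pages each, 2 s per page, 1,000 workers.
estimate = crawl_time_hours(1_000_000, 10, 2.0, 1_000)
print(f"{estimate:.1f} hours")
```

Plugging in your own latency and worker counts shows how strongly the answer depends on per-page fetch time, which is the hop-latency concern raised earlier in the thread.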

How to make a very large decentralized search index is another interesting question.

Eventually with compute I guess the whole thing could just run on safenet, an autonomous search engine dApp.

VaCrunch · June 16, 2017, 5:24pm

IMO if a user cannot search on the Safenet as quickly and effectively as on the Clearnet from the get-go, mass adoption will be severely restrained. This speedy search function should be fully deployed before the network is let loose in the wild, if possible. Today using Google is an afterthought for most people. In a way, it IS the internet. Anything short of “fully-comparable” will be a turn-off for many.

capivarao · June 16, 2017, 7:36pm

Google was not made by internet creators.
Same thing here. If there is a market to be explored, developers should solve the problem to profit.

VaCrunch · June 16, 2017, 9:24pm

Like it or not, Safe will be competing against Clearnet and Google. Better to be prepared. Proper incentives and assistance should be given to make sure a good search app is ready to go. If not, its non-existence could turn out to be the “killer” app for Safe Network, or at least the ball and chain that it drags along. Ideally, you would want to anticipate and eliminate major reasons for rejecting Safe very early in the game if possible. There will be little comfort in rationalizing that the original creators of the internet did not invent Google so why should we, if it sounds the death knell. Instead of Safe really taking off and flying out of the gate, it could mean that, like the original Internet, Safe might take years to catch on universally.

This one, I believe, is more than a “market to be explored”; it is a fundamental requirement for a replacement of the internet.

davidpbrown · June 16, 2017, 9:50pm

Is it obvious yet what options there would be for hosting capability outside SAFE?

I wonder whether hosting inside SAFE would require payment on each potential result set, which might inhibit the frequency of updates? I'm not sure I’ve seen a route to prompt a reply from a resource outside of SAFE for this or any other application… I suppose in theory there might be a route to database-like huge.file reading, with a payment that is significantly less for that.

I’ve yet to spend time understanding the cost related to mutable data, so I'm unsure how the OP's problem might best be solved.

intrz · June 16, 2017, 10:16pm

Could something like this work with acceptable cost, performance etc?

A crawler would crawl all sites and make indexes as mutable data with the key being a single term and when looking up the key you would get a list of all sites containing the term. If the list of sites got very large it could be partitioned alphabetically.
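
The indexing step described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the names `tokenize`, `partition_key` and `build_index` are hypothetical, a plain dict stands in for mutable data on the network, partitions are keyed by the first two characters of the site name, and `document_frequency` is taken to mean the number of sites containing the term.

```python
import re
from collections import Counter, defaultdict

def tokenize(text):
    # Lower-case words and numbers only; underscores never appear in a term,
    # so a term key can never collide with a partition key like "maidsafe_aa".
    return re.findall(r"[a-z0-9]+", text.lower())

def partition_key(term, url, prefix_len=2):
    # Assumption: partition alphabetically by the start of the site name.
    return f"{term}_{url[:prefix_len].lower()}"

def build_index(pages, prefix_len=2):
    """pages maps url -> page text. Returns a dict standing in for the key-value store."""
    postings = defaultdict(lambda: defaultdict(list))
    for url, text in pages.items():
        for term, count in Counter(tokenize(text)).items():
            key = partition_key(term, url, prefix_len)
            postings[term][key].append({"url": url, "term_frequency": count})

    store = {}
    for term, parts in postings.items():
        # Top-level entry: how many sites contain the term, and where to look next.
        store[term] = {
            "document_frequency": sum(len(sites) for sites in parts.values()),
            "sub_indexes": sorted(parts),
        }
        for key, sites in parts.items():
            store[key] = {"sites": sites}
    return store

pages = {
    "aahaha.safe": "maidsafe is awesome, awesome!",
    "aawh.safe": "maidsafe",
    "bob.safe": "awesome",
}
store = build_index(pages)
print(store["awesome"])
```

A real crawler would write each entry to the network instead of a local dict, and would need a rule for splitting a partition further once it grows too large.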

If you wanted to look up “maidsafe is awesome”, the query would be split into the terms “maidsafe”, “is” and “awesome”. Maybe “is” would be removed as a stop word. Then there would be a GET request for, say, “maidsafe” to get the document frequency (i.e. the total number of times that term appears across all sites) and a list of sites containing the term. If there were lots of sites containing the term “maidsafe”, you would get a list of the partitions, for example maidsafe_aa and maidsafe_ab. You would then do the same for the other terms and take the overlapping keys, say maidsafe_aa and awesome_aa. For each of these you’d get a list of sites together with the number of times the term appeared on each site.

So the process would be something like first running some GET requests:

GET "maidsafe" ->
{
    "document_frequency": 5000,
    "sub_indexes": ["maidsafe_aa", "maidsafe_ab"]
}

GET "awesome" ->
{
    "document_frequency": 230,
    "sub_indexes": ["awesome_aa", "awesome_c"]
}

GET "maidsafe_aa" ->
{
    "sites": [{"url": "aahaha.safe", "term_frequency": 2},
              {"url": "aawh.safe", "term_frequency": 1}]
}

GET "awesome_aa" ->
{
    "sites": [{"url": "aahaha.safe", "term_frequency": 13},
              {"url": "aawesoome.safe", "term_frequency": 2}]
}

Then the client would parse the JSON and check which sites are common to all terms, in this case “aahaha.safe”. If there were several hits, the term frequency, document frequency and other data could be used to order them.
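
A minimal sketch of that client-side step, using the sample responses above. Several things here are assumptions rather than part of the proposal: the `search` function and `STORE` dict are hypothetical, `get` stands in for a network GET that returns parsed JSON or `None`, the stop-word list is a toy, `total_sites` is a made-up network size, and the ordering uses a plain TF-IDF sum as one possible way to combine term and document frequency.

```python
import math

STOP_WORDS = {"is", "the", "a", "of"}  # hypothetical, deliberately tiny

# Sample entries copied from the GET examples above; a dict stands in for the network.
STORE = {
    "maidsafe": {"document_frequency": 5000, "sub_indexes": ["maidsafe_aa", "maidsafe_ab"]},
    "awesome": {"document_frequency": 230, "sub_indexes": ["awesome_aa", "awesome_c"]},
    "maidsafe_aa": {"sites": [{"url": "aahaha.safe", "term_frequency": 2},
                              {"url": "aawh.safe", "term_frequency": 1}]},
    "awesome_aa": {"sites": [{"url": "aahaha.safe", "term_frequency": 13},
                             {"url": "aawesoome.safe", "term_frequency": 2}]},
}

def search(query, get, total_sites):
    terms = [t for t in query.lower().split() if t not in STOP_WORDS]
    heads = {}
    for term in terms:
        head = get(term)
        if head is None:
            return []  # a term with no index entry means no site matches every term
        heads[term] = head
    if not heads:
        return []

    # Only partitions whose suffix ("aa", "ab", ...) exists for every term can overlap.
    suffix_sets = [{key[len(term) + 1:] for key in head["sub_indexes"]}
                   for term, head in heads.items()]
    shared = set.intersection(*suffix_sets)

    # Assumed weighting: rarer terms (lower document frequency) count for more.
    idf = {term: math.log(total_sites / head["document_frequency"])
           for term, head in heads.items()}

    scores = {}
    for suffix in sorted(shared):
        per_term = {}
        for term in heads:
            part = get(f"{term}_{suffix}") or {"sites": []}
            per_term[term] = {e["url"]: e["term_frequency"] for e in part["sites"]}
        # Keep only sites that contain all of the query terms.
        common = set.intersection(*(set(sites) for sites in per_term.values()))
        for url in common:
            scores[url] = sum(idf[t] * per_term[t][url] for t in heads)

    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda url: (-scores[url], url))

print(search("maidsafe is awesome", STORE.get, total_sites=100_000))  # ['aahaha.safe']
```

Fetching only the shared partitions keeps the number of GET requests down, which matters if each request carries network latency or cost.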

davidpbrown · June 16, 2017, 10:39pm

Interesting idea.

Perhaps a site owner could submit via some tool that sees them pay for addition to those? I won’t suggest they be limited to just keywords, but perhaps if the cost is high that could be considered. The benefit would be that the site owner would update in line with their awareness of the site’s refresh timing.

The downside would be the index would be only as fresh as owners actioning that; so, perhaps would not work as well as a search engine that did all the work and considered the difference since last pass. Still, I’m suggesting that with a thought that owner push to index is more like decentralized responsibility than relying on a centralized indexing potential point of failure.

Then again, if decentralized the choice perhaps should be there for site owners to submit keywords rather than all words, empowering them to choose.

/brain-over… I’m tired :yawn:

Stark · June 17, 2017, 12:15am

Data chains append new information to blocks free of charge. The same could be done for an integrated index. Let SAFE crawl itself and update accordingly. The overhead seems a small price to pay for greater user adoption. Deduplication could keep the index as lean as possible. Any thoughts @maidsafe ?

andreruigrok · June 21, 2017, 11:16am

Hello VaCrunch,

What is new about the fact that Safe Network will have to catch up with the internet?

Back in 2014, the CEO of Cisco said the internet’s worth was $19 trillion.

Don’t you think it already is very clear to investors, that we are going to have to catch up, just by looking at Maidsafe’s Marketcap of $200+ million?

Anders · June 24, 2017, 8:10pm

One advantage of having a SAFE search index early is that it can grow with the growth of the network. So instead of having to crawl billions of pages, in the beginning there will only be hundreds of pages, then thousands and then millions and so on.

The problem is how to make such search index general enough. Google Search has by now an enormously complicated page ranking algorithm that probably includes massive machine learning networks and things like that. Absolutely mindblowingly complicated stuff with perhaps millions of CPUs running millions of lines of source code.

I have an idea of building a SAFE search index by simply hashing the queries and mapping the hashes to pages. A brute force approach that is easy to implement. Unfortunately the tricky part is how to achieve efficient page ranking (plus date range search etc).
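
A minimal sketch of that brute-force approach, assuming SHA-256 as the hash and a local dict in place of network storage. The `query_key` function and `HashedQueryIndex` class are hypothetical names, and ranking is deliberately left out, since that is the unsolved part.

```python
import hashlib

def query_key(query):
    # Normalise case and whitespace so near-identical queries share one key.
    normalised = " ".join(query.lower().split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()

class HashedQueryIndex:
    def __init__(self):
        # Stand-in for network storage: query hash -> list of page URLs.
        self._store = {}

    def add(self, query, url):
        pages = self._store.setdefault(query_key(query), [])
        if url not in pages:  # avoid storing the same page twice for one query
            pages.append(url)

    def lookup(self, query):
        # Pages come back in insertion order; no ranking is applied.
        return list(self._store.get(query_key(query), []))

index = HashedQueryIndex()
index.add("MaidSafe  is awesome", "aahaha.safe")
print(index.lookup("maidsafe is awesome"))  # ['aahaha.safe']
```

Only queries that normalise to exactly the same string hit the same entry, so a differently worded search finds nothing unless someone has indexed that wording too.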

Stark · June 24, 2017, 9:27pm

Page ranking could happen client side, I suppose. Page relevance would be handled by low-capacity, high-CPU vaults or transient low-age nodes. So the network basically checks the index for matches, batches them, and sends them to the client, which then runs the algo on the local machine. Being open source would mean that eventually millions of eyes could result in an efficient system. Especially if SAFE grows as expected.

neo · June 24, 2017, 11:53pm

I think a lot of the search engine will have to be from the users themselves.

Remember that until network computing occurs an application has to run to populate the search engine and this means either someone runs the App themselves or is smart and writes it as a browser addon for those people who wish to help populate the search engine. Maybe reward those people with a portion of the PTD (pay the app developer) rewards. The PTD rewards are likely to be very high for the search APP since a lot of people will be using them.

ejinte · June 25, 2017, 12:01am

Can’t google be used on safenet? I use google daily and I still like them even though I know they don’t respect my privacy.

neo · June 25, 2017, 12:04am

Only if they migrate across. The two networks use different protocols, and if Google don’t implement them then they can’t provide services.

The issue Google will have with the SAFE network is how to make loads of money from SAFE. PTD rewards are not enough for them. And when they fear SAFE taking over the internet then they are behind the eight ball even further.
