Google Suggest Roundup

Update: Chris Justus has stepped through the Google Suggest script, documenting it.

Google Labs’ Suggest Service isn’t the first use of XMLHttpRequest to supply autocompletion in search; Christian Stocker built it into the Bitflux Blog back in April of this year.

But it’s the first internet-scale version of the idea.

When you search Bitflux’s blog, each keypress is captured and triggers an XMLHttpRequest that runs a full search of the blog, so each search spawns at least two requests to the server. That’s fine if you’re a weblog or corporate site.
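A rough sketch of that per-keypress pattern, not Bitflux’s actual code: the transport function stands in for the XMLHttpRequest call, and the names makeLiveSearch and onKeyPress are made up for illustration.

```javascript
// Hypothetical sketch: every keypress sends the text typed so far to the server.
// `transport` stands in for an XMLHttpRequest call; it is injected so the
// example runs outside a browser.
function makeLiveSearch(transport) {
  var requests = 0;
  return {
    onKeyPress: function (textSoFar) {
      requests += 1;
      transport(textSoFar);
    },
    requestCount: function () { return requests; }
  };
}

// Simulate typing "blog" one character at a time.
var sentQueries = [];
var liveSearch = makeLiveSearch(function (text) { sentQueries.push(text); });
var typedSoFar = "";
"blog".split("").forEach(function (ch) {
  typedSoFar += ch;
  liveSearch.onKeyPress(typedSoFar);
});
console.log(liveSearch.requestCount() + " requests:", sentQueries);
```

One four-letter query costs four server-side searches here, which is the multiplication the next paragraph worries about.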

For Google, the same approach would at least double the millions of requests it handles each day. Going from 1,000 to 2,000 or even 10,000 is one thing. Going from hundreds of millions to billions is a leap.

The clever thing Google appears to be doing is avoiding an actual search until the user hits submit.

Several other blogs, such as Simon Willison’s, observe that with each keypress, the XMLHttpRequest returns a snippet of JavaScript:

sendRPCDone(frameElement, "humph", new Array("humphrey bogart", "humphrey", "humphreys corner", "humphreys", "humphry davy", "humphrey davy", "humphries", "humphry bogart", "humphreys by the bay", "humphrey institute"), new Array("476,000 results", "3,530,000 results", "44,400 results", "672,000 results", "43,200 results", "48,800 results", "1,030,000 results", "3,550 results", "84,600 results", "383,000 results"), new Array(""));

That snippet’s eval()ed, calling a preloaded JavaScript function to create the DIV with the autocompletion suggestions.
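To make the mechanics concrete, here is a minimal sketch of what such a preloaded handler could look like. This is not Google’s code: the body of sendRPCDone below is an assumption, it collects plain strings instead of building a DIV, and the response is shortened to the first three suggestions from the snippet above.

```javascript
// Hypothetical stand-in for the preloaded function; the real one builds a DIV.
var lastSuggestions = [];
function sendRPCDone(frame, query, completions, counts, extra) {
  lastSuggestions = completions.map(function (text, i) {
    return text + " (" + counts[i] + ")";
  });
}

// Placeholder so the call below resolves outside a browser.
var frameElement = null;

// A shortened version of the response text quoted above.
var responseText = 'sendRPCDone(frameElement, "humph", ' +
  'new Array("humphrey bogart", "humphrey", "humphreys corner"), ' +
  'new Array("476,000 results", "3,530,000 results", "44,400 results"), ' +
  'new Array(""));';

// Evaluating the response text calls the handler with the suggestion data.
eval(responseText);
console.log(lastSuggestions.join("\n"));
```

The server’s reply is itself the function call, so the client needs no parsing step beyond eval().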

Google doesn’t need to create those suggestions from scratch with each request. They can create a lookup table and cache it in memory. Looking up “a” costs the same as looking up “zylo” if the table is a hash.
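A minimal sketch of that idea, assuming the table is simply a hash keyed by the typed prefix. The table shape and the suggest function are guesses for illustration; the “humph” entries come from the response above, and the “a” and “zylo” rows are placeholders, not real Google data.

```javascript
// Hypothetical in-memory table: each typed prefix maps straight to a prepared list.
var suggestionTable = new Map([
  ["humph", ["humphrey bogart", "humphrey", "humphreys corner"]],
  ["a", ["placeholder suggestion for a"]],
  ["zylo", ["placeholder suggestion for zylo"]]
]);

// One hash lookup per request, whatever the prefix; no search is run.
function suggest(prefix) {
  return suggestionTable.get(prefix.toLowerCase()) || [];
}

console.log(suggest("humph"));
console.log(suggest("a"));
console.log(suggest("zylo"));
console.log(suggest("qqqq")); // unknown prefix: empty list
```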

It’s entertaining to walk through the alphabet:

http://www.google.com/complete/search?hl=en&js=true&qu=[a-z]

and see what Google guesses you’re looking for.
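If you’d rather not type 26 addresses by hand, a few lines will expand the pattern. This sketch only builds the URLs from the pattern above; it doesn’t fetch them, and it makes no assumption about what the service returns.

```javascript
// Expand the [a-z] pattern into 26 concrete URLs.
var suggestBase = "http://www.google.com/complete/search?hl=en&js=true&qu=";
var alphabetUrls = [];
for (var code = "a".charCodeAt(0); code <= "z".charCodeAt(0); code++) {
  alphabetUrls.push(suggestBase + String.fromCharCode(code));
}
console.log(alphabetUrls.join("\n"));
```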

The extreme cleverness on Google’s side comes from creating, managing, and updating that table.

The lookup table approach also makes sense for sites that aren’t internet-scale. If you connect to a Google Search Appliance on your intranet, you need to process the XML it returns. An autocomplete built into your web front end could easily get bog slow as your server-side code dispatched each request to the Google Box and digested the results to create the list of suggestions.

A handmade lookup of popular searches, or one created as a batch job based on your search request logs, would speed up the autocomplete for your site.
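Here is one way such a batch job might look, as a sketch under stated assumptions: the log is taken to be a plain list of query strings, and the function name buildSuggestionTable, the per-prefix limit, and the sample queries are all invented for the example.

```javascript
// Hypothetical batch job: turn logged queries into a prefix lookup table.
function buildSuggestionTable(loggedQueries, maxPerPrefix) {
  // Count how often each query was searched.
  var counts = new Map();
  loggedQueries.forEach(function (raw) {
    var query = raw.trim().toLowerCase();
    if (query) counts.set(query, (counts.get(query) || 0) + 1);
  });

  // Most popular queries first, ties broken alphabetically.
  var ranked = Array.from(counts.entries()).sort(function (a, b) {
    return b[1] - a[1] || (a[0] < b[0] ? -1 : 1);
  });

  // File each query under every one of its prefixes, keeping the top few.
  var table = new Map();
  ranked.forEach(function (entry) {
    var query = entry[0];
    for (var len = 1; len <= query.length; len++) {
      var prefix = query.slice(0, len);
      var list = table.get(prefix) || [];
      if (list.length < maxPerPrefix) {
        list.push(query);
        table.set(prefix, list);
      }
    }
  });
  return table;
}

// Made-up sample log.
var sampleLog = ["vacation policy", "vpn setup", "vacation policy",
                 "expense report", "VPN setup", "vacation policy"];
var intranetTable = buildSuggestionTable(sampleLog, 2);
console.log(intranetTable.get("v"));
console.log(intranetTable.get("e"));
```

At request time the autocomplete only reads from the table, so the search box itself is never queried until the user submits.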

Meanwhile, people are already hacking on the Google interface. Remember, default JavaScript security won’t let XMLHttpRequest talk to a host other than the one that served the current page.
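The restriction boils down to comparing origins. The sketch below illustrates the rule with the standard URL class; it is not how any browser implements it, and the example.com page address is made up.

```javascript
// Sketch of the same-origin comparison: scheme, host, and port must all match.
function sameOrigin(pageUrl, requestUrl) {
  return new URL(pageUrl).origin === new URL(requestUrl).origin;
}

var pageAddress = "http://example.com/search.html"; // hypothetical page
var ownServer = sameOrigin(pageAddress, "http://example.com/complete?qu=a");
var googleServer = sameOrigin(pageAddress, "http://www.google.com/complete/search?hl=en&js=true&qu=a");
console.log("own server allowed:", ownServer);
console.log("google.com allowed:", googleServer);
```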
