Hey all. Lately, I've been thinking a lot about search autocompletion for my website. The functionality is very similar to, say, something like YouTube (start typing in a song name or artist name). I've been mulling over the idea of using a Trie data structure for the search. On the web, this can be problematic as there are no pointers and such that make tree-like structures easy. I would have to store a list of words somehow (in a Trie structure) and would have to fetch and parse this string every time I went to do a search (not to mention inserting and deleting from the structure wouldn't be trivial).
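(For reference, here's the rough shape of what I mean — a trie built from plain JavaScript objects, where object properties stand in for child pointers. All the names here are just mine for illustration.)

```javascript
// Rough trie sketch: plain objects as nodes, property lookups as "pointers".
function makeTrie() {
  return { children: {}, word: null };
}

function insert(root, word) {
  let node = root;
  for (const ch of word.toLowerCase()) {
    if (!node.children[ch]) node.children[ch] = { children: {}, word: null };
    node = node.children[ch];
  }
  node.word = word; // mark the end of a complete word
}

// Walk down to the prefix node, then collect every word beneath it.
function complete(root, prefix) {
  let node = root;
  for (const ch of prefix.toLowerCase()) {
    node = node.children[ch];
    if (!node) return []; // no entries share this prefix
  }
  const out = [];
  (function walk(n) {
    if (n.word) out.push(n.word);
    for (const ch in n.children) walk(n.children[ch]);
  })(node);
  return out;
}
```

So `insert(trie, "Radiohead")` followed by `complete(trie, "rad")` would return the full name — but as I said, I'd still have to serialize and re-parse this whole structure on every visit.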
Currently, I just use jQuery's autocomplete reading from an XML file (which I'm guessing does a linear search and has to parse the whole file anyway). It works fine, I suppose, but it looks for matches anywhere in the string (not just matching prefixes) and returns them in the order they appear in the XML file (I would prefer they were ranked by query count, which is also stored in the XML file).
At one point I was considering an AJAX request to a page that would just query the database for matches. I'm guessing this would be quite a hit on the server (you'd have to fire a request on literally every letter typed), but it may be preferable to making users do this processing themselves (especially if they're on a mobile device).
Do any of you have some success stories on implementing your own autocomplete or can point me in the right direction?
performance-wise, you should be able to afford to run a full-table scan for every few letters typed on either the client or server.
if you have a persistent backend like node.js, it's almost surely going to be less than a 50ms turn-around, since node can search objects in RAM.
if you have to query a db or load a flat file in asp or php, you're probably looking at 50-500ms per request.
you can use PHP's APC to ram-cache your data structure, drastically lowering the cost of each request on that server.
if you wait, say half a second from the time typing stops, resetting the timer with each keypress, a 500ms turnaround is usable. not zippy, but usable.
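that timer-reset trick is the usual debounce pattern — something like this (minimal sketch; `fetchSuggestions` is a stand-in for whatever request you actually make):

```javascript
// Debounce: only fire after typing has paused for delayMs.
function debounce(fn, delayMs) {
  let timer = null;
  return function (...args) {
    clearTimeout(timer);                            // reset the timer on every keypress
    timer = setTimeout(() => fn.apply(this, args), delayMs);
  };
}

// hypothetical wiring:
// const onKey = debounce(q => fetchSuggestions(q), 500);
// input.addEventListener("keyup", e => onKey(e.target.value));
```

with this, ten fast keypresses cost you one request instead of ten.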
i'm not sure how detailed your search needs to be or how much data you're searching, but i would throw out a soft limit of 2mb of JSON to be processed by a JS client, to cover phones and other weak devices.
don't use XML; it requires FAR more CPU power to build its DOM, and a bit more network bandwidth, since every field name is sent twice (open and close tags) instead of once as a JSON key.
2mb is a lot of text, the bible is only ~1.2mb, so if all you want to search are <title>s and <meta description>s, there's no reason a simple index cannot be served to the client.
if the index is under 250kb and you're doing it client-side, you can go ahead and suggest on each keypress, without a pause; the search will be instant.
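a client-side prefix filter that also ranks by stored query count (which you said you wanted) is only a few lines — assuming, for the sake of the sketch, an in-memory array of `{name, count}` records:

```javascript
// Prefix-match against an in-memory list, rank by query count, cap the results.
function suggest(items, query, limit = 10) {
  const q = query.toLowerCase();
  return items
    .filter(it => it.name.toLowerCase().startsWith(q))
    .sort((a, b) => b.count - a.count)   // most-queried first
    .slice(0, limit)
    .map(it => it.name);
}
```

on a few thousand rows this runs in well under a millisecond, which is why no pause is needed.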
you can even make an excel spreadsheet of your fields, save it as CSV, and use ajax to grab the CSV, which is a much more compact format than JSON, since key names are not repeated on each row, and strings need no quotes...
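turning that CSV into usable rows on the client is a short routine — this sketch assumes a header row and no quoted fields containing commas:

```javascript
// Parse a simple CSV (header row, no embedded commas in fields) into row objects.
function parseCsv(text) {
  const lines = text.trim().split(/\r?\n/);
  const headers = lines[0].split(",");
  return lines.slice(1).map(line => {
    const cells = line.split(",");
    const row = {};
    headers.forEach((h, i) => { row[h] = cells[i]; });
    return row;
  });
}
```

if your data can contain commas or quotes inside fields, use a real CSV parser instead; this naive split breaks on those.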
this would likely be the same type of file you would load in PHP or ASP, and the overall routine is the same either way; it's just a matter of where and when it's performed.
if you give more detail on your data — # of files, # of fields, # of indexed fields, etc — i can probably help you find a good technological match for your needs, but it's hard to say universally.