In search of intelligent search

Keyword technology

Search on many Web sites is based on keyword technology, which parses a word or phrase and quickly scans an index for text references that match.

The index is constructed from feedback that is supplied by automated crawlers, sometimes called bots or spiders, which comb all the content on a given Web site, a domain of several sites or the Web itself. Most crawlers capture keyword references to content, based on titles of pages and frequently used nouns in the first few paragraphs of content. Those references are stored in the index underlying the search engine.
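
As a rough illustration of the kind of index described above, the following Python sketch maps each keyword to the pages that mention it, looking only at the title and the first few paragraphs. The page data, the `build_index` name and the `lead_paragraphs` cutoff are assumptions made for this example, and the sketch indexes every word rather than picking out frequently used nouns; it is not a description of any particular crawler.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def build_index(pages, lead_paragraphs=2):
    """Map each keyword to the set of page URLs that mention it.

    Only the title and the first few paragraphs are indexed
    (an assumed cutoff of two paragraphs by default).
    """
    index = defaultdict(set)
    for url, page in pages.items():
        lead = " ".join(page["paragraphs"][:lead_paragraphs])
        for word in tokenize(page["title"] + " " + lead):
            index[word].add(url)
    return index

# Hypothetical pages, invented for illustration.
pages = {
    "shop.example/belts": {
        "title": "Men's leather belts",
        "paragraphs": [
            "Brown and black belts in all sizes.",
            "Free shipping.",
            "Karate uniforms are sold elsewhere.",  # past the cutoff, not indexed
        ],
    },
    "dojo.example/ranks": {
        "title": "Karate ranks",
        "paragraphs": [
            "Students earn a brown belt before black.",
            "Classes run daily.",
        ],
    },
}

index = build_index(pages)
print(sorted(index["brown"]))   # both pages mention "brown" near the top
print(sorted(index["karate"]))  # only the dojo page; the shop's mention is too deep
```

Because the shop page mentions karate only in its third paragraph, a search on that word would never find it, which is the trade-off of indexing only the top of a page.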

But the frequency with which keywords occur in the index is where the trouble begins. Many businesses use freeware search engines or off-the-shelf software packages, which have varying levels of indexing and classifying capability.

A search engine powerhouse, such as AltaVista, Excite or Northern Light, covers the entire Web, building a huge index based on its crawlers' survey of millions of Web pages.

AltaVista has indexed 350 million pages - the number that's left after duplicates have been stripped out, offensive sites have been removed by "family filtering" and spammers, who load up sites with popular keywords in hopes of attracting traffic, have been eliminated. Even so, a simple keyword search on AltaVista for "men's brown belts" will yield 6,953,460 hits, because each word in the phrase is found on many sites.
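
The hit-count inflation is easy to reproduce on a toy scale. The sketch below assumes a search that counts a page as a hit if it contains any one of the query words, and contrasts that with requiring all of them. The index contents and the function names are hypothetical, and nothing here is meant to represent AltaVista's actual matching rules.

```python
def any_word_hits(index, query):
    """Return every page that contains at least one of the query words."""
    hits = set()
    for word in query.lower().split():
        hits |= index.get(word, set())
    return hits

def all_word_hits(index, query):
    """Return only the pages that contain every query word."""
    sets = [index.get(word, set()) for word in query.lower().split()]
    return set.intersection(*sets) if sets else set()

# Hypothetical toy index: keyword -> ids of pages that contain it.
index = {
    "men's": {1, 2, 3, 4},
    "brown": {2, 5, 6, 7},
    "belts": {2, 3, 8},
}

query = "men's brown belts"
print(len(any_word_hits(index, query)))  # pages matching any word
print(all_word_hits(index, query))       # pages matching every word
```

In this toy index, eight pages match at least one word but only one page contains all three, so the any-word count says little about how many pages are actually about men's brown belts.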

"You have to control what gets displayed at the top of the results," John Piscitello, senior business manager at AltaVista, said he tells customers such as e-commerce software seller Ariba and book and music seller Amazon.com. "It's a lot like shelf space. The majority of users will only look at what's on the first page," much as shoppers in a store look for favorite brands in the prime shelf space.

But getting the most relevant results onto that first page remains a supreme challenge.

If you are trying to find out who said, "The business of America is business," and you search on the words without surrounding them with quotation marks, the results vary widely from search engine to search engine, with most of the results referring to business topics. If you search with the sentence in quotes, AltaVista, Google, Lycos, Netscape Search, Northern Light and other major search engines return Calvin Coolidge as the source of the comment on the first page. On the other hand, if you ask for an "I'm Feeling Lucky" single answer from Google's natural language-capable site, you get back the day's headlines from Business News America, a Latin American news service, the Interactive Week test showed.

One of the best search engines at deciding relevance is Ask Jeeves. Its replies to natural language queries get away from keyword limitations and are frequently appropriate. But they are based on human editors who observe frequently asked questions and make sure Jeeves' results include the sites most likely to have the desired answers.

Ask Jeeves a question that his background editors haven't anticipated - for example, The City of New Orleans ran over how many miles of track? - and Jeeves' reply is as nonsensical as those of other search engines. The City of New Orleans is a former passenger train. But Jeeves' first response to the question is to offer to tell you how far it is from New Orleans to New Orleans.
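
The limits of an editor-curated approach can be sketched in a few lines of Python. The example below assumes a hand-maintained table that maps sets of key terms to vetted URLs; the table, the `answer` function and the example.com address are all hypothetical and are not drawn from Ask Jeeves' actual system. A question the editors anticipated finds its entry, while an unanticipated one drops through to a fallback.

```python
import re

# Hypothetical editor-maintained table: key terms -> a vetted URL.
CURATED = {
    frozenset({"said", "business", "america"}): "https://example.com/coolidge-quote",
}

def terms(question):
    """Reduce a question to its set of lowercase words."""
    return set(re.findall(r"[a-z]+", question.lower()))

def answer(question):
    """Return (source, url), where source is 'curated' or 'fallback'."""
    words = terms(question)
    for key_terms, url in CURATED.items():
        if key_terms <= words:  # every key term appears in the question
            return "curated", url
    # No editor anticipated this question; a real system would fall
    # back to ordinary keyword search at this point.
    return "fallback", None

print(answer("Who said the business of America is business?"))
print(answer("The City of New Orleans ran over how many miles of track?"))
```

The quality of the answers depends entirely on how many questions the editors have entered by hand, which is where the staffing cost comes from.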

On the other hand, Ford is implementing Ask Jeeves to help it supply answers to the most commonly asked questions concerning Ford Explorer tires, said Sean Murphy, vice president of product management at Ask.com, supplier of Ask Jeeves.

Most businesses, however, can't justify the expense of staffing their search function to the Ask Jeeves level, Hagen said.

Northern Light indexes 310 million pages, but it includes all the words in a document, not just titles or the first few paragraphs. So when it hunts for where the words "men's brown belts" occur, it gets 72,622 results. Put the phrase in quotes and the search engine is restricted to those documents where the three words occur in order together. Northern Light then comes back with 39 results, primarily focused on karate and secondarily on shopping, one or the other of which might reflect the searcher's interest.
