An emerging Web search project is out to keep Google, Yahoo and MSN honest -- and improve the process of finding useful, non-commercial information on the Net.
Called Nutch, the project is developing open-source software for locating documents online. But unlike major search providers, it won't cloak its formulas for matching relevant results to visitors' queries. Rather, it will provide an open window into its calculations, with links to explanations of how it determined each result, according to lead architect Doug Cutting.
"All of the existing search engines have secret methods for deciding which documents are the best documents," said Cutting, whose CV includes research and development stints at Excite, Grand Central and the Palo Alto Research Center. "Search is something that's a basic need for users of the Internet -- it's a valuable tool and yet it's controlled secretly, and that seems like a bad setup. People have the right to know how their search engine works so they can trust it."
Nutch itself has been operating secretly for roughly the last year, as it gathered support from developers and funding from one of the biggest commercial players in search: Overture Services.
Two researchers from Overture -- an advertising-supported search service in the process of being acquired by Yahoo -- approached Cutting last year with interest in providing funding for an open-source search system for academic research. Already itching to work on another search engine, Cutting spearheaded the effort from there, bringing on three founding developers and forming a board of directors that includes Mitch Kapor, founder of Lotus and co-founder of the Electronic Frontier Foundation, and Tim O'Reilly, founder and president of tech book publisher O'Reilly & Associates.
Despite its connection to Overture, the project is not-for-profit and aims to advance search by supplying a technology for experimentation. Academic researchers or developers will be able to download the software and adapt it without having to reinvent the wheel, Cutting said. Foreign governments could use Nutch to develop a non-commercial search site for citizens, rather than licensing a proprietary, ad-supported technology, he said. Or corporate entities could build a for-profit business around the technology.
"If this is Linux, we're hoping there would be Red Hat," Cutting said, drawing a comparison with the open-source operating system and one of the leading companies offering it.
Searching for the next big thing
Search has become a hotbed for innovation in the last year as marketers have poured money into ad campaigns that tie their products to specific search terms. Overture and Google have built billion-dollar businesses around ad-supported search, and all the major portals have recommitted themselves to Web navigation as a result. Top computer scientists at the major portals and some academic researchers are devising ways to improve on search for the Internet and a host of applications.
The industry has also undergone much consolidation in the last year, and only a few companies -- Google, Yahoo and MSN -- are fielding the majority of search traffic worldwide. (Yahoo, for example, last month agreed to spend nearly $1.7bn to buy Overture.) With fewer and fewer players, the industry has little room for checks and balances, industry watchers say. Sites such as Google-watch.org have emerged to try to bring transparency to, or raise questions about, Google's growing importance in Web search.
Nutch has already taken the wraps off its downloadable software for research, which is suitable for testing by other developers but probably too arcane for the average Web surfer. It is aiming to have a public site by October that will allow people to search 100 million documents, to serve as a point of comparison against indexes such as Google's.
For example, a Web surfer could pull up search results from Nutch, with its mathematical calculations open to inspection, and compare them with those from Google, which does not publicise its formula for calculating search results. Nutch is actively seeking funding for hardware that would support traffic from Web surfers, but for now its systems do not have the capacity to handle an influx of visitors.
Overture would not detail the amount of money it has donated to Nutch. But the effort was part of a desire to better "understand the current issues surrounding search and innovative solutions in that area," said Overture spokeswoman Jennifer Stephens.
Shortly after Overture last year founded its own research group, run by Gary Flake, it invested in the open-source search engine for academic research and to further its own learning, Stephens said. But since Overture acquired AltaVista and Web search technology from Norway-based Fast Search & Transfer, those technologies have come to be the core of its Web search technology and testing. Nutch is an alternative test bed for the company's use, she said.
The engine is written in Java and is based on Lucene, a software library that developers can use to add search to applications such as email. Nutch builds upon Lucene, also developed in part by Cutting, and uses it as its search library and indexing tool. But where Lucene searches a single application's data, Nutch is designed to crawl and index the entire Web.
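To give a rough idea of what an indexing library with open scoring involves, here is a minimal sketch in Java. It is not Nutch or Lucene code and does not use their APIs; the class name ToyIndex, its methods and the simple term-frequency weighting are all assumptions made for illustration. It builds a small inverted index and, for each query term, records how much that term contributed to a document's score.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical toy index for illustration only; not the Nutch or Lucene API.
class ToyIndex {
    // term -> (document id -> number of times the term occurs in that document)
    private final Map<String, Map<Integer, Integer>> postings = new HashMap<>();
    private final List<String> docs = new ArrayList<>();

    /** Adds a document to the index and returns its id. */
    int add(String text) {
        int id = docs.size();
        docs.add(text);
        for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (term.isEmpty()) continue;
            postings.computeIfAbsent(term, t -> new HashMap<>()).merge(id, 1, Integer::sum);
        }
        return id;
    }

    /** Scores one document for a query and appends a per-term breakdown to explanation. */
    double score(String query, int docId, StringBuilder explanation) {
        double total = 0.0;
        for (String term : query.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (term.isEmpty()) continue;
            Map<Integer, Integer> hits =
                postings.getOrDefault(term, Collections.<Integer, Integer>emptyMap());
            int tf = hits.getOrDefault(docId, 0);
            if (tf == 0) continue; // term absent from this document
            // Assumed weighting: terms found in fewer documents count for more.
            double idf = Math.log(1.0 + (double) docs.size() / hits.size());
            double part = tf * idf;
            total += part;
            explanation.append(String.format(Locale.ROOT,
                "%s: tf=%d * idf=%.3f = %.3f%n", term, tf, idf, part));
        }
        return total;
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        int first = index.add("open source search engine");
        index.add("commercial search advertising");
        StringBuilder why = new StringBuilder();
        double s = index.score("open search", first, why);
        System.out.printf(Locale.ROOT, "score=%.3f%n%s", s, why);
    }
}
```

The point of the sketch is the explanation string: every number that feeds the final score is visible to the person running the query, which is the kind of openness the project describes. A real engine's ranking would be considerably more involved.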
Cutting is particularly concerned about the effects of advertising-heavy search providers. As the engines become laden with links to products and services, that cargo could sway a search for non-commercial data. He's also concerned about US search companies becoming dominant overseas.
"It would be nice if there were an open-source search engine owned by the world," he said.











