Google trains indexing bots to fill HTML forms

Google's ever-active search bots, which constantly scour the Web for new pages, have entered a new, more active phase of their indexing work.

In a blog post Friday, Jayant Madhavan and Alon Halevy of Google's crawling and indexing team said the company has begun an experiment in which its indexing software enters text into Web site forms to see what previously undiscovered pages turn up.

"In the past few months, we have been exploring some HTML forms to try to discover new Web pages and URLs that we otherwise couldn't find and index for users who search on Google," they wrote. "This experiment is part of Google's broader effort to increase its coverage of the Web. In fact, HTML forms have long been thought to be the gateway to large volumes of data beyond the normal scope of search engines."

The new Google indexing practice is limited to "high quality" Web sites, and it respects "robots.txt" exclusions and other standard mechanisms for warding off indexing software.
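
The robots.txt check can be illustrated with Python's standard urllib.robotparser module. This is a minimal sketch, not Google's crawler code: the user agent "ExampleFormBot", the example.com URLs and the robots.txt rules are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules for an example site (not from any real site).
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
""".splitlines()


def may_probe(url: str, user_agent: str = "ExampleFormBot") -> bool:
    """Return True only if the robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT)  # parse() also marks the rules as freshly loaded
    return parser.can_fetch(user_agent, url)


print(may_probe("https://example.com/search?q=books"))   # False: under /search
print(may_probe("https://example.com/catalog?q=books"))  # True: no rule matches
```

A polite form-probing crawler would run a check like this before submitting any query, skipping sites whose rules disallow the form's target path.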

To decide what words to "type" into the forms, the indexing software samples words from the Web page that contains the form, Google said.
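
The word-sampling idea can be sketched roughly as follows. This is an illustration under stated assumptions, not Google's method: it assumes a hypothetical GET form at https://example.com/find with a single text field named "q", and it uses uniform random sampling as a stand-in because Google did not describe how words are chosen.

```python
import random
import re
from urllib.parse import urlencode


def candidate_queries(page_text: str, k: int, seed: int = 0) -> list[str]:
    """Sample up to k distinct words from the text of the page hosting the form."""
    # Lowercase and keep alphabetic tokens of 3+ letters (an arbitrary filter).
    words = sorted(set(re.findall(r"[a-z]{3,}", page_text.lower())))
    rng = random.Random(seed)  # seeded so the sample is reproducible
    return rng.sample(words, min(k, len(words)))


def probe_urls(action: str, field: str, words: list[str]) -> list[str]:
    """Build one GET URL per sampled word, as if it were typed into the form."""
    return [f"{action}?{urlencode({field: w})}" for w in words]


page = "Search our catalog of used books, rare maps and vintage postcards."
words = candidate_queries(page, k=3)
for url in probe_urls("https://example.com/find", "q", words):
    print(url)
```

Building GET URLs is a design choice in this sketch: each result page then has a plain, linkable address that an index could store and show to searchers.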

According to a blog post by Anand Rajaraman, the technology appears to be related to Transformic, a company Google acquired. Rajaraman was involved with the technology earlier in his career, while working for Halevy.

Yahoo has also begun indexing the Web with its third-generation software, Slurp 3.0.

"With everything now in place, the rollout has officially begun," Sharad Verma and Yoram Arnon said in a blog post this week.

Unlike Google, Yahoo didn't detail what's new with its indexing software.
