Next-generation search tools to refine results

Page II: The vast corpus of human knowledge could soon be published on the Internet. The problem now is how to wade through it.

About 100 million different books have been published in history, Kahle said, citing estimates from professor Raj Reddy at Carnegie Mellon University. About 28 million sit in the Library of Congress. On average, a book can be condensed to a megabyte in Microsoft Word. Thus, the books in the Library of Congress could fit into a 28-terabyte storage system.
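The storage estimate above is simple arithmetic, and a short sketch makes it easy to check. It assumes decimal units (1 megabyte as 1 million bytes, 1 terabyte as 1 trillion bytes) and takes the article's one-megabyte-per-book average at face value. The function and constant names are illustrative only and do not come from any of the projects mentioned.

```python
# Back-of-the-envelope check of the book-storage estimate.
# Assumptions: decimal units, and an average of 1 MB per book as quoted.
BYTES_PER_MEGABYTE = 10**6
BYTES_PER_TERABYTE = 10**12


def corpus_size_terabytes(book_count: int, megabytes_per_book: float = 1.0) -> float:
    """Return the storage needed for book_count books, in terabytes."""
    total_bytes = book_count * megabytes_per_book * BYTES_PER_MEGABYTE
    return total_bytes / BYTES_PER_TERABYTE


# Roughly 28 million books in the Library of Congress.
library_of_congress_tb = corpus_size_terabytes(28_000_000)

# Roughly 100 million books published in history.
all_books_tb = corpus_size_terabytes(100_000_000)

print(f"Library of Congress: about {library_of_congress_tb:.0f} TB")
print(f"All published books: about {all_books_tb:.0f} TB")
```

Under the same assumptions, every book ever published would come to about 100 terabytes.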

"For the cost of a house, you could have the Library of Congress," Kahle said, adding that mass book-scanning projects are currently under way in India and China.

Only about 2 million to 3 million audio recordings -- mostly music -- have ever been published for public consumption. The Internet Archive has begun to store digitised recordings of concerts as well and has about 15,000 shows in its database to date. There are between 100,000 and 200,000 theatrical movies in existence, half of them from India, and about 20 terabytes of TV broadcasts are produced each month. The Web grows by about 20 terabytes of compressed data a month as well. (One terabyte equals 1 trillion bytes.) Since 1984, about 50,000 software titles, including CD-ROMs, have emerged.

Though the legal issues around storing and viewing all this information remain thorny, storing it is doable.

"Universal access to all human knowledge is within our grasp," Kahle said. "It could be one of the greatest achievements of all time."

Still, that's a lot to grasp. Similarly, individuals will experience an explosion in their personal catalogs of data. In the MyLifeBits project under way at Microsoft Research, noted scientist Gordon Bell is attempting to digitally capture all of the books, movies, TV shows, music and other media he has experienced in his life. He's up to 44GB of data so far.

E-mails, phone messages, photographs and personal video will also add to an individual's data trove. In another experiment, doctors in Cambridge, England, have equipped patients suffering from severe memory loss with a Microsoft SenseCam, a wearable camera that takes pictures when a person moves. One man is currently using it so he can show his wife, who has memory problems, a diary of the day, said Ken Wood, who works on the project.

Microsoft has also entered a three-year alliance with the Edinburgh International Festival in Scotland. In a likely experiment, attendees will wander about the arts fest with SenseCams around their necks, snapping shots.

Hide and seek
One approach to mastering data overload lies in developing search engines specialised for certain topics and data sets. That's the tack taken by Berkeley's Flamenco project.
