Page II: The vast corpus of human knowledge could soon be published on the Internet. The problem now is how to wade through it.
About 100 million different books have been published in history, Kahle said, citing estimates from professor Raj Reddy at Carnegie Mellon University. About 28 million sit in the Library of Congress. On average, a book can be condensed to a megabyte in Microsoft Word. Thus, the books in the Library of Congress could fit into a 28-terabyte storage system.
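The storage estimate above is simple multiplication, and a short sketch makes it easy to rerun with other figures. The function name and the decimal unit constants below are illustrative assumptions, not part of Kahle's or Reddy's presentation. The one-megabyte-per-book average is the article's own figure.

```python
# Back-of-the-envelope storage estimate, assuming decimal units
# (1 MB = 1 million bytes, 1 TB = 1 trillion bytes).
BYTES_PER_MEGABYTE = 10**6
BYTES_PER_TERABYTE = 10**12


def corpus_size_terabytes(book_count, megabytes_per_book=1.0):
    """Return the estimated size of a book collection in terabytes.

    megabytes_per_book defaults to the article's average of about
    one megabyte per book; real books will vary widely.
    """
    total_bytes = book_count * megabytes_per_book * BYTES_PER_MEGABYTE
    return total_bytes / BYTES_PER_TERABYTE


# Figures quoted in the article.
library_of_congress_tb = corpus_size_terabytes(28_000_000)
all_published_books_tb = corpus_size_terabytes(100_000_000)

print(f"Library of Congress: about {library_of_congress_tb:.0f} TB")
print(f"All published books: about {all_published_books_tb:.0f} TB")
```

By the same arithmetic, the roughly 100 million books ever published would come to about 100 terabytes.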
"For the cost of a house, you could have the Library of Congress," Reddy said, adding that mass book-scanning projects are currently under way in India and China.
Only about 2 million to 3 million audio recordings -- mostly music -- have ever been published for public consumption. The Internet Archive has begun to store digitised recordings of concerts as well and has about 15,000 shows in its database to date. There are between 100,000 and 200,000 theatrical movies -- half of them from India -- in existence and about 20 terabytes of TV broadcasts a month. The Web grows by about 20 terabytes of compressed data a month as well. (One terabyte equals 1 trillion bytes.) Since 1984, about 50,000 software titles, including CD-ROMs, have emerged.
Though the legal issues around storing and viewing all this information remain thorny, storing it is doable.
"Universal access to all human knowledge is within our grasp," Kahle said. "It could be one of the greatest achievements of all time."
Still, that's a lot to grasp. Individuals, too, will experience an explosion in their personal catalogs of data. In the MyLifeBits project under way at Microsoft Research, noted scientist Gordon Bell is attempting to digitally capture all of the books, movies, TV shows, music and other media he has experienced in his life. He's up to 44GB of data so far.
E-mails, phone messages, photographs and personal video will also add to an individual's data trove. In another experiment, doctors in Cambridge, England, have equipped patients suffering from severe memory loss with a Microsoft SenseCam, a wearable camera that takes pictures when a person moves. One man is currently using it so he can show his wife, who has memory problems, a diary of the day, said Ken Wood, who works on the project.
Microsoft has also entered a three-year alliance with the Edinburgh International Festival in Scotland. In a likely experiment, attendees will wander about the arts fest with SenseCams around their necks, snapping shots.
Hide and seek
One approach to mastering data overload lies in developing search engines specialised for certain topics and data sets. That's the tack taken by Berkeley's Flamenco project.










