Next-generation search tools to refine results

ID: Kanellos (2004) PDF: (afstuderen:Kanellos (2004) - Next-generation search tools to refine results.pdf|PDF)

===== Summary =====

  * Calif.–The vast corpus of human knowledge could soon be published on the Internet. The problem now is how to wade through it.
  * Although search engines have greatly enhanced access to information, and storage technology has made it cheap to digitize nearly everything, search tools need to be refined to make it easier to digest information or conduct queries. That was the word from researchers and speakers at the New Paradigms for Using Computers Conference, held at IBM’s Almaden research lab here last week.
  * On the desktop, companies such as Ingenuity Software, founded by former Apple Computer developer Bruce Horn, are creating tools designed to make it easier for people to index their photos and documents for subsequent Google-like searches on their hard drive.
  * New operating systems are under development that will include better search tools. ([[Vista]], [[Google Desktop]], [[Spotlight]])
  * About 100 million different books have been published in history, Kahle said, citing estimates from professor Raj Reddy at Carnegie Mellon University. About 28 million sit in the Library of Congress. On average, a book can be condensed to a megabyte in Microsoft Word. Thus, the books in the Library of Congress could fit into a 28-terabyte storage system.
  * !!! “For the cost of a house, you could have the Library of Congress,” Reddy said, adding that mass book-scanning projects are currently under way in India and China.
  * !!! Only about 2 million to 3 million audio recordings (mostly music) have ever been published for public consumption. The Internet Archive has begun to store digitized recordings of concerts as well and has about 15,000 shows in its database to date. There are between 100,000 and 200,000 theatrical movies in existence, half of them from India, and about 20 terabytes of TV broadcasts a month. The Web grows by about 20 terabytes of compressed data a month as well. (One terabyte equals 1 trillion bytes.) Since 1984, about 50,000 software titles, including CD-ROMs, have emerged.
  * Though the legal issues around storing and viewing all this information remain thorny, storing it is doable.
  * !!! “Universal access to all human knowledge is within our grasp,” Kahle said. “It could be one of the greatest achievements of all time.”
  * Individuals will experience an explosion in their personal catalogs of data.
  * Doctors in Cambridge, … equipped patients suffering from severe memory loss with a [[Microsoft SenseCam]].
  * Microsoft has also entered a three-year alliance with the Edinburgh International Festival in Scotland. In a likely experiment, attendees will wander about the arts fest with SenseCams around their necks, snapping shots.
  * Search engines specialized for certain topics and data sets. That’s the tack taken by [[Berkeley’s Flamenco project]].
  * In Flamenco, a Yahoo-like interface categorizes artworks drawn from museum collections.
  * The search engine does not search on the visual information contained in the picture, … Instead … descriptive text submitted by the museums that digitize their artwork for such databases.
  * [[Inxight]] and [[GeoFusion]] produce graphical representations of data obtained through searches. GeoFusion, which makes software that can extrapolate from geographic data, was able to render a map of the movements of a tagged tuna.
  * [[WebFountain]]: the project is used to test how cohesive certain blogging communities are, judged by how quickly and in unison they react to news events.
  * File systems will likely begin to disappear as search gains popularity. One of the phenomena that Microsoft researchers are finding in MyLifeBits is that files are largely ad hoc categories that become outdated, said Jim Gemmell at Microsoft Research.
  * Instead, data should be tagged so that if people remember a name or part of a name, they can find their way back to documents or pictures involving that person, or they can find documents created on the same day that they had a phone conversation with the person.
  * The Internet Archive has created mobile bookmobiles…
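
The storage estimate quoted in the summary is simple arithmetic, and a short sketch makes it easy to check. The function name ''library_size_tb'' is made up for this note; the inputs (28 million books, about one megabyte per book, one terabyte = 1 trillion bytes) come from the article. The figure for all 100 million books is an extrapolation under the same one-megabyte assumption, not a number the article states.

```python
# Back-of-the-envelope check of the storage estimate quoted in the summary.
BYTES_PER_MB = 10**6    # decimal megabyte
BYTES_PER_TB = 10**12   # the article's definition: 1 terabyte = 1 trillion bytes


def library_size_tb(num_books: int, mb_per_book: float = 1.0) -> float:
    """Return the storage needed for num_books, in terabytes.

    mb_per_book defaults to the article's estimate of about one
    megabyte per book (as a Microsoft Word document).
    """
    return num_books * mb_per_book * BYTES_PER_MB / BYTES_PER_TB


# Figure from the article: about 28 million books in the Library of Congress.
loc_tb = library_size_tb(28_000_000)

# Extrapolation (not stated in the article): all ~100 million books
# ever published, under the same one-megabyte-per-book assumption.
all_books_tb = library_size_tb(100_000_000)

print(f"Library of Congress: {loc_tb:.0f} TB")
print(f"All published books (extrapolated): {all_books_tb:.0f} TB")
```

The result depends entirely on the one-megabyte-per-book assumption; scanned page images rather than text would need far more space.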

  * The file system will disappear in the long run; it is indexed too one-sidedly.
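
A minimal sketch of the tagging idea described in the summary: items are indexed by the people they involve and by their creation date, so they can be found from part of a name or from the day of a phone call. This is not how MyLifeBits works internally (the article does not say); ''Item'', ''TagIndex'', the method names, and the sample data are all hypothetical and exist only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Item:
    """One stored item: a document, photo, or phone call (hypothetical model)."""
    name: str
    kind: str                      # "document", "photo", or "call"
    created: date
    people: frozenset = frozenset()


class TagIndex:
    """Index items by person and by day instead of by a single folder path."""

    def __init__(self):
        self._by_person = defaultdict(set)
        self._by_day = defaultdict(set)

    def add(self, item):
        self._by_day[item.created].add(item)
        for person in item.people:
            self._by_person[person.lower()].add(item)

    def find_by_person(self, fragment):
        """Names of items involving anyone whose name contains the fragment."""
        frag = fragment.lower()
        hits = set()
        for person, items in self._by_person.items():
            if frag in person:
                hits |= items
        return sorted(item.name for item in hits)

    def created_same_day_as_call(self, fragment):
        """Names of non-call items created on a day with a call to that person."""
        frag = fragment.lower()
        call_days = {
            item.created
            for person, items in self._by_person.items() if frag in person
            for item in items if item.kind == "call"
        }
        hits = {
            item
            for day in call_days
            for item in self._by_day[day] if item.kind != "call"
        }
        return sorted(item.name for item in hits)


# Made-up sample data.
index = TagIndex()
index.add(Item("call-with-alice", "call", date(2004, 10, 1),
               frozenset({"Alice Example"})))
index.add(Item("budget.xls", "document", date(2004, 10, 1)))
index.add(Item("picnic.jpg", "photo", date(2004, 9, 12),
               frozenset({"Alice Example", "Bob Sample"})))

print(index.find_by_person("alice"))
print(index.created_same_day_as_call("alice"))
```

The same item is reachable along several routes (person, date, related call), whereas a folder hierarchy offers only one path per file.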