This paper discusses the applications and importance of content-based information retrieval technology in digital libraries. It outlines the general retrieval process and analyzes current examples in four areas of the technology. Content-based information retrieval has been shown to be an effective way to search the multimedia documents that are increasingly stored in digital libraries. As a complement to traditional text-based information retrieval technology, content-based information retrieval will be a significant trend in the development of digital libraries.
After several decades of development, digital libraries are no longer a myth. In fact, some general digital libraries, such as the National Science Digital Library (NSDL) and the Internet Public Library, are widely known and used. Advances in computer technology make it possible to include a colossal amount of information in various formats in a digital library. In addition to traditional text-based documents such as books and articles, other types of materials--including images, audio, and video--can also be easily digitized and stored. How to retrieve and present this multimedia information effectively through the interface of a digital library has therefore become a significant research topic.
Currently, there are three methods of retrieving information in a digital library. The first and easiest is free browsing: a user browses through a collection and looks for the desired information. The second method--the most popular technique used today--is text-based retrieval. With this method, textual information (the full text of text-based documents and/or the metadata of multimedia documents) is indexed so that a user can search the digital library using keywords or controlled terms. The third method is content-based retrieval, which enables a user to search multimedia information in terms of the actual content of an image, audio clip, or video (Marques and Furht 2002). Content features that have been studied so far include color, texture, size, shape, motion, and pitch.
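To make the third method concrete, the following is a minimal sketch of content-based image retrieval using color, one of the features listed above. It is an illustration under stated assumptions, not the method of any particular system discussed in this paper: the function names, the toy collection, and the choice of histogram intersection as the similarity measure are all hypothetical, and each image is represented simply as a list of RGB pixel tuples.

```python
from collections import Counter


def color_histogram(pixels, bins_per_channel=4):
    """Quantize RGB pixels into a normalized color histogram.

    `pixels` is a list of (r, g, b) tuples with values in 0..255.
    The bin count is an illustrative assumption, not a recommended setting.
    """
    if not pixels:
        raise ValueError("an image needs at least one pixel")
    step = 256 // bins_per_channel
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = sum(counts.values())
    return {color_bin: n / total for color_bin, n in counts.items()}


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]: the overlap between two normalized histograms."""
    return sum(min(share, h2.get(color_bin, 0.0)) for color_bin, share in h1.items())


def rank_by_similarity(query_pixels, collection):
    """Return image names ordered from most to least similar to the query."""
    query_hist = color_histogram(query_pixels)
    scored = [
        (histogram_intersection(query_hist, color_histogram(pixels)), name)
        for name, pixels in collection.items()
    ]
    # Highest score first; ties are broken alphabetically by name.
    return [name for score, name in sorted(scored, key=lambda s: (-s[0], s[1]))]


# Hypothetical toy collection: each "image" is just a handful of pixels.
collection = {
    "red_beetle": [(200, 10, 10)] * 3 + [(20, 20, 20)],
    "green_moth": [(10, 200, 10)] * 4,
    "brown_ant": [(200, 10, 10)] + [(20, 20, 20)] * 3,
}
query = [(210, 20, 15)] * 3 + [(30, 30, 30)]

print(rank_by_similarity(query, collection))
# ['red_beetle', 'brown_ant', 'green_moth']
```

No keywords or metadata are involved at any point: the query is itself an image, and the ranking depends only on how closely the color distributions match. A practical system would combine several features (texture, shape, and so on) and index them for speed rather than comparing the query against every item.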
Some may argue that text-based retrieval techniques are good enough to locate desired multimedia information as long as it is assigned proper metadata or tags. However, words are sometimes insufficient to describe what a person has in mind. Consider two examples. A patron comes to a public library with a picture of a rare insect. Without expertise in entomology, the librarian will not know where to start if only a text-based information retrieval system is available. With the help of content-based image retrieval, however, the librarian can upload the digitized image of the insect to an online digital image library of insects, and the system will retrieve similar images along with detailed descriptions of the insect. Similarly, a patron has a segment of music audio, about which he or she knows nothing but wants to find out more. By using a content-based audio retrieval system, the patron can retrieve similar audio clips with detailed information from a digital music library, and then listen to them to find an exact match. This procedure is much easier than searching a text-based music search system, which requires the patron to already know words that describe the piece. In both cases, it helps when a user can search non-textual information by its styles and features.
In addition, the growth of the World Wide Web brings new challenges to traditional text-based information retrieval. While today's Web-based digital libraries can be accessed around the world, users with different language and cultural backgrounds may not be able to do effective keyword searches of these libraries. Content-based information retrieval techniques will greatly increase the accessibility of these digital libraries, and this is probably a major reason content-based retrieval has become a hot research area in the past decade. Ideally, a content-based information retrieval system …