Digital images and video are becoming an integral part of human communication. The ease of creating and capturing digital imagery has enabled its proliferation, making our interaction with online information sources largely visual. We increasingly use visual content to express ideas, report news, and educate and entertain each other. But how can we search for visual information?
Can solutions be developed that are as effective as existing text and nonvisual information search engines? With the increasing numbers of distributed repositories and users, how can we design scalable visual information retrieval systems?
Digital imagery is a rich and subjective source of information. For example, different people extract different meanings from the same picture. Their response also varies over time and in different viewing contexts. A picture also has meaning at multiple levels -- description, analysis, and interpretation -- as described in [10]. Visual information is also represented in multiple forms -- still images, video sequences, computer graphics, animations, stereoscopic images -- and expected in such future applications as multiview and 3D video. Furthermore, visual information systems demand large resources for transmission, storage, and processing. These factors make the indexing, retrieval, and management of visual information a great challenge.
Based on our experience developing visual information systems in Web-based environments, we have analyzed the present and future of such systems, focusing on search and retrieval from large, distributed, online visual information repositories. As a case study, we describe a recently developed Internet-based system called WebSEEk. We also describe the prototype of an Internet meta-visual information retrieval system called MetaSEEk, which is analogous to text-based meta-search engines on the Web. And we explore the challenges in developing scalable visual information retrieval systems for future online environments.
Content-Based Visual Query
Recent progress has been made in developing efficient and effective visual information retrieval systems. Some systems, such as Virage, QBIC, VisualSEEk, and VideoQ [1, 3, 6, 11], provide methods for retrieving digital images and videos by using examples and/or visual sketches. To query visual repositories, the visual features of the imagery, such as colors, textures, shapes, motions, and spatiotemporal compositions, are used in combination with text and other related information. Low-level visual features may be extracted with or without human intervention.
One characteristic of these systems is that the search is approximate, requiring a computed assessment of visual similarity. The items returned at the top of the list of query results have the greatest similarity to the query input. But the returned items rarely "exactly" match the attributes specified in the query. Figure 1 shows image and video search examples based on visual similarity queries.
[FIGURE 1 GRAPH OMITTED]
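One way to picture an approximate, similarity-ranked search is the minimal Python sketch below. It assumes a coarse color histogram as the only visual feature and histogram intersection as the similarity measure. Neither choice is taken from the systems cited above, and the function names and toy pixel data are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each in 0..255


def color_histogram(pixels: Sequence[Pixel], bins_per_channel: int = 4) -> List[float]:
    """Quantize RGB pixels into a normalized color histogram."""
    hist = [0.0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        # Map each channel value to a bin index in 0..bins_per_channel-1.
        rb, gb, bb = (min(c * bins_per_channel // 256, bins_per_channel - 1)
                      for c in (r, g, b))
        hist[(rb * bins_per_channel + gb) * bins_per_channel + bb] += 1.0
    total = sum(hist) or 1.0
    return [h / total for h in hist]


def histogram_intersection(a: Sequence[float], b: Sequence[float]) -> float:
    """Similarity in [0, 1]; 1.0 only when the two histograms are identical."""
    return sum(min(x, y) for x, y in zip(a, b))


def rank_by_similarity(query: Sequence[Pixel],
                       repository: Dict[str, Sequence[Pixel]],
                       top_k: int = 3) -> List[Tuple[str, float]]:
    """Return the top_k repository items, most similar to the query first."""
    q = color_histogram(query)
    scored = [(name, histogram_intersection(q, color_histogram(px)))
              for name, px in repository.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy "images": short pixel lists standing in for real image data (assumption).
repository = {
    "sunset": [(250, 80, 20)] * 8 + [(30, 30, 60)] * 2,
    "forest": [(20, 200, 30)] * 10,
    "ocean": [(10, 40, 220)] * 9 + [(250, 80, 20)] * 1,
}
query = [(240, 100, 10)] * 7 + [(20, 20, 40)] * 3

results = rank_by_similarity(query, repository)
for name, score in results:
    print(f"{name}: {score:.2f}")
```

In this toy run the best-ranked item is similar to the query but not identical to it, which mirrors the point that returned items rarely match the query attributes exactly.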
Such systems can also use direct input from humans and other supporting data to better index visual information. For example, video icons in [4] are generated by a manual process for annotating objects in videos, like people and boats, and semantic events, like sunsets. Text indexes have also been generated from the captions and transcripts of broadcast video [8] for retrieving news video.
Visual summarization complements visual search. By decomposing a video through, for example, automated scene detection, a more spatially or temporally compact presentation of the video can be generated. For example, [12] describes news video summarization systems with efficient browsing interfaces that use video event detection and clustering. Others have developed techniques for automated analysis of continuous video sequences to generate mosaic images for improved browsing and indexing.
Other researchers have begun seeking to automate the assignment of semantic labels to visual content. For example, through a process of learning from user interaction, the FourEyes system develops maps from visual features to semantic classes [9]. Furthermore, by manually developing specific models, based on visual features, for visual classes such as animals and nude people, techniques for automatically detecting these images are also being developed [7].
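As a rough illustration of how automated scene detection can yield a more compact presentation, the Python sketch below marks a scene boundary wherever consecutive frame histograms differ by more than a threshold, then keeps the first frame of each scene as a summary. The L1 distance, the threshold value, and all names are assumptions made for illustration. They are not the methods of the systems cited above.

```python
from typing import List, Sequence


def histogram_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """L1 distance between normalized histograms (0 = identical, 2 = disjoint)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def detect_scene_cuts(frame_histograms: Sequence[Sequence[float]],
                      threshold: float = 0.5) -> List[int]:
    """Return indices of frames that start a new scene (frame 0 excluded)."""
    cuts = []
    for i in range(1, len(frame_histograms)):
        if histogram_distance(frame_histograms[i - 1], frame_histograms[i]) > threshold:
            cuts.append(i)
    return cuts


def key_frames(n_frames: int, cuts: Sequence[int]) -> List[int]:
    """Summarize the video by the first frame of every detected scene."""
    return [0] + list(cuts) if n_frames > 0 else []


# Hypothetical per-frame histograms over three color bins (toy data).
frames = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],   # small change: same scene
    [0.0, 1.0, 0.0],   # large change: new scene
    [0.0, 0.9, 0.1],
    [0.0, 0.0, 1.0],   # large change: new scene
]

cuts = detect_scene_cuts(frames)
summary = key_frames(len(frames), cuts)
print("scene cuts at frames:", cuts)
print("key frames:", summary)
```

A fixed threshold is the simplest possible choice. It reacts to abrupt cuts but would likely miss gradual transitions such as fades, which need a more careful comparison across several frames.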
Classifying Retrieval Systems
Visual information retrieval systems have been used in many application domains, including libraries, museums, scientific data archives, photo stock houses, and Web search engines. We classify these systems using the following criteria:
* Automation. The visual features of the images and videos extracted and indexed by the system are used for interactive content-based visual …