Public and private organizations have access to a vast amount of internal, deep Web and open Web information. Transforming this heterogeneous and distributed information into actionable and insightful information is the key to the emerging new classes of business intelligence and national security applications. While the role of semantics in search and integration has often been discussed, this paper focuses on semantic approaches that support analytics on vast amounts of heterogeneous data. In particular, we bring together novel academic research and commercialized Semantic Web technology. The academic research related to semantic association identification is built upon commercial Semantic Web technology for semantic metadata extraction. A prototypical demonstration of this research and technology is presented in the context of an aviation security application of significance to national security.
Keywords: content analytics; knowledge discovery; ontology; RDF; semantic analytics; semantic applications for homeland security; semantic association; semantic metadata; semantic web technology
Creating applications that allow users to gain insightful and actionable information from vast amounts of heterogeneous information is one of the most exciting new areas of information systems research. This information may come from numerous sources spanning proprietary, trusted and open-source information, including intranets, the deep Web and the open Web. The fast-emerging markets of business intelligence as well as national and homeland security are finding themselves in increasing need of such applications. One of the clear manifestations of such a need occurs in aviation safety, which became a critically important issue for national security after the tragic events of September 11, 2001. While the current efforts for enhanced physical security measures may help reduce the risk of a similar future event, it is generally accepted that the development of new information-based security systems is a necessary additional capability for defense against such attacks.
Research in search techniques was a critical component of the first generation of the Web, and has gone from academia to mainstream. A second generation "Semantic Web" will be built by adding semantic annotations to Web content that software can understand and from which humans can benefit. Large-scale semantic annotation of data (domain-independent and domain-specific) is now possible because of numerous advances in the areas of entity identification, automatic classification, taxonomy and ontology development, and metadata extraction (Dill et al., 2003; Shah, Finin, Joshi, Cost, & Mayfield, 2002; Hammond, Sheth, & Kochut, 2002). Relationships are at the heart of semantics (Woods, 1975; Sheth, Arpinar, & Kashyap, 2003). The next frontier, which fundamentally changes the way we acquire and use knowledge, is to automatically identify complex relationships between entities in this semantically annotated data. Instead of a search engine that merely returns documents containing terms of interest, we propose an approach that supports semantic analytics of heterogeneous content to return actionable information that gives useful insight into the connection between documents and real-world entities, thus providing better-than-ever support for important decisions and actions. This approach is demonstrated using a prototypical aviation security application (1) called "Passenger Identification, Screening, and Threat Analysis application" (PISTA) that involves discovering and preventing threats to aviation safety. PISTA is one of many semantic applications that form part of the advanced information technology necessary to support homeland security.
From the research perspective, one of the challenges was to devise a framework for the formal definition and representation of meaningful and interesting relationships, which we call "semantic associations." Semantic associations are at the core of our research in content analytics (2) and knowledge discovery using an ontology-driven process. Other challenges arise from the large scale of metadata sets and the need for complex data structures containing entities and relationships that are used to perform query processing against those sets. Lastly, we need to utilize a notion of context to select relevant subsets of metadata to process. These challenges call for a fresh look at indexing, query processing and ranking, as well as tractable and scalable graph algorithms that exploit heuristics. Our work addresses these challenges, building on our previous research in semantic metadata extraction, practical domain-specific ontology creation, semantic association definition and main-memory query processing. We also discuss how a commercial Semantic Web technology product is used for metadata extraction in creating a test bed for PISTA. The next two paragraphs explain the two key parts of this chapter.
PISTA extracts relevant metadata from different information resources, including government watch-lists, flight databases and historical passenger data. Using the extracted metadata, PISTA's semantic-based knowledge discovery techniques can identify suspicious patterns and categorize passengers into high-risk groups, low-risk groups, no-risk groups and positive groups (i.e., passengers who increase safety). The level of physical inspection and optional interrogation of a passenger can then be determined accordingly at various planned checkpoints.
PISTA's theoretical foundation is the semantic association. A semantic association represents a direct or indirect relationship between two entities. "Semantics" here specifically involves those relations that are meaningful to the application and can be inferred based on the data itself or with the help of additional knowledge. The term "knowledge discovery" is used to refer to the process of identifying what types of semantic associations are meaningful for the application. Of particular interest to an application like PISTA are semantic associations that identify passengers who pose a security risk, such as a passenger's direct or indirect relationship to a terrorist organization. Using a commercial Semantic Web technology, Semagix Freedom, which is based on SCORE technology (Sheth et al., 2002), we developed a prototype aviation security application. The prototype demonstrates the use of semantic associations in calculating the possible risk posed by passengers on a given flight.
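To make the notion of a direct or indirect relationship concrete, the following is a minimal sketch, not the PISTA implementation. It assumes that metadata is available as RDF-style (subject, predicate, object) triples and treats a semantic association as a path between two entities. The entity names, predicate names, the `find_associations` helper and the `max_hops` bound are all hypothetical and chosen only for illustration.

```python
from collections import deque

# Hypothetical RDF-style triples (subject, predicate, object).
# Every entity and predicate name here is illustrative, not taken from PISTA.
TRIPLES = [
    ("passenger:A", "purchasedTicketWith", "creditCard:1"),
    ("creditCard:1", "ownedBy", "person:B"),
    ("person:B", "memberOf", "organization:X"),
    ("passenger:C", "bookedOn", "flight:100"),
    ("passenger:A", "bookedOn", "flight:100"),
]


def build_graph(triples):
    """Index triples as an adjacency list, traversable in both directions."""
    graph = {}
    for subj, pred, obj in triples:
        graph.setdefault(subj, []).append((pred, obj))
        # Inverse edge, marked with "^-1", so paths may follow a relation backwards.
        graph.setdefault(obj, []).append((pred + "^-1", subj))
    return graph


def find_associations(triples, source, target, max_hops=3):
    """Return all simple paths from source to target with at most max_hops edges.

    Each path alternates entities and predicates:
    [entity, predicate, entity, predicate, ..., entity].
    """
    graph = build_graph(triples)
    results = []
    queue = deque([(source, [source])])
    while queue:
        node, path = queue.popleft()
        hops = len(path) // 2  # number of edges traversed so far
        if node == target and hops > 0:
            results.append(path)
            continue
        if hops == max_hops:
            continue
        visited = set(path[::2])  # entities already on this path
        for pred, neighbor in graph.get(node, []):
            if neighbor not in visited:
                queue.append((neighbor, path + [pred, neighbor]))
    return results


if __name__ == "__main__":
    for path in find_associations(TRIPLES, "passenger:A", "organization:X"):
        print(" -> ".join(path))
```

In this toy data, `passenger:A` has no direct link to `organization:X`, but a three-edge path through a credit card and its owner connects them. An exhaustive breadth-first search like this one grows quickly with graph size, which is one reason scalable, heuristic-driven graph algorithms matter for large metadata sets.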
This chapter is organized as follows: First we present a formal description of semantic associations of various types. We then describe the creation of PISTA's ontology and how it was populated with a large number of instances. We also present the PISTA architecture and implementation, along with preliminary results. Semagix Freedom and a national security application based on semantic Freedom architecture are then presented. Lastly, we summarize the related work and conclude.
Semantic associations are meaningful and relevant complex relationships between entities, events and concepts. They lend meaning to information, making it understandable and actionable, and provide new and possibly unexpected insights. When we consider data on the Web, different entities can be related in multiple ways that cannot be pre-defined. For example, a "professor" can be related to a university, students, courses and …