The Web as a Database: New Extraction Technologies & Content Management

Publication: Online

Publication Date: 01-MAR-01

Author: Adams, Katherine C.
COPYRIGHT 2001 Information Today, Inc.

Information extraction (IE) software is an important part of any knowledge management system. Working in conjunction with information retrieval and organization tools, machine-driven extraction is a powerful means of finding content on the Web. Information extraction software pulls information from texts in heterogeneous formats--such as PDF files, emails, and Web pages--and converts it to a single homogeneous form. In functional terms, this converts the Web into a database that end-users can search or organize into taxonomies. The precision and efficiency of information access improve when digital content is organized into tables within a relational database. The two main methods of information extraction technology--natural language processing and wrapper induction--offer a number of important benefits:

* They help end-users wade through and cope with the overwhelming amount of digital information

* They access the "hidden Web," pages that are generated "on-the-fly" from relational databases as a result of a user query

* They are part of a larger trend toward breaking up the Web into smaller, more manageable pieces

DEFINING INFORMATION EXTRACTION

Information extraction (IE) software identifies and extracts relevant information from texts, pulls it from a variety of sources, and aggregates it to create a single view. IE translates content into a homogeneous form through technologies like XML (eXtensible Markup Language). The goal of IE software is to transform texts composed of everyday language into a structured, database format [1]. In this way, heterogeneous documents are summarized and presented in a uniform manner.
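As a rough illustration of the "single homogeneous form" idea, the sketch below serializes facts pulled from different kinds of sources into one XML document. The record layout, the element names (`extracted`, `record`), and the sample values are all hypothetical; the article does not prescribe a schema.

```python
import xml.etree.ElementTree as ET


def to_uniform_xml(records):
    """Serialize already-extracted records into one XML document.

    Each record is a dict with a 'source' label and a 'fields' mapping.
    This layout is an assumption made for illustration only.
    """
    root = ET.Element("extracted")
    for rec in records:
        # One <record> per source document, tagged with where it came from.
        item = ET.SubElement(root, "record", source=rec["source"])
        for field, value in rec["fields"].items():
            ET.SubElement(item, field).text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical facts taken from two different source formats.
records = [
    {"source": "email", "fields": {"person": "J. Smith", "event": "conference talk"}},
    {"source": "web_page", "fields": {"person": "A. Jones", "event": "forthcoming book"}},
]
xml_text = to_uniform_xml(records)
```

Whatever the original format, every record ends up with the same structure, which is what makes later searching and taxonomy building practical.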

To improve accuracy and ease development, IE software is usually domain or topic specific. An IE system designed to monitor technical articles about Information Science, for example, could pull out the names of professors, research studies, topics of interest, conferences, and forthcoming publications from press releases, news stories, or emails and encode this information in a database. End-users can then search across this database by textual attribute or feature. A typical search could be for all forthcoming publications about information retrieval or to locate all conference presentations on a specific information science topic. In addition, the structured information contained within a database could be ordered into a taxonomy.
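A minimal sketch of this workflow, assuming simple hand-written patterns stand in for a real extraction engine: facts are pulled from texts, stored in a relational table, and then searched by attribute. The patterns, table layout, and sample documents are hypothetical, and production systems would rely on natural language processing or wrapper induction rather than regular expressions.

```python
import re
import sqlite3

# Hypothetical domain-specific patterns, one per attribute of interest.
PATTERNS = {
    "conference": re.compile(r"at the ((?:[A-Z]\w+ )+Conference)"),
    "publication": re.compile(r'forthcoming (?:article|book) "([^"]+)"'),
}

# Hypothetical source texts from different channels.
DOCUMENTS = {
    "email-1": "Prof. Lee will speak at the Digital Libraries Conference in June.",
    "news-2": 'Her forthcoming article "Ranking Methods in Information Retrieval" appears next year.',
}


def extract_facts(doc_id, text):
    """Yield (doc_id, attribute, value) rows for every pattern match."""
    for attribute, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            yield (doc_id, attribute, match.group(1))


def build_database(documents):
    """Load extracted facts into an in-memory relational table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE facts (doc_id TEXT, attribute TEXT, value TEXT)")
    for doc_id, text in documents.items():
        conn.executemany("INSERT INTO facts VALUES (?, ?, ?)", extract_facts(doc_id, text))
    conn.commit()
    return conn


def search(conn, attribute, keyword):
    """Find facts of one attribute type whose value mentions the keyword."""
    rows = conn.execute(
        "SELECT doc_id, value FROM facts WHERE attribute = ? AND value LIKE ?",
        (attribute, f"%{keyword}%"),
    )
    return rows.fetchall()


conn = build_database(DOCUMENTS)
hits = search(conn, "publication", "information retrieval")
```

Here `hits` holds the one forthcoming publication about information retrieval, together with the identifier of the document it came from.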

DIFFERENCES BETWEEN IE AND IR

Information retrieval (IR) recovers a subset of documents that match an end-user's query, while IE recovers individual facts from documents. The difference between IR and IE is one of granularity regarding information access. IR is document retrieval and IE is fact retrieval [2].

Information extraction software requires that end-users specify in advance the categories of information they want to capture from a text. For instance, a system devoted to scanning financial news stories could extract all company names, interest rate changes, SEC announcements, or stock market quotes from texts. Because the parameters that define a particular topic are determined a priori, IE systems are fully customizable. IR and IE are different, but complementary. Together they create powerful new tools for accessing and organizing information stored on Web servers.
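One way to picture how the two techniques complement each other is a two-stage pipeline: IR narrows the collection to matching documents, then IE fills a template whose categories were fixed in advance. The sketch below assumes a toy keyword matcher and two hypothetical regular-expression slots; the stories and company names are invented for illustration.

```python
import re

# Extraction template defined a priori: the categories to capture.
# Both patterns are simplified, hypothetical stand-ins.
TEMPLATE = {
    "company": re.compile(r"\b[A-Z][a-z]+ (?:Inc|Corp)\b"),
    "rate_change": re.compile(r"(?:raised|cut) interest rates by \d+(?:\.\d+)? points?"),
}

# Hypothetical financial news stories.
STORIES = {
    "story-1": "The central bank cut interest rates by 0.5 points on Tuesday. Acme Corp shares rose.",
    "story-2": "Globex Inc opened a new office in Austin.",
    "story-3": "Analysts expect interest rates to hold steady.",
}


def retrieve(query, docs):
    """IR step: return ids of documents containing every query term."""
    terms = query.lower().split()
    return [doc_id for doc_id, text in docs.items()
            if all(term in text.lower() for term in terms)]


def extract(text):
    """IE step: fill each template slot with the facts found in one text."""
    return {slot: pattern.findall(text) for slot, pattern in TEMPLATE.items()}


def retrieve_then_extract(query, docs):
    """Run IR to pick documents, then IE to pull facts from each of them."""
    return {doc_id: extract(docs[doc_id]) for doc_id in retrieve(query, docs)}


results = retrieve_then_extract("interest rates", STORIES)
```

The IR step returns whole documents (`story-1` and `story-3`), while the IE step returns only the facts inside them; a story can be retrieved and still yield empty slots when it contains none of the predefined categories.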

DIFFICULTIES OF INFORMATION RETRIEVAL AND EXTRACTION...
