AccessMyLibrary provides FREE access to over 30 million articles from top publications available through your library.

A Guided Tour to Approximate String Matching.(text processing)

ACM Computing Surveys

| March 01, 2001 | NAVARRO, GONZALO | COPYRIGHT 2001 Association for Computing Machinery, Inc. This material is published under license from the publisher through the Gale Group, Farmington Hills, Michigan.  All inquiries regarding rights should be directed to the Gale Group. (Hide copyright information)Copyright

1. INTRODUCTION

This work focuses on the problem of string matching that allows errors, also called approximate string matching. The general goal is to perform string matching of a pattern in a text where one or both of them have suffered some kind of (undesirable) corruption. Some examples are recovering the original signals after their transmission over noisy channels, finding DNA subsequences after possible mutations, and text searching where there are typing or spelling errors.

The problem, in its most general form, is to find a text where a text given pattern occurs, allowing a limited number of "errors" in the matches. Each application uses a different error model, which defines how different two strings are. The idea for this "distance" between strings is to make it small when one of the strings is likely to be an erroneous variant of the other under the error model in use.

The goal of this survey is to present an overview of the state of the art in approximate string matching. We focus on online searching (that is, when the text cannot be preprocessed to build an index on it), explaining the problem and its relevance, its statistical behavior, its history and current developments, and the central ideas of the algorithms and their complexities. We also consider some variants of the problem. We present a number of experiments to compare the performance of the different algorithms and show the best choices. We conclude with some directions for future work and open problems.

Unfortunately, the algorithmic nature of the problem strongly depends on the type of "errors" considered, and the solutions range from linear time to NP-complete. The scope of our subject is so broad that we are forced to narrow our focus on a subset of the possible error models. We consider only those defined in terms of replacing some substrings by others at varying costs. In this light, the problem becomes minimizing the total cost to transform the pattern and its occurrence in text to make them equal, and reporting the text positions where this cost is low enough.

Related articles from newspapers, magazines, journals, and more
Publication No. WO/2009/094649 Published on July 30, Assigned to SRA...
News wire article from: US Fed News Service, Including US State News August 4, 2009 700+ words
...developed a system and method for variant string matching. The patent has been assigned to...product, and system for variant string matching. A computer implemented method for variant string matching may comprise comparing with a computing...
Publication No. WO/2009/087757 Published on July 16, Assigned to Mitsubishi...
News wire article from: US Fed News Service, Including US State News July 20, 2009 700+ words
...of Japan, have developed a string matching section that identifies categories...present invention explains that "string matching section identifies categories...input documents by means of the string matching of the input documents and...
Intrusion.com adds new open source network IDS signature download centre on web...
Press release article from: M2 Presswire October 24, 2001 700+ words
...today with more than 900 open source string matching signatures (C)1994-2001 M2 COMMUNICATIONS...total number context analysis and string matching available for SecureNet Pro to more...the broad coverage of open source string matching signatures. With the daily onslaught...
Patent No. 7,574,742 Issued on Aug. 11, Assigned to Industrial Technology...
News wire article from: US Fed News Service, Including US State News August 12, 2009 700+ words
...Felix Wu of Davis, Calif.,have developed a method of string matching for uniform data classification. The inventors were issued...groups match those of the input strings. In one aspect, the string matching further comprises matching the input strings with the signature...
Gene-IT Announces Gene Sequence Patent Strength Service Featuring GenePAST...
Press release article from: Business Wire December 2, 2002 700+ words
...Unlike traditional techniques, the GenePAST algorithm is based on an approximate string-matching algorithm that uses an objective measure - the string "edit distance" - to indicate the difference between two sequences. Thus, it produces exactly...
Patent No. 7,610,281 Issued on Oct. 27, Assigned to Oracle International for...
News wire article from: US Fed News Service, Including US State News October 29, 2009 700+ words
...from entries in the inverted index and logic to compute an edit distance between a string associated with a query and a string associated...logic configured to provide a signal corresponding to the edit distance." The original application was filed on Nov. 29, 2006...
Evaluating automatic detection of misspellings in German.(Report)
Magazine article from: Language, Learning & Technology Rimrott, Anne Heift, Trude October 1, 2008 700+ words
...checkers are largely based on the empirical finding that the vast majority of L1 misspellings involves an edit distance of one. Edit distance is defined as the number of additions, omissions, substitutions or transpositions needed to convert...
DWJ Television introduces first-ever virtual edit session on the Net.
Press release article from: Business Wire October 25, 1996 700+ words
...Internet is changing the world as much as the printing press did, and the Internet virtual edit session is changing the way we edit. Distance is no longer important. This is just one small step as video makes its way to the web," says Johnson. Professionals in...
For more facts and information, see all results
©2009 Gale, a part of Cengage Learning. All rights reserved.
About us | FAQs | Contact us | Privacy policy | Terms and conditions
Other Gale sites: Encyclopedia.com | HighBeam Research | Acquire Content | Books & Authors | Goliath | MovieRetriever | Smart QandA