Library catalog systems worldwide are based on collections of MARC records. New kinds of Functional Requirements for Bibliographic Records (FRBR)-based catalog retrieval systems, displays, and cataloging rules will build on ever-growing MARC record collections. Characterizing the kinds of information held in MARC records is thus an important step in developing new systems and rules. This study examined the incidence and prevalence rates of MARC fields in two different sets of library catalog records: a random selection of bibliographic records from the Library of Congress online catalog and a selection of records for two specific works, Lord of the Flies and Plato's Republic. Analysis showed that most fields were used in only a small percentage of records, while a small number of fields were used in almost all records. Power law functions proved to be a good model for the observed distribution of MARC fields. The results of this study have implications for the design of new cataloging procedures as well as for the design of catalog interfaces that are based on the FRBR entity-relationship model.
MARC records are at the center of library cataloging processes. The MARC format, developed in the 1960s, is unlikely to be replaced in the foreseeable future, both because of its proven utility and because of the legacy volumes of existing MARC records held by libraries around the world. While the MARC format may not be going anywhere soon, the ways that MARC records are created and used are in a state of transition. Functional Requirements for Bibliographic Records (FRBR) outlined a conceptual model to describe the bibliographic universe. (1) Implementing the FRBR model in MARC-based cataloging practices and information retrieval systems has proven challenging. Numerous technical, structural, and institutional challenges must be overcome for libraries to shift to FRBR-based cataloging schemes and online catalog displays. Any new methods for cataloging and displaying library resources will be built from the existing MARC record databases. Thus understanding the state of the current data stored in MARC records is essential to the process of moving forward with new display systems and cataloging schemes.
This study contributes to the effort to better understand these challenges by more fully characterizing the kinds of information that can be found in MARC records. Specifically, this study aims to identify and characterize patterns in the ways catalogers use MARC fields in bibliographic records by quantifying which fields are most commonly present in library catalog records. The author used two different samples of bibliographic records. First, a random selection of bibliographic records from the Library of Congress (LC) online catalog was collected and examined. Second, a case study approach was used to analyze smaller samples of records for two specific works: William Golding's Lord of the Flies and Plato's Republic. This study tests whether a power law approach is useful in characterizing the distribution of MARC fields in each sample and, if a power law distribution exists, what the implications are for the design of new FRBR-influenced cataloging schemes and catalog displays.
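To make "quantifying which fields are most commonly present" concrete, the following Python sketch ranks MARC tags by the percentage of records in which they appear at least once. It is a minimal illustration under several assumptions: each record has already been reduced to a plain list of field tags (a representation introduced here only for the example), a repeated field counts once per record, and the four records are made up rather than drawn from this study's samples. Whether this record-level count matches the study's exact definitions of incidence and prevalence is also an assumption.

```python
from collections import Counter

def field_prevalence(records):
    """Rank MARC tags by the percentage of records that contain them.

    Assumes each record is a list of field tags such as "245". A tag that
    repeats within one record (e.g. several 650 fields) counts only once.
    Returns (tag, percent) pairs, most prevalent first, ties broken by tag.
    """
    if not records:
        return []
    counts = Counter()
    for tags in records:
        counts.update(set(tags))  # presence per record, not repetition
    total = len(records)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(tag, 100.0 * n / total) for tag, n in ranked]

# Hypothetical records (tag lists only), not data from the study.
sample = [
    ["001", "008", "100", "245", "260", "300", "650", "650"],
    ["001", "008", "245", "260", "300", "500"],
    ["001", "008", "100", "245", "260", "300", "650", "700"],
    ["001", "008", "245", "250", "260", "300"],
]

for rank, (tag, pct) in enumerate(field_prevalence(sample), start=1):
    print(rank, tag, f"{pct:.0f}%")
```

Even in this toy sample, a handful of tags appear in every record while the remaining tags each appear in only a fraction of them, which is the shape of distribution the study goes on to model.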
This section describes the theoretical background for the analysis of MARC field use patterns reported in this study.
First, the motivation for and importance of the study are discussed; then power laws are introduced and described to illustrate how they can be used to model many phenomena both inside and outside the library and information science domains.
Motivation for This Study
The motivation for a study of MARC fields in bibliographic records stems from the desire to understand the kinds of information that are available to build more informative displays into online library catalog interfaces. The deficiencies of online catalogs have been well documented. In separate studies, Calhoun and Markey pointed out how online catalogs have been slow to implement features that would greatly increase their utility, such as advanced retrieval techniques for subject searching, the inclusion of tables of contents, expanded use of full-text searching, and the use of classification schemes as finding aids. (2) Libraries certainly faced many impediments in producing advanced catalogs, including financial limitations and reliance on integrated library system vendors who were unable or unwilling to produce these additional functionalities.
Online catalogs have largely lacked the ability to identify and display relationships between different works and between representations of the same work. (3) Many different kinds of bibliographic relationships exist between library resources. Tillett identified seven types of relationships: equivalence, derivative, descriptive, whole-part, accompanying, sequential or chronological, and shared characteristic. (4) Smiraglia created a taxonomy that expanded on Tillett's derivative relationship and included seven categories: simultaneous derivations, successive derivations, translations, amplifications, extractions, adaptations, and performances. (5) Bibliographic relationships between library resources are common. Smiraglia and Leazer found that approximately 30 percent of bibliographic works in OCLC's WorldCat have associated derivative works. (6) These relationships are manifested in a number of ways in MARC records, including through uniform titles, series statements, and added entries. (7) Despite this, most conventional library catalogs provide little in the way of collocation based on bibliographic relationships. Integrating these relationships into catalog displays would provide users with a significantly more powerful way to navigate through library resources. (8)
FRBR is the most visible effort to give bibliographic relationships a more central role in modeling the bibliographic universe. FRBR describes a conceptual model that identifies four main bibliographic entities: works, expressions, manifestations, and items. The first three entities are abstract concepts, while the fourth entity, the item, represents the physical resource that exists on a library shelf. The FRBR model has been criticized for lacking conceptual clarity in the distinctions between the abstract work, expression, and manifestation entities, and for glossing over important differences between books and nonbook materials. (9) Despite these criticisms, the next-generation cataloging code, Resource Description and Access (RDA), is integrating the FRBR entity-relationship model into the arrangement and implementation of the new cataloging rules. (10)
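The four FRBR entities can be pictured as a nested data structure. The Python sketch below is a hypothetical, heavily simplified model: the class and attribute names are invented for illustration, each entity carries a single descriptive attribute, and the relationships are forced into a strict tree even though the FRBR model itself allows richer links (for example, a single manifestation may embody more than one expression). The holdings in the usage example are made up.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Item:
    """A single physical copy, e.g. the one on a library shelf."""
    copy_id: str

@dataclass
class Manifestation:
    """A particular edition or physical embodiment of an expression."""
    label: str
    items: List[Item] = field(default_factory=list)

@dataclass
class Expression:
    """A specific realization of a work, such as a translation."""
    language: str
    manifestations: List[Manifestation] = field(default_factory=list)

@dataclass
class Work:
    """The abstract intellectual creation."""
    title: str
    expressions: List[Expression] = field(default_factory=list)

    def all_items(self) -> List[Item]:
        """Collect every item by walking work -> expression -> manifestation -> item."""
        return [item
                for expression in self.expressions
                for manifestation in expression.manifestations
                for item in manifestation.items]

# Made-up holdings for illustration only: a Greek text and an English translation.
republic = Work("Republic", [
    Expression("grc", [Manifestation("edition A", [Item("copy-1")])]),
    Expression("eng", [
        Manifestation("edition B", [Item("copy-2"), Item("copy-3")]),
        Manifestation("edition C", [Item("copy-4")]),
    ]),
])

print([item.copy_id for item in republic.all_items()])
```

Gathering every copy, edition, and translation under one Work node is the kind of relationship-based collocation that conventional catalog displays rarely offer.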
Power Laws in Library and Information Science
This study uses power law functions to characterize the patterns of MARC field use in bibliographic records. A power law function is a mathematical expression that describes an inverse power relationship between two quantities: as one increases, the other decreases in proportion to a fixed power of the first. Power laws are commonly illustrated through the "80/20 rule" of wealth and power, that is, 80 percent of the world's resources are held by 20 percent of the world's countries, or by the "long-tail" phenomenon of marketing and consumption, where very few music or book titles sell a large number of copies and a great many titles sell very few copies. (11) Power law functions have been used extensively in the library and information science literature. A 1995 study showed that the individuals behind two of the classic bibliometric power laws, George Kingsley Zipf and Alfred J. Lotka, were at that time among the most cited people in the history of the discipline. (12)
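In general form (the notation here is illustrative, not taken from the study), a power law relates a quantity y to a quantity x through a constant C and an exponent α. Taking logarithms of both sides shows why such data fall on a straight line with slope −α when plotted on log-log axes:

```latex
\begin{aligned}
y &= C\,x^{-\alpha}, && C > 0,\ \alpha > 0 \\
\log y &= \log C - \alpha \log x
\end{aligned}
```

Zipf's law corresponds to α = 1 with x taken as a word's frequency rank, and Lotka's law to α = 2 with x taken as the number of publications per author. For contrast, an exponential decay y = C e^{−λx} gives log y = log C − λx, which is linear in x itself rather than in log x; this is one practical way to tell the two shapes apart.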
Zipf's and Lotka's power laws provide similar formulations for different kinds of bibliometric phenomena. (13) Zipf derived his law from a study of word counts in a selection of English-language texts. He showed an inverse relationship between the number of times a word is used and its rank within the set of all words used. So, if the most frequently used word was used one hundred times, the second most frequently used word was used roughly fifty times (one-half as many), the third most frequently used word roughly thirty-three times (one-third as many), and so on down the word list. Lotka, on the other hand, derived his law from a study of the publication productivity of individual authors within a corpus of chemistry and physics journals. Lotka found an inverse-square relationship between the number of publications per author and the number of authors with a given number of publications. In other words, if one hundred authors each produced one published paper, roughly twenty-five authors (one-fourth as many) produced two, roughly eleven (one-ninth as many) produced three, and so on.
The authorship and publication patterns of many disciplines, including library and information science, have been shown to follow power law distributions. (14) Zipfian and Lotkan distributions have also been observed in a number of other library and information science settings. Power laws have been used to describe the forms of names on bibliographic records, the frequency of name headings in the library catalog, library resource circulation patterns, and descriptor term co-occurrences in a bibliographic hypertext system. (15) In 1990, Blair proposed that Zipf's distribution of word use might be used as an indicator of indexing effectiveness. (16) He suggested that the distribution of index term use should match the distribution of word usage in documents.
According to Blair, a match in term usage distributions would indicate that the indexers and users were using language in a similar fashion and thus bring the conceptions of document representation between the two groups closer together.
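The arithmetic in the Zipf and Lotka examples, and one possible way to operationalize Blair's comparison of distributions, can be sketched in Python. Everything here beyond the worked numbers is an assumption for illustration: the function names are hypothetical, the text does not say how Blair measured a "match," and the slope comparison below is a stand-in rather than his procedure. Estimating an exponent by least squares on logarithms is simple but statistically crude; maximum-likelihood fitting is generally preferred for real data. The token lists are made up.

```python
import math
from collections import Counter

def zipf_expected(top_count, max_rank):
    """Zipf (exponent 1): the rank-r word occurs about top_count / r times."""
    return [top_count / r for r in range(1, max_rank + 1)]

def lotka_expected(single_paper_authors, max_papers):
    """Lotka (inverse square): about single_paper_authors / n**2 authors have n papers."""
    return [single_paper_authors / n ** 2 for n in range(1, max_papers + 1)]

def rank_frequencies(tokens):
    """Counts of each distinct token, sorted from most to least frequent."""
    return sorted(Counter(tokens).values(), reverse=True)

def loglog_slope(freqs):
    """Least-squares slope of log(frequency) against log(rank).

    An exactly Zipfian rank-frequency list gives a slope of -1. This is a
    crude estimator, used here only because it is short and transparent.
    """
    if len(freqs) < 2:
        raise ValueError("need at least two ranks to fit a slope")
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# The worked examples from the text.
print([round(c) for c in zipf_expected(100, 3)])   # [100, 50, 33]
print([round(c) for c in lotka_expected(100, 3)])  # [100, 25, 11]

# Hypothetical token lists (not data from any study): document words that
# follow Zipf exactly (60 // r occurrences at rank r) and flatter index terms.
doc_words = [f"w{r}" for r in range(1, 7) for _ in range(60 // r)]
index_terms = (["t1"] * 10 + ["t2"] * 9 + ["t3"] * 9
               + ["t4"] * 8 + ["t5"] * 8 + ["t6"] * 7)

doc_slope = loglog_slope(rank_frequencies(doc_words))
index_slope = loglog_slope(rank_frequencies(index_terms))

# Hypothetical mismatch score: distance between the two fitted slopes.
mismatch = abs(doc_slope - index_slope)
print(f"{doc_slope:.2f} {index_slope:.2f} {mismatch:.2f}")
```

Under this stand-in measure, a mismatch near zero would mean indexers and document authors distribute their vocabulary in similar proportions; the toy index terms, which are used far more evenly than the document words, produce a large gap.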
The scope of power law functions extends beyond the study of word counts and author …