ABSTRACT
A primary role of national libraries is to document the published output of their respective countries. Traditionally, this has meant collecting, describing, and preserving for future generations at least one copy of every item published in print, including books, serials, newspapers, maps, music, posters, and pamphlets. In the last decade, online publishing has had a revolutionary impact on the creation, publication (dissemination), and use of information. This has presented libraries, particularly national (deposit) libraries and other cultural collecting institutions, with the daunting task of collecting, storing, describing, managing, and preserving the vast quantities of information that are being produced online.
A key question to be asked when embarking on this task is, "What should be collected and preserved?" National libraries have responded to this question in different ways. Some, including the National Library of Australia, have taken a selective approach, while others have engaged in whole domain harvesting, or a "comprehensive" approach. This article discusses the advantages and disadvantages of each of these approaches and looks in some detail at the selective approach as exemplified by PANDORA, Australia's Web Archive.
INTRODUCTION
A primary role of national libraries and other deposit libraries is to document the published output of their jurisdictions. Traditionally, this has meant collecting, describing, preserving, and providing access to library materials for current and future generations. Library materials have included printed books, serials, newspapers, maps, posters, music, and pamphlets. Subsequently, the definition of "library materials" was extended to include information stored on other physical carriers such as microfilm, film of various types, audio cassette tapes, video tapes, computer disks, CD-ROMs, and DVDs. These have all presented challenges to libraries: special equipment is needed to display items in these formats, the equipment and/or the formats themselves become obsolete, and the information must be preserved on storage media that are sometimes fragile.
With the development of the World Wide Web in 1993, which made online publishing an easily available, ubiquitous, and relatively inexpensive means of creating and distributing information, national and other deposit libraries accepted that, once again, they must expand their roles to encompass this new form of publishing, with everything that its collection, description, storage, management, preservation, and provision of access entailed. This brings challenges beyond those inherent in the formats the libraries already collected. The volume of online publishing is huge. Almost anyone can set themselves up as a publisher, so libraries must address questions about the quality and authority of information, as well as widely varying levels of competence in using publishing software and of compliance with standards. In addition, many of these items are complex Web objects (for instance, Web sites that contain a number of different file formats), which makes preservation strategies particularly difficult to formulate and carry out.
WHAT SHOULD BE COLLECTED AND PRESERVED?
While national and other deposit libraries have largely accepted responsibility for collecting and preserving online publications, at least in principle, those that have embarked on the task have responded to it in different ways. Each has assessed the task in relation to the resources available and has reached its own decision about where the balance lies in its particular situation.
Some have argued that, because national and other deposit libraries are typically comprehensive in collecting the published output of their jurisdiction, the same approach should prevail with online publishing: as far as humanly possible, all online publishing must be collected and preserved. Others have argued that, because online publishing is a completely different paradigm from print and other physical-format publishing, and operates on a different order of magnitude, a different, selective approach is necessary and acceptable, and perhaps even desirable. This has led to two broad national approaches to collecting and preserving online publications: the whole domain or comprehensive approach, and the selective approach.
In the mid- to late 1990s, a small number of national libraries began archiving programs and explored different approaches to archiving their national documentary heritage online. It is interesting to note that, within five or six years of embarking on a chosen course, most of those libraries seemed to be at a crossroads in planning their future directions for digital archiving (Gatenby, 2002). Whether engaged in whole domain (comprehensive) harvesting or selective archiving, each was recognizing the limitations of its chosen approach. National libraries are currently employing a number of approaches to build archives of their countries' publications; these are discussed below.
Selective Archiving of Static Web Resources
The National Libraries of Denmark and Canada have been the principal exponents of this approach. Resources that resemble print publications, and that do not change or contain interactive or dynamic elements, are archived on a selective basis, with library staff making the selection decisions.
Selective Archiving of Static and Dynamic Web Resources
Australia is the only known country with an established program for archiving dynamic as well as static publications and Web sites on a selective basis, once again with a high degree of intellectual input from library staff.
Whole Domain Harvesting
Libraries taking this approach attempt to harvest automatically the entire Web domain of their respective countries, using harvesting robots and a minimum of human intervention to identify resources. This involves not only harvesting all the resources in the country's own domain but also identifying those resources of national origin or subject matter that sit in .com and other generic domains. The National Libraries of Sweden, Finland, Iceland, Norway, and…