
Prototype preservation environments.

Library Trends

June 22, 2005 | Moore, Reagan W.; Marciano, Richard | COPYRIGHT 2008 Johns Hopkins University Press. This material is published under license from the publisher through the Gale Group, Farmington Hills, Michigan. All inquiries regarding rights should be directed to the Gale Group.

ABSTRACT

The Persistent Archive Testbed and the National Archives and Records Administration (NARA) research prototype persistent archive are examples of preservation environments. Both projects use data grids to implement a data management infrastructure that can accommodate technology evolution. Data grids are software systems that provide persistent names for digital entities, manage data distributed across multiple types of storage systems, and support preservation metadata. A persistent archive federates multiple data grids to provide the fault tolerance and disaster recovery mechanisms essential for long-term preservation. This article presents the capabilities of the prototype persistent archives, along with examples of how those capabilities support the preservation of email, Web crawls, office products, image collections, and electronic records.

PROTOTYPE PRESERVATION ENVIRONMENTS

The San Diego Supercomputer Center (SDSC) collaborates with the National Archives and Records Administration (NARA) on research toward a prototype persistent archive. The collaboration examines how advanced data management systems can support the long-term preservation of data. One original goal was to assess mechanisms for managing technology obsolescence. The ability to migrate electronic records to new storage systems was called "infrastructure independence." The preservation system should be extensible and able to use more cost-effective storage technologies as they become available. A second goal was to assess scalability mechanisms that would let an archive hold hundreds of millions of files and hundreds of terabytes of data. The data management technology that meets these goals is called a "data grid." This article examines how data grids support preservation requirements.

Preservation is the process of migrating a digital entity forward in time while maintaining its authenticity and integrity. (1) Authenticity is an assertion that a specific digital entity can be identified relative to the context in which it was created. The context includes provenance information, such as the creator of the digital entity; procedural information, such as the processes used to create it; and administrative information, such as the institution that authorized its creation. Integrity is an assertion that the information content of a digital entity has not been modified, that its chain of custody can be verified, and that any transformations of its encoding format were performed by identified archival procedures.
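Two of the integrity assertions, unmodified content and a verifiable chain of custody, can be illustrated with a fixity check. The sketch below is a minimal illustration, not the prototype's mechanism: it assumes SHA-256 as the digest, and the `FixityRecord`, `ingest`, and `verify` names are hypothetical. A digest only confirms a bit-identical copy; a format transformation produces a new digest, which is why such transformations must themselves be recorded as identified archival procedures.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


def sha256_digest(data: bytes) -> str:
    """Return the hex-encoded SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class FixityRecord:
    """Hypothetical record: digest captured at ingest plus a custody log."""
    digest: str
    custody: list = field(default_factory=list)

    def log(self, actor: str, action: str) -> None:
        # Append a timestamped custody event so the chain of custody can be reviewed.
        self.custody.append((datetime.now(timezone.utc).isoformat(), actor, action))


def ingest(data: bytes, archivist: str) -> FixityRecord:
    """Record the digest when the entity enters the preservation environment."""
    record = FixityRecord(digest=sha256_digest(data))
    record.log(archivist, "ingest")
    return record


def verify(data: bytes, record: FixityRecord, archivist: str) -> bool:
    """Recompute the digest and compare it with the value recorded at ingest."""
    ok = sha256_digest(data) == record.digest
    record.log(archivist, "verify:ok" if ok else "verify:FAILED")
    return ok


rec = ingest(b"abc", "archivist-a")
unchanged = verify(b"abc", rec, "archivist-b")
altered = verify(b"abd", rec, "archivist-b")
```

Each verification is itself logged, so a failed check becomes part of the custody record rather than a silent event.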

A digital entity can be an electronic record, a data file created by a scientific application, a text file created by a word processing system, an image taken by a remote sensor, or any string of bits that can be named. The preservation process requires extracting the digital entity from the environment in which it was created and importing it into the preservation environment. Once the digital entity is under the archivist's control, its authenticity and integrity properties can be managed with assurance that continued access is sustainable. This article examines the challenges of extracting a digital entity from its creation environment, the technologies that can be used to manage authenticity and integrity, and some examples of preservation environments.

PRESERVATION CHALLENGES

The idea that a digital entity can be extracted from its creation environment is called infrastructure independence (Moore et al., 2000). A digital entity depends on both software and hardware infrastructure for its support and management. For example, a file resides in a file system that provides a storage location, a name for the file, management of file properties, names for the persons allowed to manipulate the file, and controls on the types of permitted operations. The file properties typically include the file's size, its owner, its creation date, and its last-modification date. Extracting the digital entity from this supporting environment requires the ability to impose

* storage of the digital entity at a location specified by the archivist

* a persistent naming convention for the digital entity that remains invariant as the digital entity is moved between storage systems

* management of file properties that are needed to assert authenticity and integrity

* persistent identifiers for the archivists who are managing the preservation environment

* persistent management of the access controls for allowed operations.
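The five properties in this list can be gathered into a single record that the archivist, rather than the storage system, owns. The following is a minimal sketch under assumptions: the `PreservedEntity` class, its field names, and the example names and locations are hypothetical. It shows the key behavior, namely that relocating an entity changes only its storage location while the persistent name, properties, archivist identifiers, and access controls carry over.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class PreservedEntity:
    """Hypothetical record of the properties an archivist imposes on a digital entity."""
    persistent_name: str                                # invariant across storage systems
    storage_location: str                               # chosen by the archivist; may change
    properties: dict = field(default_factory=dict)      # size, owner, dates, digest, ...
    archivists: frozenset = frozenset()                 # persistent archivist identifiers
    permissions: dict = field(default_factory=dict)     # identifier -> allowed operations

    def relocate(self, new_location: str) -> "PreservedEntity":
        # Return a new version of the record with only the location changed;
        # name, properties, archivists, and access controls are carried over.
        return replace(self, storage_location=new_location)

    def allows(self, identifier: str, operation: str) -> bool:
        """Check an operation against the archivist-managed access controls."""
        return operation in self.permissions.get(identifier, set())


rec = PreservedEntity(
    persistent_name="/archive/email/1998/msg-000042",
    storage_location="disk@site-a:/vault/0042",
    properties={"size": 2048, "owner": "records-office"},
    archivists=frozenset({"archivist-a"}),
    permissions={"archivist-a": {"read", "replicate"}, "public": {"read"}},
)
moved = rec.relocate("tape@site-b:/vol7/0042")
```

Making the record immutable and producing a new version on each move is one design choice among several; it keeps earlier states available for audit.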

Infrastructure independence means that no matter where the digital entity is stored, the archivist retains the ability to control each of the support properties, independently of the mechanisms provided by a particular choice of storage system. Ideally, an archivist would be able to import a digital entity into a preservation environment that guarantees that the naming conventions will persist through all future choices of technology. One way to implement infrastructure independence is to insert a data management layer between a digital entity and the underlying storage environment. The archivist controls the persistent naming conventions through the data management layer. This approach is illustrated in figure 1.
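The data management layer can be sketched as an indirection between the logical names that applications use and the physical paths of whatever storage systems currently hold the data. This is an illustrative sketch, not the data grid's actual interface: the `StorageDriver` protocol, the two stand-in storage classes, and `DataManagementLayer` are hypothetical, and real storage systems would be reached through their own drivers. The point it shows is that `migrate` moves an entity to a different storage technology while the logical name stays fixed.

```python
import os
import tempfile
import uuid
from typing import Protocol


class StorageDriver(Protocol):
    """Operations the layer assumes every storage system can perform (hypothetical)."""

    def write(self, path: str, data: bytes) -> None: ...
    def read(self, path: str) -> bytes: ...
    def delete(self, path: str) -> None: ...


class MemoryStore:
    """Stand-in for one storage technology: an in-memory object store."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def write(self, path: str, data: bytes) -> None:
        self._blobs[path] = data

    def read(self, path: str) -> bytes:
        return self._blobs[path]

    def delete(self, path: str) -> None:
        del self._blobs[path]


class DirectoryStore:
    """Stand-in for a second technology: files in a local directory."""

    def __init__(self, root: str) -> None:
        self.root = root

    def write(self, path: str, data: bytes) -> None:
        with open(os.path.join(self.root, path), "wb") as f:
            f.write(data)

    def read(self, path: str) -> bytes:
        with open(os.path.join(self.root, path), "rb") as f:
            return f.read()

    def delete(self, path: str) -> None:
        os.remove(os.path.join(self.root, path))


class DataManagementLayer:
    """Maps archivist-controlled logical names to (storage system, physical path)."""

    def __init__(self, drivers: dict[str, StorageDriver]) -> None:
        self.drivers = drivers
        self.catalog: dict[str, tuple[str, str]] = {}

    def put(self, logical_name: str, data: bytes, system: str) -> None:
        physical = uuid.uuid4().hex  # physical names stay private to the layer
        self.drivers[system].write(physical, data)
        self.catalog[logical_name] = (system, physical)

    def get(self, logical_name: str) -> bytes:
        system, physical = self.catalog[logical_name]
        return self.drivers[system].read(physical)

    def migrate(self, logical_name: str, target: str) -> None:
        # Copy to the new system, repoint the catalog, then remove the old copy.
        # Applications keep using the same logical name throughout.
        system, physical = self.catalog[logical_name]
        data = self.drivers[system].read(physical)
        new_physical = uuid.uuid4().hex
        self.drivers[target].write(new_physical, data)
        self.catalog[logical_name] = (target, new_physical)
        self.drivers[system].delete(physical)


name = "/archive/records/memo-1998.txt"
with tempfile.TemporaryDirectory() as root:
    layer = DataManagementLayer({"old-store": MemoryStore(), "new-store": DirectoryStore(root)})
    layer.put(name, b"memo text", "old-store")
    layer.migrate(name, "new-store")
    result = layer.get(name)  # same logical name, different storage system
```

Writing the new copy before deleting the old one means a failure midway leaves the entity readable from its original location.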

[FIGURE 1 OMITTED]

In the original creation environment, the application that created the digital entity interacted directly with the storage system (shown by the dashed arrow). In the preservation environment, the applications that are used for display and manipulation now interact with a storage system through a data grid, in which the digital entities have been organized as a collection (Rajasekar, Marciano, & Moore, 1999). The data collection is used to assign metadata attributes to each digital entity to manage the authenticity and integrity properties.

The data grid provides its own naming conventions to describe the logical storage location, the logical file name, the metadata attributes, the distinguished names for the archivists, and the control and consistency mechanisms. Each logical name space that is managed by the data grid is essential for implementing infrastructure independence. The logical name spaces can be used to manage digital entities that are distributed across multiple storage systems and located at multiple sites around the country. The logical name spaces make it possible to use global identifiers that do not change when a digital entity is moved to another storage system. We can illustrate this by considering examples of how each logical name space would be used by a preservation environment.
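Several of these name spaces can be sketched side by side to show why a global logical file name survives a change of technology. The sketch below is hypothetical throughout: the `LogicalNameSpaces` class, the resource, site, and user names are invented for illustration, and the control and consistency mechanisms are not modeled. Files refer to logical resources rather than physical systems, so rebinding one logical resource to new hardware updates every entity stored there without renaming any file.

```python
from dataclasses import dataclass, field


@dataclass
class LogicalNameSpaces:
    """Hypothetical sketch of several logical name spaces kept by a data grid."""
    resources: dict = field(default_factory=dict)   # logical resource -> physical storage system
    files: dict = field(default_factory=dict)       # logical file name -> logical resources with replicas
    attributes: dict = field(default_factory=dict)  # logical file name -> metadata attributes
    users: dict = field(default_factory=dict)       # archivist distinguished name -> home site

    def physical_replicas(self, logical_file: str) -> list:
        # Resolve a global logical name to the physical systems currently holding it.
        return [self.resources[r] for r in self.files[logical_file]]

    def retarget_resource(self, logical_resource: str, new_physical: str) -> None:
        # Rebind a logical resource to new storage technology (assumes the data
        # were already copied); logical file names and attributes are untouched.
        self.resources[logical_resource] = new_physical


crawl = "/archive/web-crawls/crawl-017"
ns = LogicalNameSpaces(
    resources={"site-a-archive": "tape-library-1@site-a", "site-b-disk": "disk-array@site-b"},
    files={crawl: ["site-a-archive", "site-b-disk"]},
    attributes={crawl: {"collection": "web-crawls"}},
    users={"CN=Archivist A,O=Example Archive": "site-a"},
)
before = ns.physical_replicas(crawl)
ns.retarget_resource("site-a-archive", "tape-library-2@site-a")
after = ns.physical_replicas(crawl)
```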

DATA GRIDS

The software that implements a collection-based data management infrastructure for distributed data is called a data grid (Foster & Kesselman, 1999). It runs as an application (or server) on each computer platform that manages a storage system. The data grid servers talk to each other in a federated environment. Messages can be sent between servers to move files, replicate files, and access files. The digital entity properties managed by the data grid are stored in a database as…
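Because the text breaks off here, the exact way the data grid records properties in its database is not available. The sketch below only illustrates the general pattern under stated assumptions: it uses an in-memory SQLite catalog with a hypothetical two-table schema, and `GridServer`, its `store`/`fetch` message format, and the site names are invented stand-ins for federated servers. Replication fetches an existing copy, checks it against the catalog digest, sends a store message to another server, and records the new replica.

```python
import hashlib
import sqlite3
from typing import Optional


class GridServer:
    """Stand-in for a data grid server managing one storage system at one site."""

    def __init__(self, site: str) -> None:
        self.site = site
        self.storage: dict[str, bytes] = {}

    def handle(self, message: dict) -> Optional[bytes]:
        # Only two message types are sketched: store a file and fetch a file.
        if message["op"] == "store":
            self.storage[message["path"]] = message["data"]
            return None
        if message["op"] == "fetch":
            return self.storage[message["path"]]
        raise ValueError(f"unsupported operation: {message['op']}")


def open_catalog() -> sqlite3.Connection:
    # Hypothetical schema: one row per digital entity, one row per replica.
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE entity (logical_name TEXT PRIMARY KEY, size INTEGER, owner TEXT, sha256 TEXT);
        CREATE TABLE replica (logical_name TEXT, site TEXT, path TEXT);
        """
    )
    return db


def register(db, servers, logical_name: str, data: bytes, owner: str, site: str) -> None:
    path = logical_name.strip("/").replace("/", "_")
    servers[site].handle({"op": "store", "path": path, "data": data})
    db.execute(
        "INSERT INTO entity VALUES (?, ?, ?, ?)",
        (logical_name, len(data), owner, hashlib.sha256(data).hexdigest()),
    )
    db.execute("INSERT INTO replica VALUES (?, ?, ?)", (logical_name, site, path))


def replicate(db, servers, logical_name: str, target_site: str) -> None:
    # Fetch from an existing replica, check it against the catalog digest,
    # send a store message to the target server, and record the new replica.
    site, path = db.execute(
        "SELECT site, path FROM replica WHERE logical_name = ? LIMIT 1", (logical_name,)
    ).fetchone()
    (expected,) = db.execute(
        "SELECT sha256 FROM entity WHERE logical_name = ?", (logical_name,)
    ).fetchone()
    data = servers[site].handle({"op": "fetch", "path": path})
    if hashlib.sha256(data).hexdigest() != expected:
        raise IOError(f"replica of {logical_name} at {site} failed its integrity check")
    servers[target_site].handle({"op": "store", "path": path, "data": data})
    db.execute("INSERT INTO replica VALUES (?, ?, ?)", (logical_name, target_site, path))


servers = {"site-a": GridServer("site-a"), "site-b": GridServer("site-b")}
db = open_catalog()
register(db, servers, "/archive/images/img-0001.tif", b"II*\x00", "imaging-lab", "site-a")
replicate(db, servers, "/archive/images/img-0001.tif", "site-b")
sites = [row[0] for row in db.execute(
    "SELECT site FROM replica WHERE logical_name = ? ORDER BY site", ("/archive/images/img-0001.tif",)
)]
```

Keeping the digest in the catalog, rather than on any one server, lets every replica be checked against the same reference value.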
