In the fall of 1993, it had been 18 years since the last full edition of the Encyclopaedia Britannica was typeset. In less than a year, the editorial staff went from asking, in a memo now infamous in the Chicago headquarters, "Have you ever heard of the Internet?" to the final stages of a top-to-bottom redesign of editorial and production facilities that has recast the 228-year-old encyclopedia not as a book, but as a database with print, CD-ROM and online media. Less than one year after the Internet memo, Robert McHenry, the editor in chief, was adding the last of his 398 personally selected external URLs into the new BOL, first offered to institutional clients in the fall of 1994. Two years later, in the fall of 1996, BOL is available to more than one of every four college students in the U.S., in addition to other institutional and individual subscribers.
This case study captures Encyclopaedia Britannica at a moment of transformation. Shaped by the past, by 228 years as a bound set of volumes, the encyclopedia is being molded anew for a future in which covers will not define the boundaries of its knowledge base. The story demonstrates that a print publication of substance, one where content and high editorial standards have been the hallmark of its reputation, can go online and sell itself without selling itself out or selling itself short. It may be the first encyclopedia on the World Wide Web, as it claims, and it may succeed in redefining the term to fit the new environment, but content still rules at Britannica.
It is the intent of the current stewards of what is called the greatest single reference work in the English language to retain the authority of the original print product and to redefine it as a dynamic, up-to-date front end to the wider world of information. This transformation of System Britannica started in the last decade and is well under way today. "System Britannica" refers to the highly automated production system that extracts and processes the text and graphics files that become Britannica Online (BOL), Britannica on CD-ROM (BCD) and the traditional printed encyclopedia (EB). Future plans indicate the direction the transformation is yet to take.
Getting online
The timing of Britannica's online push was auspicious. When the initial memo came out, Britannica had just sold Compton's, and it was the quick deployment of the Advanced Technology Group, which had been pulled together for the Compton's CD-ROM project, that got the BOL program rolling. Harold Kester, VP of research and development for the ATG, had been working with search-and-retrieval technology before it belonged to Encyclopaedia Britannica. In 1989, Britannica had contracted with Del Mar, the company that developed SmarTrieve, to develop a prototype CD-ROM. Britannica proceeded to buy the company, and it was this group, including Kester, that formed the core of the ATG that created the Compton's CD-ROM and later BOL and the Britannica CD. Compton's, however, had presented a problem of smaller dimensions than the Encyclopaedia Britannica. Where Encyclopaedia Britannica was 255 MB of text, Compton's was only 60 MB and required less precision of search and retrieval.
The ATG, along with Britannica's editorial and publishing technology staffs in Chicago, had already spent several years working on the problem of converting Encyclopaedia Britannica data, breaking them into logical retrieval units and experimenting with a free-text search engine. In 1991, it created the first electronic product, which was a searchable index on CD-ROM. The index had a natural-language query interface; the hits indicated where the material was to be found in the current print set. Internally, there was already a full-text, electronic version of the encyclopedia, but it was not yet ready for the market. The next year, the ATG was able to put the entire text (no graphics) onto two CD-ROMs, which were used as distribution media to install the work on a LAN. In 1993, Britannica published the Britannica Instant Research System (BIRS). This version, which was quite costly, saw limited distribution to a professional research market. It wasn't until 1994, when compression techniques enabled it to squeeze the whole encyclopedia onto a single CD-ROM, that Britannica released the first commercial version of Britannica on CD-ROM (BCD).
Obvious choice. Three years ago, few publishers considered Mosaic a candidate for commercial publishing. Kester saw differently. He had been looking around at various client-server solutions to replace the BCD browser, and when Mosaic came along in 1993, the choice seemed obvious to him. The group bought the WAIS search engine and started work on its enhancement.1 By September of that year, it had an internal prototype of the entire encyclopedia in HTML, online and indexed with WAIS, fully searchable and accessible with Mosaic. With the prototype approved, the push to bring Britannica online began in earnest in January 1994. The staff set a goal of completing conversion and putting it up in nine months.
Converting the data. The work here was not in making a one-time conversion of the 44 million words on the mainframe editorial system, but in writing conversion routines reliable enough to do this conversion several times a year, on hundreds of thousands of articles, with full confidence that it would work with an absolute minimum of hand touchup of errors in the resulting pages. The conversion task, though complex, was made much easier by the fact that Britannica already had its text richly marked up with regularized structural tagging. Also, Compton's had been in the same editorial markup language, so the team had some experience in converting this type of data.
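A repeatable conversion of this kind can be pictured roughly as follows. This is only a minimal sketch in Python, not Britannica's actual routine: the article does not name the editorial markup language, so the tag names in TAG_MAP (title, head, para) and the convert function are invented for illustration. The idea it shows is that each regularized structural tag maps mechanically to an HTML element, and that anything the routine does not recognize is reported rather than silently passed through, which keeps hand touchup to a minimum when the export is rerun.

```python
import re
from html import escape

# Hypothetical structural tags mapped to HTML elements.
# The real editorial tag set is not described in the article.
TAG_MAP = {"title": "h1", "head": "h2", "para": "p"}

# Matches simple opening and closing tags such as <para> or </para>.
TAG_RE = re.compile(r"<(/?)(\w+)>")


def convert(source):
    """Convert structurally tagged text to HTML.

    Returns (html, problems). Unknown tags are dropped from the output
    and listed in problems so they can be reviewed by hand.
    """
    out = []
    problems = []
    pos = 0
    for match in TAG_RE.finditer(source):
        # Escape the plain text that precedes this tag.
        out.append(escape(source[pos:match.start()]))
        closing, name = match.group(1), match.group(2)
        if name in TAG_MAP:
            out.append("<" + closing + TAG_MAP[name] + ">")
        else:
            problems.append(
                "unknown tag <%s%s> at offset %d" % (closing, name, match.start())
            )
        pos = match.end()
    # Escape whatever text follows the last tag.
    out.append(escape(source[pos:]))
    return "".join(out), problems


if __name__ == "__main__":
    html, problems = convert("<title>Aardvark</title><para>A mammal & more.</para>")
    print(html)
    print(problems)
```

Because the routine returns its list of problems alongside the HTML, a batch run over many articles could collect the exceptions into one report instead of relying on someone to spot errors in the finished pages.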
A special add-on to the editorial system that Britannica had already developed is an indexing system that inserts and tracks unique bidirectional pointers for each entry in the print index and its source reference. These unique identifiers, used to generate page numbers for the printed index, are easily translated to unique URLs when the database is exported to HTML.
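The way one identifier can serve both outputs might look something like the sketch below. It is an illustration under stated assumptions, not the actual indexing add-on: the Pointer record, its fields, the BASE_URL placeholder and the URL pattern are all hypothetical, since the article gives neither the identifier format nor the real URL scheme.

```python
from dataclasses import dataclass

# Placeholder base address for illustration only; not a real Britannica URL.
BASE_URL = "https://example.invalid/bol"


@dataclass(frozen=True)
class Pointer:
    """One hypothetical index pointer linking an index entry to its source."""
    pointer_id: int   # unique identifier shared by the entry and its source reference
    index_entry: str  # heading as it appears in the index
    volume: int       # print location of the referenced passage
    page: int


def print_locator(pointer):
    """Render the pointer as a volume:page reference for the printed index."""
    return "%d:%d" % (pointer.volume, pointer.page)


def url_for(pointer):
    """Render the same pointer as a URL for the HTML export (assumed pattern)."""
    return "%s/topic?id=%d" % (BASE_URL, pointer.pointer_id)


def build_lookup(pointers):
    """Index pointers by identifier, rejecting duplicates so each stays unique."""
    by_id = {}
    for pointer in pointers:
        if pointer.pointer_id in by_id:
            raise ValueError("duplicate pointer id: %d" % pointer.pointer_id)
        by_id[pointer.pointer_id] = pointer
    return by_id


if __name__ == "__main__":
    entry = Pointer(pointer_id=1042, index_entry="aardvark", volume=1, page=7)
    print(print_locator(entry))
    print(url_for(entry))
```

The point of the design is that the identifier, not the page number, is the stable key: page numbers can change from printing to printing, while a URL derived from the identifier would not need to.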
One of the most difficult aspects of the project was the segmentation of larger articles into smaller pieces. …