As conversion of Wade-Giles entries to pinyin is underway, this paper reconsiders the usefulness of providing romanization-based retrieval in OPACs for Chinese-language resources. An in-house experiment designed to measure retrieval performance in OPAC title searches under different romanization methods revealed that while romanization is an efficient retrieval method that works relatively well for a large number of patrons, it remains problematic for a significant portion of end users who might be better served with character-based retrieval systems.
Two romanization systems for Chinese data are currently in use in most libraries in the Western world: the Wade-Giles (WG) system, mainly used in North American libraries, and pinyin--formally Hanyu pinyin, but usually referred to simply as pinyin--mainly used in European and Australian libraries. In 1997 the Library of Congress (LC) announced its plan for converting to pinyin, which was unsurprisingly endorsed by the bibliographic community at large (except on the issue of word division) since "most librarians [...] have come to realize that conversion to pinyin will be necessary if North American libraries are to provide adequate service to their users." (1) The conversion process, which began in October 2000, extended over a one-year period, by the end of which the conversion of the records in individual libraries' OPACs was expected to be complete. (2)
In several respects, the inclusion of pinyin romanization adds considerable value to bibliographic records and facilitates retrieval for a great number of end users. However, relying solely on romanized entries for access may not be suitable for a portion of patrons who might be better served with character-based retrieval approaches. This paper intends to show that while romanization-based retrieval works well for most people, it remains problematic for a smaller but significant portion of catalog users.
* Background and Literature Review
Comparing WG and Pinyin
The recent decision by the LC to convert from WG to the pinyin romanization system was long awaited by many library users in North America. Pinyin, promulgated in 1958, is now fully recognized as the official romanization scheme of the People's Republic of China (PRC); it was also recognized in 1977 as a United Nations Standard, and as an ISO standard (ISO 7098) in 1982. (3) Pinyin is now widely accepted in China and is used extensively by most government and press agencies around the world. It is used in the PRC to help first-graders and foreigners learn Chinese characters. (4) Pinyin is also used in publications such as dictionaries and maps, and sometimes for book and periodical titles. It is widely seen in public places: on building names; on street, highway, and railway signboards; and on product labels. (5) WG still enjoys some popularity in Taiwan, but on July 26, 1999, the Taiwanese government announced the use of pinyin (Hanyu pinyin from the mainland) for the romanization of street names, suggesting that Taiwan might soon officially adopt pinyin as its romanization scheme. (6)
When pinyin was developed, great care was taken to keep the notation as simple and as internationally acceptable as possible. For this reason, it was decided "not to augment the Latin alphabet by adding new letters, such as [those used in] the International Phonetic Alphabet." (7) As a result, pinyin uses all the consonants of the roman alphabet--except v, which is only used for the transcription of foreign terms--in conjunction with four digraphs, namely ch, sh, zh, and ng. (8) Except for these four (or five) digraphs, all other consonant sounds are represented with a single roman letter. In some cases this proves troublesome for native speakers of English. For instance, the phonetic values assigned to the pinyin letters c, q, x, and z do not correspond very closely to the values that are usually attributed to these letters in English. WG uses more digraphs (thirteen in total), producing a less compact notation, but one from which a native English speaker can usually infer the sounds more easily. (9) For instance, the sound …