Developing software that can be used across global enterprises is one of the many challenges of today's information technology systems. Taligent's CommonPoint [TM] application system eases this problem by providing a foundation for fully global software, based on object-oriented frameworks and the Unicode [TM] character encoding standard. This paper describes Taligent's Unicode implementation and the CommonPoint text and international frameworks. It discusses how the CommonPoint system can be used to build international software and some of the advantages of object-oriented technology.
Many organizations today are faced with the challenge of implementing software that operates seamlessly across national borders. The goal is to create global applications, that is, applications that have a single binary form that can be used everywhere. These applications are then localized for use in a particular geographic region, usually a country, that shares a language along with other local characteristics such as a time zone, currency units, and common number and date formats. Unfortunately, it is not easy to create global applications, and often a different binary version is needed to support each country or region. In addition to the expense and inconvenience of creating and maintaining multiple versions of an application, documents created using a particular localized version of a program cannot be displayed correctly by other versions.
Taligent's CommonPoint(**) application system facilitates the creation of global software. The CommonPoint system comprises a set of integrated object-oriented frameworks, implemented in C++, that enable the development of modular object-oriented applications and documents. The CommonPoint system runs as a layer on existing operating systems, including the Advanced Interactive Executive(*) (AIX(*)) and Operating System/2(*) (OS/2(*)) environments. The CommonPoint system allows applications to be created with these global qualities:
* Users can enter and manipulate textual and numerical data in their native language, and can create and display multilingual text.
* The application can be completely localized without accessing its source code. The interface can be presented in any user's native language, and the same binary version can have multiple localized presentations.
* There is high potential for customization by both the user and the developer, provided by object-oriented frameworks and modular, data-driven objects. Users have more control over which resources they use, and developers can take advantage of the functionality already provided by the frameworks and focus on adding more specialized features and more localized resources.
This paper describes the support for globally distributable software provided by the CommonPoint application system. The CommonPoint system enables international software development by providing:
* An implementation of the Unicode(**) character encoding standard that provides a common mechanism for storing character data regardless of language. The Unicode character set provides a full set of symbols and other characters, enables the creation of text in multiple languages and scripts, and provides data integrity.
* Text handling mechanisms that facilitate the storage and manipulation of multilingual styled text
* Character input features that allow users to enter multilingual text using today's standard input devices
* Localization services that allow localizable resources to be easily created, stored, and customized for use in a specific language or geographic region
* Powerful object-oriented frameworks and data-driven localizable objects that enable a high degree of customization and extensibility
The paper describes these mechanisms and discusses the impact of object-oriented technology on the implementation of international software.(1)
Applying the Unicode standard
Use of the Unicode standard as the sole character encoding mechanism is the foundation for the CommonPoint international feature set. Because the Unicode standard is so fundamental to the design of the CommonPoint system's text and international frameworks, it is worth summarizing the standard and some of its features here.
Developed by the Unicode Consortium,(2) the Unicode standard is a fixed-width, 16-bit character encoding system that contains codes for every character needed by the major writing systems currently in use in the modern world, along with codes for a full range of punctuation, symbols, and control characters. The Unicode standard provides, in all, codes for over 34 000 characters from the world's alphabets, ideographs, and symbol sets. The standard incorporates characters from many existing standards--for example, the first 256 characters correspond to the International Organization for Standardization (ISO) Latin-1 character set (which attempts to provide character coverage for the major Western European languages)--and is compatible with the international standard ISO/IEC (International Electrotechnical Commission) 10646.(3,4)
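Because the first 256 codes correspond to Latin-1, converting Latin-1 text to 16-bit codes amounts to zero-extending each byte. The following is a minimal C++ sketch of that idea, not CommonPoint code: the function name latin1ToUnicode is hypothetical, and std::u16string is used only as a stand-in for a string of 16-bit character codes.

```cpp
#include <string>

// Hypothetical helper (not a CommonPoint interface): widen ISO Latin-1
// bytes to 16-bit codes. Each byte value is carried over unchanged,
// because the first 256 Unicode codes correspond to Latin-1.
std::u16string latin1ToUnicode(const std::string& latin1) {
    std::u16string out;
    out.reserve(latin1.size());
    for (unsigned char byte : latin1) {
        out.push_back(static_cast<char16_t>(byte));  // zero-extend to 16 bits
    }
    return out;
}
```

For instance, the Latin-1 byte 0xE9 (e with acute accent) becomes the 16-bit code 0x00E9, and the result always holds exactly one code per input byte.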
Along with a script or character name, the Unicode standard associates semantic information with each character that can be used to simplify text processing features. Each character can have an associated set of descriptive type properties identifying, for example:
* Punctuation marks (for example, [?] and ['])
* Diacritical marks (for example, the acute accent ['])
* Uppercased, lowercased, and uncased letters (for example, [A] and [a], respectively--uncased letters appear in languages such as Hebrew and Arabic that do not distinguish between uppercase and lowercase)
* Characters used to represent digits
* Control characters (for example, a carriage return or end-of-text character)
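As an illustration of how such type properties can simplify text processing, the sketch below classifies a 16-bit character code into the kinds of types listed above. The CharType enumeration, the classify function, and the handful of code ranges are assumptions chosen for brevity. They cover only a tiny subset of the standard's character data and do not represent the CommonPoint interface.

```cpp
// Hypothetical property types, mirroring the list above.
enum class CharType {
    Punctuation,
    DiacriticalMark,
    UppercaseLetter,
    LowercaseLetter,
    UncasedLetter,
    Digit,
    Control,
    Other  // anything this small sketch does not cover
};

// Illustrative classifier over a few well-known code ranges only.
CharType classify(char16_t c) {
    if (c <= 0x001F || c == 0x007F) return CharType::Control;
    if (c >= u'A' && c <= u'Z') return CharType::UppercaseLetter;
    if (c >= u'a' && c <= u'z') return CharType::LowercaseLetter;
    if (c >= 0x05D0 && c <= 0x05EA) return CharType::UncasedLetter;    // Hebrew letters
    if (c >= u'0' && c <= u'9') return CharType::Digit;                // European digits
    if (c >= 0x0660 && c <= 0x0669) return CharType::Digit;            // Arabic-Indic digits
    if (c >= 0x0300 && c <= 0x036F) return CharType::DiacriticalMark;  // combining marks
    switch (c) {
        case u'!': case u',': case u'.': case u':':
        case u';': case u'?': case u'\'':
            return CharType::Punctuation;
        default:
            return CharType::Other;
    }
}
```

A caller can then ask the same question of any character, for example "is this a digit?", without knowing which script it belongs to.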
Exclusive use of the Unicode standard for all character data in the CommonPoint system eases several of the problems inherent in creating international applications on many of today's systems. It provides a simple, consistent interface for manipulating character data that does not vary with the language being manipulated. Many programs on existing systems are based on much more limited character sets--for example, the 7-bit ASCII (American National Standard Code for Information Interchange) character standard. Several methods have been developed to help overcome the limitations of these relatively small character sets. The ISO 8859 standard, for example, provides a series of 8-bit extended character sets that keep the standard ASCII characters in the first 128 code positions and use the eighth bit to define another 128 characters, thus extending ASCII to support a variety of additional languages.
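One consequence of the ISO 8859 approach is that a code above 127 has no meaning on its own: the same byte stands for different characters in different parts of the standard. The sketch below illustrates this for two parts, Latin-1 (part 1) and Latin/Cyrillic (part 5). The names Iso8859Part and decodeByte are hypothetical, only the basic Cyrillic letter range of part 5 is handled, and returning 0xFFFD for uncovered codes is a choice made for this example.

```cpp
// Hypothetical selector for two parts of ISO 8859.
enum class Iso8859Part { Latin1, Cyrillic };

// Map one 8-bit code to a 16-bit Unicode code under the given part.
// Returns 0xFFFD for codes that this small sketch does not cover.
char16_t decodeByte(unsigned char byte, Iso8859Part part) {
    if (byte < 0x80) {
        return static_cast<char16_t>(byte);  // ASCII half is shared by every part
    }
    if (part == Iso8859Part::Latin1) {
        return static_cast<char16_t>(byte);  // Latin-1 matches the first 256 Unicode codes
    }
    if (byte >= 0xB0 && byte <= 0xEF) {
        // Part 5: the basic Cyrillic letters map onto U+0410..U+044F.
        return static_cast<char16_t>(0x0410 + (byte - 0xB0));
    }
    return 0xFFFD;  // not covered here
}
```

Under these assumptions the byte 0xE4 decodes to 0x00E4 (a with diaeresis) as Latin-1 but to 0x0444 (Cyrillic ef) as Latin/Cyrillic, while 0x41 is the letter A in both.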
This provides a partial solution for programs that need to support only a single language, or a set of languages whose character requirements are very similar. However, implementing the ability to produce many combinations of character sets requires an additional enhancement: the use of switching codes, or escape sequences, that are embedded in the text and indicate the character set of the following characters. This allows the creation of multilingual text, but text features become much more difficult to implement because the program must implement mechanisms that determine the character set to which any given character or range of text belongs.
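The cost of switching codes can be seen in a small sketch. It assumes an invented scheme in which the byte 0x1B followed by one identifier byte selects the character set for all following bytes. Real schemes use different sequences, and the name charsetAt is hypothetical. To learn which character set applies at a given position, the program has to scan the text from the beginning.

```cpp
#include <cstddef>
#include <string>

// Invented for illustration: 0x1B followed by one identifier byte
// switches the character set for all bytes that follow.
constexpr unsigned char kEscape = 0x1B;

// Return the identifier of the character set in effect at byte `index`,
// or `initialSet` if no switching code precedes that position.
// The answer depends on everything before `index`, so the scan is linear.
char charsetAt(const std::string& text, std::size_t index, char initialSet) {
    char current = initialSet;
    std::size_t i = 0;
    while (i < index && i < text.size()) {
        const bool isEscape = static_cast<unsigned char>(text[i]) == kEscape;
        if (isEscape && i + 1 < text.size()) {
            current = text[i + 1];  // identifier byte names the new set
            i += 2;                 // skip the escape byte and the identifier
        } else {
            ++i;
        }
    }
    return current;
}
```

Any operation that starts in the middle of the text, such as searching or extracting a selection, needs this kind of bookkeeping before it can interpret a single byte.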
Providing applications that support Japanese, or other East Asian languages that cannot be supported by an 8-bit character set, is even more complicated. In these markets, double-byte and triple-byte character sets are used to define the large number of characters--often tens of thousands--that are required. Typically these languages are encoded with a combination of single- and double-byte codes, such as Shift-JIS (JIS is the Japanese Industrial Standard). Programs quickly become much more complex because processing double-byte character data requires very different code than processing single-byte character data.
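Even counting characters shows the difference. In the sketch below, counting characters in mixed single- and double-byte text requires inspecting each byte to decide whether it begins a two-byte character, whereas with fixed-width 16-bit codes the count is simply the length. The function names are hypothetical, and the lead-byte test covers only the basic double-byte ranges (0x81 to 0x9F and 0xE0 to 0xEF), with no validation of the second byte.

```cpp
#include <cstddef>
#include <string>

// True if `byte` begins a double-byte character in this sketch's
// simplified view of Shift-JIS (lead bytes 0x81-0x9F and 0xE0-0xEF).
bool isShiftJisLeadByte(unsigned char byte) {
    return (byte >= 0x81 && byte <= 0x9F) || (byte >= 0xE0 && byte <= 0xEF);
}

// Count characters in mixed single- and double-byte text.
std::size_t countShiftJisCharacters(const std::string& bytes) {
    std::size_t count = 0;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const auto byte = static_cast<unsigned char>(bytes[i]);
        // A lead byte consumes the following byte as well, if there is one.
        i += (isShiftJisLeadByte(byte) && i + 1 < bytes.size()) ? 2 : 1;
        ++count;
    }
    return count;
}

// With fixed-width 16-bit codes, the character count is the length.
std::size_t countUnicodeCharacters(const std::u16string& text) {
    return text.size();
}
```

For the three characters "A", hiragana a, and "B", the mixed-width form occupies four bytes (0x41, 0x82 0xA0, 0x42) and must be scanned to find that it holds three characters. The 16-bit form holds three codes (0x0041, 0x3042, 0x0042).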
These are some of the problems the Unicode standard eliminates. The Unicode standard provides a built-in solution because it contains codes for virtually all of the characters needed to support all major writing systems, in any combination. Every character is encapsulated as a 16-bit unsigned integer, so there is no need to write different code to deal with both single-byte and double-byte data. Perhaps more importantly, the …