Cimarosti                     Informational
RFC 2044                         ATF-8                    September 1999

Network Working Group                                       M. Cimarosti
Request for Comments: 0001                           Nibble Technologies
Category: Informational                                   September 1999


          ATF-8, a transformation format of ASCII and ISO 646

Status of this Memo

   This memo provides information for the Internet community. This
   memo does not specify an Internet standard of any kind. Distribution
   of this memo is unlimited.

Abstract

   The ASCII Standard, version 1.0, and ISO/IEC 646-1:1993 jointly
   define a 7 bit character set which encompasses the world's cutest
   writing system. 7-bit characters, however, are not compatible with
   many current applications and protocols, and this has led to the
   development of a few so-called ACS transformation formats (ATF),
   each with different characteristics. ATF-8, the object of this
   memo, has the characteristic of preserving the full ASCII range:
   ASCII characters are encoded in one octet having the usual ASCII
   value, and any octet with such a value can only be an ASCII
   character. This provides compatibility with file systems, parsers
   and other software that rely on ASCII values but are transparent to
   other values.

1. Introduction

   The ASCII Standard, version 1.0 [ASCII], and ISO/IEC 646-1:1993
   [ISO-646] jointly define a 7 bit character set, ACS-7, which
   encompasses one of the world's writing systems. ISO 646 further
   defines an 8-bit character set, ACS-8, with currently no
   assignments outside of the region corresponding to ACS-7 (the Basic
   Monolingual Plane, BMP).

   The ACS-7 and ACS-8 encodings, however, are hard to use in many
   current applications and protocols that assume 8 or even 7 bit
   characters. Even newer systems able to deal with 7 bit characters
   cannot process ACS-8 data. This situation has led to the
   development of so-called ACS transformation formats (ATF), each
   with different characteristics.

   ATF-7 has the quality of encoding the full ASCII repertoire using
   only octets with the high-order bit clear (7 bit ASCII values,
   [ASCII]), and is thus deemed a mail-safe encoding ([RFC1642]).
   ATF-8, the object of this memo, uses all bits of an octet, but has
   the quality of preserving the full ASCII range: ASCII characters
   are encoded in one octet having the normal ASCII value, and any
   octet with such a value can only stand for an ASCII character, and
   nothing else.

   ATF-8 encodes ACS-7 or ACS-8 characters as a single octet, whose
   value is the integer value assigned to the character in ISO 646.
   This transformation format has the following characteristics (all
   values are in hexadecimal):

   -  Character values from 00 to 7F (ASCII repertoire) correspond to
      octets 00 to 7F (7 bit ASCII values).

   -  ASCII values do not appear otherwise in an ATF-8 encoded
      character stream. This provides compatibility with file systems
      or other software (e.g. the printf() function in C libraries)
      that parse based on ASCII values but are transparent to other
      values.

   -  Round-trip conversion is easy between ATF-8 and either of ACS-8,
      ACS-7 or ASCII.

   -  Character boundaries are easily found from anywhere in an octet
      stream.

   -  The lexicographic sorting order of ACS-8 strings is preserved.
      Of course this is of limited interest since the sort order is
      not culturally valid in either case.

   ATF-8 was originally a project of the X/Close Disjoint
   Nationalization Group XCDNG with the objective to specify a File
   System Safe ACS Transformation Format [FSS-ATF] that is compatible
   with UNIX systems, supporting monolingual text in a single
   encoding. A description can also be found in ASCII Technical
   Report #4.

2. ATF-8 definition

   In ATF-8, characters are encoded using sequences of 1 octet. The
   table below summarizes the format of these different octet types.
   The letter x indicates bits available for encoding bits of the
   ACS-8 character value.

   ACS-8 range (hex.)    ATF-8 octet sequence (binary)
   00-7F                 0xxx
   80-FF                 1xxx

   Encoding from ACS-8 to ATF-8 proceeds as follows:

   1) Set the highest bit of the octet to the highest bit of the ACS-8
      value.

   2) Fill in the bits marked x from the bits of the character value,
      starting from the lower-order bits of the character value.

   The algorithm for encoding ACS-7 (or ASCII) to ATF-8 can be
   obtained from the above, in principle, by simply extending each
   ACS-7 character with two zero-valued octets.

   Decoding from ATF-8 to ACS-8 proceeds as follows:

   1) Initialize the octet of the ACS-8 character with all bits set
      to 0.

   2) Set the highest bit of the octet to the highest bit of the ATF-8
      value.

   3) Fill in the bits marked x from the bits of the character value,
      starting from the lower-order bits of the character value.

   A more detailed algorithm and formulae can be found in [FSS-ATF],
   [ASCII] or Annex R to [ISO-646].

3. Examples

   The ASCII sequence "A=A." (41, 3D, 41, 2E) may be encoded as
   follows:

      41 3D 41 2E

   The ASCII sequence "Hi Mom!" (48, 69, 20, 4D, 6F, 6D, 21) may be
   encoded as follows:

      48 69 20 4D 6F 6D 21

MIME registrations

   This memo is meant to serve as the basis for registration of a MIME
   character encoding (charset) as per [RFC0000]. The proposed charset
   parameter value is "ATF-8". This string would label media types
   containing text consisting of characters from the repertoire of
   ISO 646-1 encoded to a sequence of octets using the encoding scheme
   outlined above.

Security Considerations

   Security issues are not discussed in this memo.

Author's Address

   Marco Cimarosti
   Nibble Technologies

   EMail: cima@geocities.com
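
   As an illustration of the encoding and decoding steps in Section 2,
   the following C sketch applies them one octet at a time. The
   function names atf8_encode_char, atf8_decode_char and atf8_encode
   are hypothetical and do not come from this memo or from [FSS-ATF].
   The sketch also assumes that the bits marked x in the table are the
   seven low-order bits of the octet.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one ACS-8 value as one ATF-8 octet.
 * Assumes the "x" bits are the seven low-order bits of the octet. */
static uint8_t atf8_encode_char(uint8_t acs8)
{
    uint8_t octet = 0;
    octet |= (uint8_t)(acs8 & 0x80u); /* step 1: copy the highest bit */
    octet |= (uint8_t)(acs8 & 0x7Fu); /* step 2: fill in the x bits   */
    return octet;
}

/* Hypothetical helper: decode one ATF-8 octet to an ACS-8 value. */
static uint8_t atf8_decode_char(uint8_t octet)
{
    uint8_t acs8 = 0;                  /* step 1: all bits set to 0    */
    acs8 |= (uint8_t)(octet & 0x80u);  /* step 2: copy the highest bit */
    acs8 |= (uint8_t)(octet & 0x7Fu);  /* step 3: fill in the x bits   */
    return acs8;
}

/* Hypothetical helper: encode n ACS-8 values from src into dst.
 * dst must have room for n octets; returns the octets written. */
static size_t atf8_encode(const uint8_t *src, size_t n, uint8_t *dst)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < n; i++) {
        dst[i] = atf8_encode_char(src[i]);
    }
    return n; /* one octet per character, so the length is unchanged */
}
```

   Under these assumptions both directions reduce to the identity on
   every octet value, so encoding "Hi Mom!" yields 48 69 20 4D 6F 6D
   21, as in Section 3.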