Re: ISO 2022

From: John Cowan (cowan@drv.cbc.com)
Date: Wed Oct 22 1997 - 17:50:00 EDT


With six requests for my text in a few hours, I have decided to
risk the Wrath of Svarasati and post it here. Please note -- as
I forgot to mention last time -- that replies must be sent to
cowan@ccil.org; direct responses will reach me but are unreplyable-to
by me (the From: header gets corrupted).

** start here **

ISO 2022 text is specified using a mixture of registered character sets.
At any time, up to four character sets may be available. Character sets
have one of three sizes: single-byte character sets with 94 characters
(e.g. ASCII), single-byte character sets with 96 characters (e.g. the top
halves of ISO Latin-1 to Latin-5), or double-byte character sets with
94 x 94 characters (e.g. JIS 0208X-1983). Each registered character set has
a standard designating byte in the range 48 to 125; the bytes are unique within
character set sizes, but may be reused across sizes. For example, byte 66
designates the 94-character set ASCII, the 96-character set ISO Latin-2 (top
half), and the 94 x 94 Japanese character set JIS 0208X-1983. In this
document, the designating byte of a character set will be represented by <D>.

The four available character sets are labeled G0, G1, G2, and G3. Initially,
G0 is the 94-character set ASCII, and G1 is the 96-character set ISO Latin-1
(top half). The other character sets are unassigned. The following escape
sequences (where ESC = the byte 27) specify changes to the available
character sets:

        ESC ( <D> Set G0 to the 94-character set <D>
        ESC ) <D> Set G1 to the 94-character set <D>
        ESC * <D> Set G2 to the 94-character set <D>
        ESC + <D> Set G3 to the 94-character set <D>
        ESC - <D> Set G1 to the 96-character set <D>
        ESC . <D> Set G2 to the 96-character set <D>
        ESC / <D> Set G3 to the 96-character set <D>
        ESC $ <D> Set G0 to the 94 x 94 character set <D>
        ESC $ ( <D> Set G0 to the 94 x 94 character set <D>
        ESC $ ) <D> Set G1 to the 94 x 94 character set <D>
        ESC $ * <D> Set G2 to the 94 x 94 character set <D>
        ESC $ + <D> Set G3 to the 94 x 94 character set <D>

Note that G0 may not be a 96-character set, and that there are two ways to
specify a 94 x 94 character set in G0, of which the first is deprecated.

ISO 2022 decoding affects input bytes in the ranges 33 to 126 and 160 to 255,
known as "the left half" and "the right half" respectively. All other bytes,
unless they belong to a control sequence shown in this document, remain
unchanged. Initially, the left half is interpreted as character set G0,
and the right half as character set G1. This can be changed by the following
control sequences:

        SI (byte 15) Interpret the left half as G1 characters
        SO (byte 14) Interpret the left half as G0 characters
        ESC n Interpret the left half as G2 characters
        ESC o Interpret the left half as G3 characters
        ESC ~ Interpret the right half as G1 characters
        ESC } Interpret the right half as G2 characters
        ESC | Interpret the right half as G3 characters
        SS2 (byte 142) Interpret next character only as G2
        ESC N Interpret next character only as G2
        SS3 (byte 143) Interpret next character only as G3
        ESC O Interpret next character only as G3

This rich schema may be used in various ways. In ISO-2022-JP, the Japanese
flavor of ISO 2022, only the bytes 33-126 and the G0 character set is used,
and escape sequences are used to switch between ASCII, ISO-646-JP (the
Japanese national variant of ASCII), and JIS 0208X-1983. In other versions,
the G1 character set has 94 x 94 size, and so any byte in the range 160-255
is automatically the first byte of a double-byte character.

** end of text **

-- 
John Cowan	http://www.ccil.org/~cowan		cowan@ccil.org
			e'osai ko sarji la lojban



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:37 EDT