Re: ISO 6429 control sequences with non-ASCII CES's

From: Kenneth Whistler (kenw@sybase.com)
Date: Mon Mar 12 2007 - 14:30:10 CST


    > ISO 6429 (equivalently ECMA 48, ANSI X3.64) defines terminal control
    > sequences using the control characters in the U+0000 - U+001F block.
    > Many control sequences begin with Escape (U+001B) and also include other
    > characters in the printable Basic Latin block.
    >
    > I get the impression from reading ECMA 48 that these control sequences
    > are defined directly on byte values, not character values. That means
    > they could not be used with Unicode character encoding schemes such as
    > UTF-16,

    From ISO/IEC 10646:

    "When a control character of ISO/IEC 6429 is used with this coded
    character set, its coded representation as specified in ISO/IEC 6429
    shall be padded to correspond with the number of octets in the
    adopted form...

    "For example, the control character FORM FEED is represented by
    "000C" in the two-octet form, and "0000 000C" in the four-octet form.

    ...

    "For example the escape sequence "ESC 02/00 04/00" is represented
    by "001B 0020 0040" in the two-octet form, and "0000 001B 0000 0020
    0000 0040" in the four-octet form."

    Got it? So the ISO 6429 codes and escape sequences clearly work
    for UTF-8, UTF-16, and UTF-32. But you have to take into account
    the padding requirement for UTF-16 and UTF-32.
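The padding described in the 10646 quotation can be checked with a short sketch. It assumes Python's built-in codecs and big-endian byte order; the helper name hex_units is hypothetical and only groups the output into code units for display.

```python
# Sketch only: ESC 02/00 04/00, the escape sequence from the 10646 quotation.
ESC_SEQ = "\x1b\x20\x40"


def hex_units(data: bytes, width: int) -> str:
    """Show `data` as hex, grouped into code units of `width` octets."""
    return " ".join(
        data[i:i + width].hex().upper() for i in range(0, len(data), width)
    )


# Big-endian variants are used so that no byte order mark is prepended.
utf8 = ESC_SEQ.encode("utf-8")
utf16 = ESC_SEQ.encode("utf-16-be")
utf32 = ESC_SEQ.encode("utf-32-be")

print(hex_units(utf8, 1))   # 1B 20 40 (same bytes as in ISO 6429)
print(hex_units(utf16, 2))  # 001B 0020 0040 (padded to two octets)
print(hex_units(utf32, 4))  # 0000001B 00000020 00000040 (padded to four octets)
```

In UTF-8 the sequence is byte-for-byte what ISO 6429 specifies; in the other two forms each code is zero-padded to the width of the code unit.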

    > UTF-7, or SCSU, which represent U+001B as something other than
    > the single byte 0x1B.

    UTF-7 and SCSU, on the other hand, are not encoding forms in the
    sense recognized by the Unicode Standard or 10646. If you feed an
    ESC sequence into them, it can get mangled into an unrecognizable
    form. To recognize an embedded ESC sequence, you would first need
    to convert back out to an encoding form.
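For the UTF-7 case, a minimal sketch follows. It assumes Python's built-in "utf-7" codec, which is one implementation's behavior and not a statement about every UTF-7 encoder; SCSU is not covered because the Python standard library has no codec for it.

```python
# Sketch only: the same escape sequence, ESC 02/00 04/00, run through UTF-7.
ESC_SEQ = "\x1b\x20\x40"

# Python's UTF-7 encoder does not write ESC directly; it shifts into
# modified base64 for it, so the raw byte 0x1B is absent from the output.
utf7 = ESC_SEQ.encode("utf-7")
esc_byte_present = b"\x1b" in utf7

# Converting back out to characters makes the sequence recognizable again.
recovered = utf7.decode("utf-7")

print(utf7)
print("raw ESC byte present:", esc_byte_present)
print("round trip intact:", recovered == ESC_SEQ)
```

A byte-oriented ISO 6429 parser scanning the UTF-7 stream for 0x1B would miss the sequence; it only reappears after decoding.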

    --Ken

    > It also means they *could* be used with UTF-8.
    > Is this correct?
    >
    > --
    > Doug Ewell * Fullerton, California, USA * RFC 4645 * UTN #14
    > http://users.adelphia.net/~dewell/
    > http://www1.ietf.org/html.charters/ltru-charter.html
    > http://www.alvestrand.no/mailman/listinfo/ietf-languages


