Re: ISO 6429 control sequences with non-ASCII CES's

From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Tue Mar 13 2007 - 14:22:11 CST

    On 3/12/2007 11:33 PM, Doug Ewell wrote:
    > Kenneth Whistler <kenw at sybase dot com> wrote:
    >
    >> "For example the escape sequence "ESC 02/00 04/00" is represented by
    >> "001B 0020 0040" in the two-octet form, and "0000 001B 0000 0020
    >> 0000 0040" in the four-octet form."
    >>
    >> Got it? So the ISO 6429 codes and escape sequences clearly work for
    >> UTF-8, UTF-16, and UTF-32. But you have to take into account the
    >> padding requirement for UTF-16 and UTF-32.
    >>
    >> For UTF-7 and SCSU, on the other hand -- those are not encoding forms
    >> in the sense recognized by the Unicode Standard or 10646. And if you
    >> feed an ESC sequence into them, it can get mangled into a form not
    >> recognizable. You would need to convert back out to an encoding form
    >> to recognize an ESC sequence, if you had one embedded.
    >
    > OK, so to take an example, the clear-screen sequence <Esc>[2J could be
    > conformantly encoded as
    >
    > UTF-16LE: 1B 00 5B 00 32 00 4A 00
    >
    > but not necessarily as
    >
    > UTF-7: 2B 41 42 73 5B 32 4A
    > SCSU: 01 1B 5B 32 4A
    >
    > Would it be non-conformant to interpret the last two sequences
    > according to ISO 6429, or is it simply not required?

    I disagree with Ken's analysis. The specification is clear that:

    a) in Unicode, sequences of control "bytes" according to 6429 are
        uniformly represented as sequences of control characters

    b) the same applies to bytes of control sequences that correspond to
        regular characters in 8-bit encodings; these map to Unicode
        characters in the range 0000-00FF.

    So far this is a restatement, I hope without change in substance, of
    what Ken wrote. Here is where I differ, I think. When you apply
    specific encodings, then

    c) control sequences represented by Unicode character sequences get
        transformed into bytes according to the encoding scheme used, in
        the same manner as regular character sequences.

    This is true for the standard encoding forms, as well as for all the
    variants, down to binhex applied to Unicode text. Therefore,

    d) to recover a control sequence, steps c, b, and a have to be applied
    in reverse.
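
    As a rough illustration of steps c) and d), here is a minimal Python
    sketch using Doug's clear-screen example. The helper name
    recover_control_sequences and the CSI pattern are my own assumptions
    for illustration, not anything defined by ISO 6429 or Unicode, and
    Python's "utf-7" codec stands in for whatever decoder is actually in
    use (the standard library has no SCSU codec, so SCSU is left out).

```python
import re

# Hypothetical pattern for a CSI control sequence in *decoded* text:
# ESC, '[', parameter characters, intermediate characters, final character.
CSI_PATTERN = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")


def recover_control_sequences(data, encoding):
    """Step d): undo the byte encoding first, then scan the characters."""
    text = data.decode(encoding)
    return CSI_PATTERN.findall(text)


CLEAR_SCREEN = "\x1b[2J"

# Step c): the control sequence is encoded like any other character sequence.
utf16le = CLEAR_SCREEN.encode("utf-16-le")
utf32be = CLEAR_SCREEN.encode("utf-32-be")
utf7 = CLEAR_SCREEN.encode("utf-7")

# Doug's UTF-7 byte sequence from the quoted message.
doug_utf7 = bytes.fromhex("2B4142735B324A")

for name, data, enc in [
    ("UTF-16LE", utf16le, "utf-16-le"),
    ("UTF-32BE", utf32be, "utf-32-be"),
    ("UTF-7", utf7, "utf-7"),
    ("UTF-7 (Doug)", doug_utf7, "utf-7"),
]:
    print(name, data.hex(" "), recover_control_sequences(data, enc))

# Scanning the raw bytes without decoding finds nothing in the UTF-16LE form,
# because each character is padded with a zero octet.
raw_scan = CSI_PATTERN.findall(utf16le.decode("latin-1"))
```

    The point of the sketch is only that the sequence survives every
    encoding as long as the receiver decodes before looking for it; a
    byte-level scan of the encoded data is what fails.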

    A./
    >
    > --
    > Doug Ewell * Fullerton, California, USA * RFC 4645 * UTN #14
    > http://users.adelphia.net/~dewell/
    > http://www1.ietf.org/html.charters/ltru-charter.html
    > http://www.alvestrand.no/mailman/listinfo/ietf-languages


