Re: ISO 6429 control sequences with non-ASCII CES's

From: Kenneth Whistler (kenw@sybase.com)
Date: Mon Mar 12 2007 - 14:30:10 CST

Next message: Doug Ewell: "Re: ISO 6429 control sequences with non-ASCII CES's"

Previous message: Frank Ellermann: "Re: ISO 6429 control sequences with non-ASCII CES's"
Maybe in reply to: Doug Ewell: "ISO 6429 control sequences with non-ASCII CES's"
Next in thread: Doug Ewell: "Re: ISO 6429 control sequences with non-ASCII CES's"
Reply: Doug Ewell: "Re: ISO 6429 control sequences with non-ASCII CES's"

> ISO 6429 (equivalently ECMA 48, ANSI X3.64) defines terminal control
> sequences using the control characters in the U+0000 - U+001F block.
> Many control sequences begin with Escape (U+001B) and also include other
> characters in the printable Basic Latin block.
>
> I get the impression from reading ECMA 48 that these control sequences
> are defined directly on byte values, not character values. That means
> they could not be used with Unicode character encoding schemes such as
> UTF-16,

From ISO/IEC 10646:

"When a control character of ISO/IEC 6429 is used with this coded
character set, its coded representation as specified in ISO/IEC 6429
shall be padded to correspond with the number of octets in the
adopted form...

"For example, the control character FORM FEED is represented by
"000C" in the two-octet form, and "0000 000C" in the four-octet form.

...

"For example the escape sequence "ESC 02/00 04/00" is represented
by "001B 0020 0040" in the two-octet form, and "0000 001B 0000 0020
0000 0040" in the four-octet form."

Got it? So the ISO 6429 codes and escape sequences clearly work
for UTF-8, UTF-16, and UTF-32. But you have to take into account
the padding requirement for UTF-16 and UTF-32.
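The padding rule quoted above can be checked with a short Python sketch. It assumes big-endian byte order for the two-octet and four-octet forms (hence the "utf-16-be" and "utf-32-be" codecs); the helper name code_units_hex is made up for this illustration and comes from neither standard.

```python
# Illustrative sketch: show how an ISO 6429 escape sequence is padded
# in UTF-16 and UTF-32, and left as single bytes in UTF-8.
# Assumption: big-endian byte order for the multi-octet forms.

ESCAPE_SEQUENCE = "\x1b\x20\x40"  # ESC 02/00 04/00
FORM_FEED = "\x0c"


def code_units_hex(text: str, encoding: str) -> str:
    """Encode text and print the bytes as groups of four hex digits."""
    data = text.encode(encoding)
    groups = [data[i:i + 2].hex().upper() for i in range(0, len(data), 2)]
    return " ".join(groups)


two_octet = code_units_hex(ESCAPE_SEQUENCE, "utf-16-be")
four_octet = code_units_hex(ESCAPE_SEQUENCE, "utf-32-be")
utf8 = ESCAPE_SEQUENCE.encode("utf-8")

print(two_octet)   # 001B 0020 0040
print(four_octet)  # 0000 001B 0000 0020 0000 0040
print(utf8)        # b'\x1b @'
print(code_units_hex(FORM_FEED, "utf-16-be"))  # 000C
```

A receiver therefore has to match the sequence on code units of the adopted form, not on raw bytes, except in UTF-8, where the two coincide for these characters.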

> UTF-7, or SCSU, which represent U+001B as something other than
> the single byte 0x1B.

For UTF-7 and SCSU, on the other hand -- those are not encoding
forms in the sense recognized by the Unicode Standard or
ISO/IEC 10646. If you feed an ESC sequence into them, it can get
mangled into an unrecognizable form. To recognize an embedded ESC
sequence, you would first need to convert back out to an encoding
form.
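As a rough illustration of the UTF-7 case, the sketch below uses Python's built-in "utf-7" codec. Python's standard library has no SCSU codec, so SCSU is not shown. The sample text and the helper contains_raw_esc are assumptions made up for this example.

```python
# Illustrative sketch: a byte-level scan for ESC (0x1B) works on UTF-8
# but fails on UTF-7, which re-encodes the control character.
# Decoding back to characters makes the sequence recognizable again.

ESC = "\x1b"
text = "abc" + ESC + " @" + "def"  # ESC 02/00 04/00 embedded in plain text


def contains_raw_esc(data: bytes) -> bool:
    """Return True if the byte 0x1B appears literally in data."""
    return b"\x1b" in data


utf8_bytes = text.encode("utf-8")
utf7_bytes = text.encode("utf-7")

raw_in_utf8 = contains_raw_esc(utf8_bytes)
raw_in_utf7 = contains_raw_esc(utf7_bytes)

# Convert back out of UTF-7 before looking for the escape sequence.
recovered = utf7_bytes.decode("utf-7")
esc_index = recovered.find(ESC)

print(raw_in_utf8, raw_in_utf7, esc_index)
```

The design point is only that detection has to happen after decoding, on characters or on the code units of a real encoding form.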

--Ken

> It also means they *could* be used with UTF-8.
> Is this correct?
>
> --
> Doug Ewell * Fullerton, California, USA * RFC 4645 * UTN #14
> http://users.adelphia.net/~dewell/
> http://www1.ietf.org/html.charters/ltru-charter.html
> http://www.alvestrand.no/mailman/listinfo/ietf-languages
>
>
>


This archive was generated by hypermail 2.1.5 : Mon Mar 12 2007 - 14:32:39 CST