RE: ISO 6429 control sequences with non-ASCII CES's

From: Philippe Verdy (
Date: Wed Mar 14 2007 - 13:00:33 CST

    > From: [] On behalf of
    > Doug Ewell
    > Sent: Wednesday, March 14, 2007 06:54
    > To: Unicode Mailing List
    > Cc:
    > Subject: Re: ISO 6429 control sequences with non-ASCII CES's
    > Philippe Verdy <verdy underscore p at wanadoo dot fr> wrote:
    > > It would be better to have in Unicode some special ranges of control
    > > characters mapped to byte values that are part of unconverted CES
    > > sequences like in VT100, VT200 (and so on) protocols, or in other
    > > legacy terminal protocols (to encode colors, cursor control, or other
    > > rich text enhancements, or the encoding of user-defined bitmaps for
    > > custom characters or glyphs, notably used in some East-Asian Teletext
    > > systems, because trying to detect which character those bitmaps
    > > represent can be difficult, or even impossible, as they were really
    > > user-defined and local to the document containing those glyph
    > > definitions).
    > On no account do I wish to replace ISO 6429 with some other mechanism,
    > or introduce new coded characters to assist ISO 6429 handling. It
    > wouldn't work as intended anyway, and the experience with Plane 14 tags
    > shows how reluctant UTC would be to invent such "special control
    > characters."
    > I'm only asking for layer clarifications between ISO 6429 and various
    > Unicode/10646 representations.
    > > Consider sequences like:
    > > ESC, [, A, I, R
    > > (in a 7-bit or 8-bit encoded document prepared and sent on media that
    > > support VT100-like enhancements).
    > >
    > > Or even this one with Videotex:
    > > ESC, A, I, R
    > >
    > > Do they contain the English word "AIR" or the abbreviation "IR"
    > > (preceded by an ANSI/VT100-like color attribute)? How can we delimit
    > > the length of escape sequences?
    > That's easy if you've read ISO 6429 or the equivalent ECMA or ANSI
    > standard:
    > 1B [30-3F]* [20-2F]* [40-7E]
    > The syntax is well-defined and unambiguous. In your first example, the
    > sequence ends with the letter A, and would move the cursor up one row
    > (if possible) before printing the letters "IR".

    I did not say that it could not be done, only that it requires knowing
    exactly which escape encoding scheme is in use. In plain text, nothing says
    whether ECMA, ANSI, or some teletext system applies. And the decoding rules
    are NOT the same across these standards, so that, given the same input,
    escape sequences are NOT terminated at the same position: not all of these
    systems terminate the escape sequence at the first occurrence of a
    character in [40-7E].
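
    To make the ECMA/ANSI reading concrete, here is a minimal sketch of that
    one rule only, assuming the 7-bit control sequence form (ESC [ followed by
    parameter bytes 30-3F, intermediate bytes 20-2F, and one final byte 40-7E).
    The name split_csi is invented for this illustration. It says nothing
    about how a Videotex or other teletext decoder would read the same bytes.

```python
import re

# Assumption: ECMA-48 / ISO 6429 control sequence in 7-bit form:
# ESC [, then parameter bytes 30-3F, intermediate bytes 20-2F,
# and exactly one final byte in 40-7E.
CSI_SEQUENCE = re.compile(rb"\x1b\[[\x30-\x3f]*[\x20-\x2f]*[\x40-\x7e]")


def split_csi(data: bytes):
    """Split data into (control_sequence, remainder).

    Returns (b"", data) when data does not start with a complete
    control sequence under the rule above.
    """
    match = CSI_SEQUENCE.match(data)
    if match is None:
        return b"", data
    return data[:match.end()], data[match.end():]


# ESC [ A I R: the sequence ends at "A", leaving "IR" as text.
print(split_csi(b"\x1b[AIR"))

# ESC [ 1 ; 3 1 m I R: parameters are consumed up to the final byte "m".
print(split_csi(b"\x1b[1;31mIR"))

# ESC A I R: not a control sequence under this rule, so nothing is split off.
print(split_csi(b"\x1bAIR"))
```

    The third call is the point of contention: this rule alone cannot say
    whether ESC A is an attribute followed by "IR" or something else, because
    that depends on which scheme the receiver assumes.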

    In fact, sequences can be longer... especially for escape sequences for
    user-defined character glyphs encoded as embedded bitmaps. The effective
    sequence length is then determined by something else, like the explicit
    encoding of the length of variable parameters.
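
    As a purely hypothetical illustration of length-determined sequences, the
    sketch below assumes an invented framing (ESC, one introducer byte, one
    length byte N, then N payload bytes). It is not taken from any real
    terminal or teletext protocol; the function name and the layout are
    assumptions made only to show why scanning for a byte in [40-7E] fails
    here.

```python
ESC = 0x1B


def split_length_prefixed(data: bytes):
    """Split data into (sequence, remainder) under a HYPOTHETICAL framing:

        ESC, introducer byte, length byte N, N payload bytes.

    Returns (b"", data) if data does not start with ESC or is truncated.
    """
    if len(data) < 3 or data[0] != ESC:
        return b"", data
    end = 3 + data[2]          # header (3 bytes) plus declared payload length
    if len(data) < end:
        return b"", data       # payload is incomplete
    return data[:end], data[end:]


# The payload "ABCD" consists of bytes in 40-7E, yet none of them ends the
# sequence: only the declared length (4) does, so "IR" is what remains.
print(split_length_prefixed(b"\x1bG\x04ABCDIR"))
```

    A decoder applying the [40-7E] termination rule to the same input would
    stop inside the header or payload and treat the rest of the bitmap data as
    text.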
