RE: ISO 6429 control sequences with non-ASCII CES's

From: Philippe Verdy (
Date: Wed Mar 14 2007 - 07:12:26 CST


    > From: Kenneth Whistler []
    > > Another option would be to encode only two new controls in Unicode:
    > > * start control sequence;
    > > * end control sequence.
    > No. A very bad idea, IMO.
    > If you want to write ISO 2022-conformant code that makes use
    > of registered Escape sequences, then write ISO 2022-conformant
    > code to do so, and have it detect the registered Escape sequences
    > corresponding to the character set identifications (or any
    > other pertinent usages of Escape sequences) it is concerned
    > with. That is what ISO 2022 is all about.

    I did not refer to ISO 2022 in my message, but to the many CESs that have
    no ISO standard and are used nationally or in proprietary protocols.

    For example: DVB subtitles and EPG, proprietary MPEG title extensions,
    Videotex and Teletext... VT100 terminal protocols and similar.
    In those cases, the sequences do NOT encode characters but attributes;
    they don't qualify as regular CESs because they are not encoding characters
    and are not mappable to single Unicode characters.
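    To make the distinction concrete for the terminal case, here is a minimal
    Python sketch that splits text into character runs and attribute-carrying
    control-sequence tokens. It assumes the stream has already been decoded
    to Unicode and follows the ECMA-48 / ISO 6429 CSI syntax; DVB, Videotex
    and Teletext use other syntaxes that it does not cover. The names
    `tokenize` and `CSI_RE` are hypothetical.

```python
import re
from typing import List, Tuple, Union

# ECMA-48 / ISO 6429 control sequence: CSI, then parameter bytes
# (0x30-0x3F), intermediate bytes (0x20-0x2F) and one final byte (0x40-0x7E).
# CSI is either the 7-bit form ESC '[' or the 8-bit C1 control U+009B.
CSI_RE = re.compile(r"(?:\x1b\[|\x9b)([\x30-\x3f]*)([\x20-\x2f]*)([\x40-\x7e])")

# ("text", run) or ("csi", parameters, intermediates, final)
Token = Union[Tuple[str, str], Tuple[str, str, str, str]]


def tokenize(text: str) -> List[Token]:
    """Split decoded text into character runs and control-sequence tokens."""
    tokens: List[Token] = []
    pos = 0
    for m in CSI_RE.finditer(text):
        if m.start() > pos:
            tokens.append(("text", text[pos:m.start()]))
        # The sequence carries an attribute (e.g. SGR 'm'), not a character.
        tokens.append(("csi", m.group(1), m.group(2), m.group(3)))
        pos = m.end()
    if pos < len(text):
        tokens.append(("text", text[pos:]))
    return tokens


print(tokenize("\x1b[1;31mAlert\x1b[0m: disk full"))
# [('csi', '1;31', '', 'm'), ('text', 'Alert'), ('csi', '0', '', 'm'), ('text', ': disk full')]
```

    Only the "text" tokens map to Unicode characters; the "csi" tokens are
    presentation state that a generic Unicode process has no way to interpret.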

    Without a clear identification of those sequences, problems arise: such
    streams are still used to store documents, and those documents may still
    need general Unicode algorithms, for example full-text search (as in
    desktop search engines).
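    As an illustration of what an indexer has to do today, here is a minimal
    Python sketch that extracts searchable text from a decoded document by
    first removing ECMA-48 CSI sequences (so their parameters do not leak
    into the text) and then any remaining C0/C1 controls except TAB and LF.
    It assumes CSI syntax only: other escape sequences would leave residue
    such as "(B" behind. `searchable_text` is a hypothetical name.

```python
import re
import unicodedata

# ECMA-48 control sequences (7-bit ESC '[' or 8-bit U+009B introducer).
CSI_RE = re.compile(r"(?:\x1b\[|\x9b)[\x30-\x3f]*[\x20-\x2f]*[\x40-\x7e]")
# Any remaining C0/C1 control character except TAB (0x09) and LF (0x0A).
OTHER_CONTROLS_RE = re.compile(r"[\x00-\x08\x0b-\x1f\x7f-\x9f]")


def searchable_text(text: str) -> str:
    """Return the character content of a decoded document, for indexing."""
    # Strip whole control sequences first; removing a lone ESC would
    # leave "[1;31m"-style parameter residue in the index.
    without_csi = CSI_RE.sub("", text)
    without_controls = OTHER_CONTROLS_RE.sub("", without_csi)
    # Normalize and case-fold so compatibility and case variants match.
    return unicodedata.normalize("NFKC", without_controls).casefold()


# A word split by an underline attribute is still found as one word.
print(searchable_text("Stra\x1b[4mße\x1b[24m"))
# strasse
```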

    When those documents are transmitted over the Internet, they may be tagged
    with a specific MIME type (not "text/plain"), but in reality they are more
    than plain text and also qualify as "application/*" formats; in fact they
    operate at the presentation layer of the OSI model, not at the encoding or
    transport layer (so they are neither CES nor TES).

    Although many of you don't know the details of European Videotex or
    Teletext systems (or DVB subtitles), most of you are exposed to VT100-like
    presentation formats in your text terminals or OS consoles (including the
    Windows "OEM" console with "ANSI" extensions, VT100 emulators, or X11
    consoles on Unix/Linux).

    Even now, we have no clear way to identify the presentation protocols used
    in terminal sessions: we only identify the CES (= codepage in DOS/Windows
    environments), while terminal presentation environments are orthogonal to,
    and most often completely independent of, the CES (= codepage) environment.
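    The orthogonality can be shown at the byte level. In this minimal Python
    sketch, 7-bit control sequences are located before any codepage is chosen,
    and only the remaining bytes are decoded. It assumes an ASCII-compatible
    CES in which byte 0x1B never occurs inside a multibyte character; the
    8-bit CSI (0x9B) is deliberately not recognized, because in codepages such
    as cp437 or cp1252 that byte is a graphic character. `split_presentation`
    is a hypothetical name.

```python
import re
from typing import List, Tuple

# 7-bit ECMA-48 control sequence at the byte level:
# ESC '[' parameter bytes, intermediate bytes, final byte.
CSI_BYTES_RE = re.compile(rb"\x1b\[[\x30-\x3f]*[\x20-\x2f]*[\x40-\x7e]")


def split_presentation(data: bytes, codepage: str) -> Tuple[str, List[bytes]]:
    """Separate control sequences from text bytes, then decode only the text."""
    controls = CSI_BYTES_RE.findall(data)  # no codepage needed for this step
    text_bytes = CSI_BYTES_RE.sub(b"", data)
    return text_bytes.decode(codepage), controls


raw = b"\x1b[1m\x82t\x82\x1b[0m"
text_437, ctl_437 = split_presentation(raw, "cp437")     # text is "été"
text_1252, ctl_1252 = split_presentation(raw, "cp1252")  # text is "‚t‚" (U+201A)
assert ctl_437 == ctl_1252  # same presentation layer, different CES
```

    Nothing in the byte stream itself says which codepage or which
    presentation protocol applies; both have to be known out of band.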

    The case of ISO 2022 sequences is much clearer, as they are unambiguously
    used as CES sequences.
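    By contrast, in an ISO 2022 encoding the escape sequences select the coded
    character set, so any decoder must interpret them to map bytes to
    characters. A minimal Python sketch for ISO-2022-JP, recognizing only the
    four designations listed in RFC 1468 and treating every other byte as
    data; `segment_iso2022jp` and `DESIGNATIONS` are hypothetical names.

```python
from typing import List, Tuple

# Designation escape sequences used by ISO-2022-JP (RFC 1468).
DESIGNATIONS = {
    b"\x1b(B": "ASCII",
    b"\x1b(J": "JIS X 0201-Roman",
    b"\x1b$@": "JIS C 6226-1978",
    b"\x1b$B": "JIS X 0208-1983",
}


def segment_iso2022jp(data: bytes) -> List[Tuple[str, bytes]]:
    """Split an ISO-2022-JP byte stream into (charset, bytes) runs."""
    segments: List[Tuple[str, bytes]] = []
    charset = "ASCII"  # ISO-2022-JP text starts in ASCII
    start = i = 0
    while i < len(data):
        seq = data[i:i + 3]
        if seq in DESIGNATIONS:
            if i > start:
                segments.append((charset, data[start:i]))
            charset = DESIGNATIONS[seq]  # the escape switches the CES state
            i += 3
            start = i
        else:
            # JIS X 0208 bytes are 0x21-0x7E, so ESC never appears inside
            # a two-byte character and a bytewise scan is safe.
            i += 1
    if start < len(data):
        segments.append((charset, data[start:]))
    return segments


print(segment_iso2022jp(b"ab\x1b$B$3$s\x1b(Bcd"))
# [('ASCII', b'ab'), ('JIS X 0208-1983', b'$3$s'), ('ASCII', b'cd')]
```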

    This archive was generated by hypermail 2.1.5 : Wed Mar 14 2007 - 07:16:13 CST