RE: proposal for a "Standard-Exit" or "Namespace" character

From: Tex Texin (textexin@xencraft.com)
Date: Mon Apr 13 2009 - 19:46:17 CDT


    More importantly, there are international standards for formatting text,
    called markup languages, that are much more powerful than a small set of
    control character commands would be, and which coexist quite well with
    Unicode encoding and are portable.
    Given the existence and wide support for HTML, etc., the case for such
    commands in Unicode is extremely weak.

    Adding commands to Unicode now would also create problems, since they
    could interact or conflict with higher-level formatting commands.

    tex

    -----Original Message-----
    From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org] On
    Behalf Of Kenneth Whistler
    Sent: Monday, April 13, 2009 4:44 PM
    To: dh@triple-media.com
    Cc: unicode@unicode.org; kenw@sybase.com
    Subject: Re: proposal for a "Standard-Exit" or "Namespace" character

    Dennis Heuer suggested:

    > ... this is why i think that unicode should support the inclusion (or
    > embedding) of other character sets. it should not know about them and
    > how to specify them. this is the matter of a different standard.

    Ah, but that standard already exists. You are reinventing
    ISO 2022:

    http://en.wikipedia.org/wiki/ISO_2022

    > (the
    > easiest way is to name or number them officially.)

    And that also exists. It is called the International Register of Coded
    Character Sets to be Used with Escape Sequences:

    http://www.itscj.ipsj.or.jp/ISO-IR/

    > however, it should
    > provide a character to mark the position at which unicode is 'closed'
    > or 'left'.

    And that is defined by ISO/IEC 10646 itself:

    "When the escape sequences from ISO/IEC 2022 are used, the
    identification of a return, or transfer, from UCS to the
    coding system of ISO/IEC 2022 shall be by the escape sequence
    ESC 02/05 04/00. ..."

    So the escape sequence <U+001B, U+0025, U+0040> gets you
    from Unicode to ISO 2022, if you want to embed other
    character sets using the mechanisms of that standard.
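    As an illustration of how the "ESC 02/05 04/00" notation quoted above
    maps to <U+001B, U+0025, U+0040>, here is a minimal Python sketch. It
    assumes the usual column/row reading of ISO 2022 notation (byte value =
    column * 16 + row); the helper name col_row_to_byte is hypothetical and
    not taken from either standard.

```python
def col_row_to_byte(notation: str) -> int:
    """Convert ISO 2022 'column/row' notation (e.g. '02/05') to a byte value.

    Assumption: the code table has 16 columns of 16 rows, so the byte
    value is column * 16 + row.
    """
    column, row = (int(part) for part in notation.split("/"))
    if not (0 <= column <= 15 and 0 <= row <= 15):
        raise ValueError(f"column and row must be in 0..15: {notation!r}")
    return column * 16 + row


ESC = 0x1B  # the ESCAPE control character, U+001B

# The sequence quoted from ISO/IEC 10646: ESC 02/05 04/00
RETURN_TO_ISO_2022 = bytes(
    [ESC, col_row_to_byte("02/05"), col_row_to_byte("04/00")]
)

# The same three bytes written as code points
CODE_POINTS = [f"U+{byte:04X}" for byte in RETURN_TO_ISO_2022]

print(RETURN_TO_ISO_2022)  # b'\x1b%@'
print(CODE_POINTS)         # ['U+001B', 'U+0025', 'U+0040']
```

    In other words, the sequence is simply ESC followed by "%" and "@".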

    A warning though: ISO 2022 is basically an implementation flop outside
    the limited context of East Asian email, where it underlies
    encodings such as ISO-2022-JP, ISO-2022-CN, etc.

    And I rather doubt that turning an escape sequence (which at
    least has the advantage of being a widely understood and
    somewhat implemented mechanism) into a single character exit
    code would change anything -- you still end up with a stateful
    encoding of the very type that Unicode was invented to get
    away from.

    --Ken

    P.S. If you *really* want a single character exit code, that
    *also* exists already: U+000E SHIFT OUT. But no Unicode
    systems implement that as a character set exit control code,
    for good reasons.


