Re: 8-bit encodings and ASCII (was: Unicode conformant character encodings and us-ascii)

From: Doug Ewell (dewell@adelphia.net)
Date: Sat May 17 2003 - 19:17:07 EDT


    Stefan Persson <alsjebegrijptwatikbedoel at yahoo dot se> wrote:

    >> If somebody was converting a file from a DOS code page into Unicode,
    >> would it ever be appropriate to map 0x0D to U+266A [♪] ?
    >
    > Nope. 0x0D is usually not displayed as ♪, only under some very
    > special conditions (e.g. when opening a text file in binary mode
    > using MS-DOS Editor).

    This depends on the particular control character in question.

    0x0D and 0x0A are much more commonly intended as CR and LF than as ♪ and
    ◙. You almost never see the glyphs used *as themselves* in MS-DOS
    environments, only as symbols that are understood to stand for CR and
    LF.

    But glyphs like ▲ ▼ ◄ ► and ↑ ↓ are much more useful than the
    corresponding C0 control characters, and people like me who used to
    write TUI ("text user interface") applications for MS-DOS made frequent
    use of them (and had to use magical printer escape sequences, as
    described by Jim Allan, to get them to print properly).

    Then there are the really tricky cases where both the control function
    and the glyph are useful, and one sometimes has to find a way to use
    both. These include 0x07 (BEL or •), 0x1A (Ctrl+Z or →), and 0x1B (ESC
    or ←).
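
    As a rough illustration of the choice described above, here is a
    minimal Python sketch of a converter that lets the caller decide,
    byte by byte, whether a code point below 0x20 becomes the control
    character or the glyph. The table lists only the positions named in
    this message and is deliberately partial; the names CP437_C0_GLYPHS,
    decode_cp437 and as_glyph are invented for the example. It relies on
    Python's built-in "cp437" codec, which maps 0x00-0x1F to the C0
    controls.

```python
# Glyphs that code page 437 displays for some byte values below 0x20.
# Partial table: only the positions mentioned in this message.
CP437_C0_GLYPHS = {
    0x07: "\u2022",  # BEL or BULLET
    0x0A: "\u25D9",  # LF or INVERSE WHITE CIRCLE
    0x0D: "\u266A",  # CR or EIGHTH NOTE
    0x10: "\u25BA",  # BLACK RIGHT-POINTING POINTER
    0x11: "\u25C4",  # BLACK LEFT-POINTING POINTER
    0x18: "\u2191",  # UPWARDS ARROW
    0x19: "\u2193",  # DOWNWARDS ARROW
    0x1A: "\u2192",  # Ctrl+Z or RIGHTWARDS ARROW
    0x1B: "\u2190",  # ESC or LEFTWARDS ARROW
    0x1E: "\u25B2",  # BLACK UP-POINTING TRIANGLE
    0x1F: "\u25BC",  # BLACK DOWN-POINTING TRIANGLE
}


def decode_cp437(data: bytes, as_glyph=frozenset()) -> str:
    """Decode CP437 bytes, treating the bytes in as_glyph as glyphs.

    Bytes not listed in as_glyph (or not in the partial table) go through
    the standard codec, so 0x0D and 0x0A stay CR and LF by default.
    """
    out = []
    for b in data:
        if b in as_glyph and b in CP437_C0_GLYPHS:
            out.append(CP437_C0_GLYPHS[b])
        else:
            out.append(bytes([b]).decode("cp437"))
    return "".join(out)
```

    A TUI screen dump might be converted with as_glyph={0x10, 0x11,
    0x18, 0x19, 0x1E, 0x1F}, which keeps CR, LF, BEL and ESC as
    controls. For the truly dual positions (0x07, 0x1A, 0x1B) no fixed
    table can decide; the caller has to know what the data meant.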

    In the early days of character sets, these were called "duals." BCDIC
    and early versions of EBCDIC had several code positions that could be
    interpreted in two different ways, depending on whether a "commercial"
    or "scientific" context was intended. This caused great confusion when
    the worlds of science and commerce came together. Duals originally came
    about because of the limits of 48-character print chains, and they
    lasted for decades because of inertia and compatibility requirements.

    Duals caused the same kind of headaches in the '50s and '60s that
    font-based character set switching caused in the '90s, and we should all
    be grateful not to have to fool around with them any longer.

    -Doug Ewell
     Fullerton, California
     http://users.adelphia.net/~dewell/


