Re: But E0000 Custom Language Tags Are Actually *Required* For Use By Unicode

From: Asmus Freytag
Date: Fri Mar 04 2005 - 20:21:19 CST

    At 06:06 PM 3/4/2005, wrote:
    >Peter Kirk wrote:
    > > But in effect, in the current situation these two
    > > situations can be dealt with only by the same mechanism: choose a
    > > suitable font.
    >Which leaves me wondering what all the Unicode work has been for -- since
    >that's hardly different from 8-bit fonts. With all this "higher level
    >formatting" why not just have "encoding" as one of those higher-level tags?

    Display is not the only thing.

    With 8-bit sets, you have near-random permutations for the association of
    character to code. Creating and sorting a mixed list of items is no longer
    possible, and once character set tags are lost, the whole document turns
    into garbage.
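
    A minimal sketch of that point, assuming Python and three 8-bit charsets
    picked only for illustration (the helper name decode_under is
    hypothetical): the same three bytes read as different text under each
    charset tag, so nothing in the bytes themselves says which reading was
    intended.

```python
# The same byte values mean different characters under different 8-bit sets.
# The three charsets below are illustrative choices, not from the discussion.
def decode_under(raw, charsets):
    """Decode the same bytes under each charset name and return the results."""
    return {name: raw.decode(name) for name in charsets}


raw = bytes([0xE0, 0xE1, 0xE2])
readings = decode_under(raw, ["latin-1", "cp1251", "iso8859_7"])

for name, text in readings.items():
    # Show the code points so the difference is visible on any display.
    print(name, [f"U+{ord(ch):04X}" for ch in text])
```

    Once the charset tag is gone, all three readings are equally plausible
    as far as the byte values are concerned.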

    With Unicode, your display may be less than optimal, but is usually
    recognizable, even when language information from a higher level protocol
    was lost.

    In fact, automatic language recognition software could be (and is) easily used
    to guide font selection, when faced with unannotated plain text. Finally, a
    stream of Unicode values would allow a human user (with good tools for
    inspection and repair) to investigate the *content* and restore as much of
    the formatting as necessary to make the text fully legible. In the general
    case that's nearly impossible for 8-bit sets.
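
    As a rough sketch of why unannotated Unicode text remains inspectable,
    assuming Python's standard unicodedata module: each code point carries a
    character name, and the first word of that name is a crude hint at the
    script. This is a script guess, not real language recognition, and the
    function name guess_script is hypothetical.

```python
import unicodedata
from collections import Counter


def guess_script(text):
    """Return the most common leading word of the character names in text.

    This is a crude script hint (e.g. 'LATIN', 'CYRILLIC', 'GREEK'), not
    language identification. Returns None if text has no named letters.
    """
    counts = Counter()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                counts[name.split()[0]] += 1
    return counts.most_common(1)[0][0] if counts else None


for sample in ("hello", "\u043f\u0440\u0438\u0432\u0435\u0442", "\u03ba\u03b1\u03bb\u03b7\u03bc\u03ad\u03c1\u03b1"):
    print(sample, guess_script(sample))
```

    A hint like this could feed font selection, or at least tell a human
    where to start looking. Nothing comparable is available from untagged
    8-bit values alone.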

    Your example (deleted in this reply) would be similar to claiming that
    italics should be done with an 8-bit character set, because they require
    out-of-band style information. Not even 8-bit sets, at the peak of their
    use, did anything like that.

