Re: Subject: Re: 32'nd bit & UTF-8

From: Hans Aberg (haberg@math.su.se)
Date: Thu Jan 20 2005 - 18:52:37 CST

    On 2005/01/20 20:16, Richard T. Gillam at rgillam@las-inc.com wrote:

    > Good grief. We seem to be going through another round of "night of the
    > living thread."

    Have you only found that out now? :-)

    > This discussion has degenerated horribly from its original roots. I
    > really don't think it's productive to get into "UTF-x is better than
    > UTF-y" battles here, and I wish we could find a way to put a stop to it.
    > It's like programming-language wars-- people go round and round and
    > there's never any resolution. Suffice it to say that there are good,
    > valid reasons to use each of the UTFs and good, valid reasons not to use
    > each of the UTFs. Depending on your particular situation, any of the
    > three might be the best fit. There's a reason all three exist.

    At least for now. UTF-16 cannot be extended beyond its current range
    (U+10FFFF), but UTF-8 and UTF-32 could both be extended to 2^32 values,
    the size of a natural 32-bit integer type. So even though UTF-16 has a
    distinct legacy advantage today, it is unlikely to keep it in the long
    run, and deprecating it seems a distinct possibility.
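
    To make the range argument concrete, here is a minimal sketch of where
    the UTF-16 ceiling comes from. The surrogate ranges are the ones Unicode
    defines; the function name decode_surrogate_pair is only an illustrative
    choice, and the comparison with 2^32 reflects the claim above, not
    anything the current standard permits.

```python
# Surrogate ranges as defined by Unicode (1024 values each).
HIGH_START, HIGH_END = 0xD800, 0xDBFF
LOW_START, LOW_END = 0xDC00, 0xDFFF


def decode_surrogate_pair(high: int, low: int) -> int:
    """Combine a high and a low surrogate into one code point.

    Illustrative helper (hypothetical name, not from any library).
    """
    if not (HIGH_START <= high <= HIGH_END and LOW_START <= low <= LOW_END):
        raise ValueError("not a valid surrogate pair")
    # Each surrogate contributes 10 bits; the pair is offset by 0x10000.
    return 0x10000 + ((high - HIGH_START) << 10) + (low - LOW_START)


# The largest pair gives the hard upper limit of UTF-16.
max_utf16 = decode_surrogate_pair(HIGH_END, LOW_END)

# A 32-bit code unit could, in principle, hold values up to 2^32 - 1.
max_32bit = 2**32 - 1

print(f"UTF-16 ceiling: U+{max_utf16:X}")
print(f"32-bit ceiling: 0x{max_32bit:X}")
```

    Since both surrogate ranges are fixed at 1024 values, no pair can ever
    reach past the value computed for max_utf16, which is why the format
    has no room to grow without a new mechanism.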

    > Similarly, I don't think endless discussion of the BOM is productive. I
    > think most people would agree that out-of-band methods of specifying the
    > encoding scheme are preferable to using the BOM, but they're not always
    > available. The BOM is what it is and it's not going away.

    Well, in UTF-8 the requirement that processes ignore it has to go away:
    either Unicode removes it from the standard, or people will simply not
    bother following the Unicode standard in that respect.
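
    As a rough illustration of what "ignoring" the UTF-8 BOM means for a
    process, the sketch below strips a leading EF BB BF before decoding.
    codecs.BOM_UTF8 and the "utf-8-sig" codec are part of the Python
    standard library; the helper name strip_utf8_bom is an assumption made
    for this example only.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if present.

    Illustrative helper (hypothetical name); only the first three bytes
    are ever examined.
    """
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


with_bom = codecs.BOM_UTF8 + "hello".encode("utf-8")

# A plain UTF-8 decoder keeps the BOM as the character U+FEFF ...
plain = with_bom.decode("utf-8")

# ... while "utf-8-sig" drops it, as does the manual helper above.
via_codec = with_bom.decode("utf-8-sig")
via_helper = strip_utf8_bom(with_bom).decode("utf-8")

print(repr(plain), repr(via_codec), repr(via_helper))
```

    A tool that uses the plain decoder and does not special-case U+FEFF
    passes the marker through as data, which is the non-conforming
    behaviour predicted above.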

    > It would
    > have been better if the BOM hadn't been overloaded with the "zero-width
    > non-breaking space" semantic-- if it had just been considered a no-op--

    I once made a suggestion along these lines: Unicode file/stream content
    markers that carry no other semantics. But it was turned down as contrary
    to the Unicode spirit. The use of BOMs just shows how dangerous it is for
    Unicode to neglect users' needs and concerns: the needed features will
    simply appear anyway, but outside Unicode. And if Unicode then tries to
    patch things up, poor constructions such as the BOM will appear.

      Hans Aberg
