Re: Misuse of 8th bit [Was: My Querry]

From: Asmus Freytag (
Date: Thu Nov 25 2004 - 21:19:42 CST

    The fact is, once you dedicate the top bits in a pipe to some purposes,
    you've narrowed the width of the pipe. That's what happened to those
    systems that implemented a 7-bit pipe for ASCII by using the top bit for
    other purposes.
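
    A minimal sketch of that narrowing, in Python. The helper name
    strip_top_bit is hypothetical; it only models a pipe that discards the
    top bit of every byte, not any particular system:

```python
def strip_top_bit(data: bytes) -> bytes:
    """Model a 7-bit pipe: the top bit of every byte is discarded."""
    return bytes(b & 0x7F for b in data)


# Pure ASCII never uses the top bit, so it survives the pipe unchanged.
ascii_text = "plain".encode("ascii")
assert strip_top_bit(ascii_text) == ascii_text

# Any encoding that needs all 8 bits is silently corrupted.
utf8_text = "\u00e9".encode("utf-8")   # b'\xc3\xa9'
mangled = strip_top_bit(utf8_text)     # 0xC3 -> 0x43, 0xA9 -> 0x29
assert mangled != utf8_text
```

    The pipe is perfectly fine for ASCII; the damage only shows up once
    something wider has to pass through it.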

    And everybody seems to agree that when you serialize such an encoding the
    'unused' bits indeed do need to be set to 0. 0xFFF0FFFF is *not* the same
    as 0x0010FFFF. Only the second example is the correct UTF-32 value for the
    largest Unicode code point.
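
    As a rough illustration, a serializer can refuse any 32-bit value whose
    upper bits are not zero. The function names below are made up for this
    sketch; the range limit 0x10FFFF and the exclusion of surrogates
    (0xD800..0xDFFF) come from the definition of UTF-32:

```python
import struct

MAX_CODE_POINT = 0x10FFFF


def is_valid_utf32_unit(value: int) -> bool:
    """True if value is a Unicode scalar value, i.e. a legal UTF-32 code unit."""
    if not 0 <= value <= MAX_CODE_POINT:
        return False  # 'unused' upper bits are set, or value is negative
    return not (0xD800 <= value <= 0xDFFF)  # surrogates are not scalar values


def encode_utf32be(code_point: int) -> bytes:
    """Serialize one code point as big-endian UTF-32, rejecting invalid values."""
    if not is_valid_utf32_unit(code_point):
        raise ValueError(f"not a valid UTF-32 code unit: {code_point:#010x}")
    return struct.pack(">I", code_point)


assert is_valid_utf32_unit(0x0010FFFF)
assert not is_valid_utf32_unit(0xFFF0FFFF)
assert encode_utf32be(0x10FFFF) == b"\x00\x10\xff\xff"
```

    Whatever a system does with the spare bits internally, they have to be
    cleared again before the value leaves it.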

    However, even strictly internal use of fewer bits, though neither
    illegal nor incorrect, can be *unwise*. It limits the ways such a
    system can be enabled for other character sets.

    Now, while ASCII was something of a minimal character set, Unicode strives
    to be universal. The chances of getting burned by limiting your
    architecture to the features of a single character set are inversely
    proportional to its scope and coverage.

    In an ideal world, Unicode would satisfy all needs, present and future, and
    you could build systems that can only ever deal with Unicode. And many such
    systems are being built and will work quite well. However, there's always a
    chance that someday some other coding system(*) may need to be used in
    parts of your system, and you may well be happy to have kept your plumbing
    generically 32-bit.

    Call it engineer's caution, if you will.

