From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Thu Nov 25 2004 - 21:19:42 CST
The fact is, once you dedicate the top bits in a pipe to some other purpose,
you've narrowed the width of the pipe. That's what happened to those
systems that implemented a 7-bit pipe for ASCII by using the top bit for
other purposes.
And everybody seems to agree that when you serialize such an encoding the
'unused' bits indeed do need to be set to 0. 0xFFF0FFFF is *not* the same
as 0x0010FFFF. Only the second example is the correct UTF-32 value for the
largest Unicode code point.
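
To make that concrete, here is a minimal sketch in Python. The names (is_valid_utf32, serialize, PAYLOAD_MASK) are made up for illustration, and the idea that a system stashes private flags in the top 11 bits is an assumption, not a description of any particular implementation.

```python
MAX_CODE_POINT = 0x10FFFF   # largest Unicode code point
PAYLOAD_MASK = 0x001FFFFF   # low 21 bits, enough to hold any code point


def is_valid_utf32(unit: int) -> bool:
    """Return True if a 32-bit unit is a legal UTF-32 value."""
    if not 0 <= unit <= 0xFFFFFFFF:
        return False
    if 0xD800 <= unit <= 0xDFFF:
        # Surrogate code points are not valid in UTF-32.
        return False
    return unit <= MAX_CODE_POINT


def serialize(tagged_unit: int) -> int:
    """Clear the (hypothetical) private flag bits before the value leaves the system."""
    return tagged_unit & PAYLOAD_MASK


print(is_valid_utf32(0xFFF0FFFF))            # top bits set: not UTF-32
print(is_valid_utf32(0x0010FFFF))            # the largest code point
print(hex(serialize(0xFFF0FFFF)))            # flags stripped on the way out
```

Note that masking only removes the internal flags; the result still has to pass the range check before it can be called UTF-32.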
However, even strictly internal use of the lesser number of bits, though
not illegal, or incorrect, can be *unwise*. It limits the ways such a
system can be enabled for other character sets.
Now, while ASCII was something of a minimal character set, Unicode strives
to be universal. The chances of getting burned by limiting your
architecture to the features of a single character set are inversely
proportional to its scope and coverage.
In an ideal world, Unicode would satisfy all needs, present and future, and
you could build systems that can only ever deal with Unicode. And many such
systems are being built and will work quite well. However, there's always a
chance that someday some other coding system(*) may need to be used in
parts of your system, and you may well be happy to have kept your plumbing
generic at a full 32 bits.
Call it engineer's caution, if you will.
A./