Re: Unicode FAQ addendum

From: addison@inter-locale.com
Date: Fri Jul 21 2000 - 07:48:27 EDT


Of course, we could just replace the (now slightly confusing) statement
with the old I18N mantra (also slightly modified):

1. 1 byte != 1 character: deal with it.
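
A minimal C sketch of the point, assuming the text happens to be UTF-8
(the string literal is just an example of mine):

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        /* "cafe" with an accented e: 4 characters, but the accented e
           is two bytes in UTF-8 (0xC3 0xA9), so the byte count is 5 */
        const char *s = "caf\xC3\xA9";
        printf("bytes: %lu\n", (unsigned long)strlen(s));
        return 0;
    }

The byte count and the character count part company as soon as anything
outside ASCII turns up.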

Addison

> I think the focus here was supposed to be on the fact that Unicode code
> values are *not 8-bit* quantities. I found out about Unicode in late
> 1991 when I discovered a copy of TUS 1.0 in a bookstore, and for years
> afterward, whenever I read an article about Unicode, there was sure to be
> some angle presented that Unicode "broke" the C-language string model
> by including "nulls," or zero bytes, in the character stream. Users of
> single-byte and even multi-byte character sets had to overcome a major
> mental block by realizing that the 16-bit word, not the byte, was the
> atomic code unit. That was the biggest revolution in Unicode, and that
> is probably why it was made Conformance Requirement Number One.
>
> That said, I agree that the distinction between "code value" and
> "character value," where the first still fits the 16-bit model but the
> second does not, may be technically correct but feels a little like
> wordplay. I wonder if the original intent (WORDS, not BYTES!) could
> be preserved and the scalar value vs. code unit distinction addressed
> at the same time. How about this:
>
> "A process shall interpret Unicode character values as sequences of one
> or two 16-bit quantities."
>
> -Doug Ewell
> Fullerton, California
>
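
To make the proposed wording concrete, here is a rough C sketch (an
illustration of mine, not text from the standard or its reference code)
of how a scalar value maps onto one or two 16-bit quantities; it also
shows the zero byte inside 0x0041 that tripped up char*-based code:

    #include <stdio.h>

    /* Map one Unicode scalar value to one or two 16-bit code units
       (the UTF-16 encoding form). Returns the number of units. */
    static int to_utf16(unsigned long c, unsigned short out[2])
    {
        if (c < 0x10000UL) {            /* BMP: a single unit */
            out[0] = (unsigned short)c;
            return 1;
        }
        c -= 0x10000UL;                 /* beyond the BMP: surrogate pair */
        out[0] = (unsigned short)(0xD800 + (c >> 10));
        out[1] = (unsigned short)(0xDC00 + (c & 0x3FF));
        return 2;
    }

    int main(void)
    {
        unsigned short u[2];
        int i, n;

        /* U+0041: one unit, 0x0041; note the zero high byte that a
           char*-based string model would read as a terminator */
        n = to_utf16(0x0041UL, u);
        for (i = 0; i < n; i++) printf("%04X ", u[i]);
        printf("\n");

        /* U+10000: two units, D800 DC00 */
        n = to_utf16(0x10000UL, u);
        for (i = 0; i < n; i++) printf("%04X ", u[i]);
        printf("\n");
        return 0;
    }

A BMP character comes out as a single 16-bit unit; anything above U+FFFF
comes out as a surrogate pair, which is exactly the "one or two 16-bit
quantities" of the proposed wording.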


