Re: Unicode FAQ addendum

From: jgo
Date: Fri Jul 21 2000 - 14:55:13 EDT

> There's no updating needed. The key is that The Unicode Standard, Version
> 3.0 recognizes UTF-16 as the default encoding. Therefore code values (or
> units) which are defined as 'minimal bit combination that can represent a
> unit of encoded text' are 16-bit. In UTF-16, one sometimes needs two of
> these, instead of just one.

>>| C1 says "A process shall interpret Unicode code values as 16-bit
>>| quantities."

>> This I find mightily confusing. Why say something like this when
>> there are (well, will be) characters that cannot be represented with
>> 16 bits in any of the Unicode encodings?

> because the smallest unit of UTF-16 (which can represent characters outside
> the first 64K) is 16-bit. See the full text of definition D5 on page 41.

The confusion is that a 16-bit unit is referred to as a character code,
but it is not. It's a character element code (to my way of thinking),
and one can construct a character code from one or more character
element codes. It's sort of semi-atomic, only not: it is not unitary
and complete unto itself. And the contextual business muddies it
as well.
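
To make the "one or more element codes" point concrete, here is a rough
sketch of how a process might build character codes out of 16-bit units.
It assumes the UTF-16 surrogate ranges (high D800..DBFF, low DC00..DFFF);
the function name decode_utf16_units is made up for illustration, and
unpaired surrogates are simply passed through rather than treated as errors.

```python
def decode_utf16_units(units):
    """Combine 16-bit code units into character codes (code points).

    Hypothetical helper, for illustration only. A high surrogate
    followed by a low surrogate yields one supplementary character;
    any other unit stands for itself. Unpaired surrogates are passed
    through unchanged, which a stricter decoder would reject.
    """
    code_points = []
    i = 0
    while i < len(units):
        unit = units[i]
        is_high = 0xD800 <= unit <= 0xDBFF
        has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        if is_high and has_low:
            low = units[i + 1]
            # Two element codes make one character code.
            code_points.append(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00))
            i += 2
        else:
            # One element code is the whole character code.
            code_points.append(unit)
            i += 1
    return code_points


units = [0x0041, 0xD834, 0xDD1E]          # three 16-bit units
print([hex(cp) for cp in decode_utf16_units(units)])
```

Three units go in and two character codes come out, which is exactly
the not-quite-atomic behavior described above.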

It just so happens that most character codes have a single element,
but the (necessary?) inconsistency complicates matters in precisely
the ways I'd been hoping Unicode would simplify. Well, it does
simplify it; just not as far as one would wish.
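
Going the other direction shows why most character codes end up with a
single element. This is only a sketch under the same assumptions about the
surrogate ranges; encode_utf16_units is a hypothetical name, not anything
taken from the standard.

```python
def encode_utf16_units(code_point):
    """Return the list of 16-bit units for one character code.

    Hypothetical helper, for illustration only. Codes below 0x10000
    need one unit; codes from 0x10000 to 0x10FFFF need a surrogate pair.
    """
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("outside the range UTF-16 can represent")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate values are not characters")
    if code_point < 0x10000:
        return [code_point]                 # the common single-element case
    offset = code_point - 0x10000           # 20 bits, split 10 and 10
    return [0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)]


for cp in (0x0041, 0x20AC, 0x1D11E):
    print(hex(cp), len(encode_utf16_units(cp)))
```

Everything in the first 64K takes one unit, so code that assumes one
unit per character works until the day it meets a pair.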

Dum spiro, spero.

John G. Otto Nisus Software, Engineering
EasyAlarms PowerSleuth NisusEMail NisusWriter MailKeeper QUED/M
   My opinions are probably not those of Nisus Software, Inc.
