> There's no updating needed. The key is that The Unicode Standard, Version
> 3.0 recognizes UTF-16 as the default encoding. Therefore code values (or
> units) which are defined as 'minimal bit combination that can represent a
> unit of encoded text' are 16-bit. In UTF-16, one sometimes needs two of
> these, instead of just one.
>>| C1 says "A process shall interpret Unicode code values as 16-bit
>> This I find mightily confusing. Why say something like this when
>> there are (well, will be) characters that cannot be represented with
>> 16 bits in any of the Unicode encodings?
> because the smallest unit of UTF-16 (which can represent characters outside
> the first 64K) is 16-bit. See the full text of definition D5 on page 41.
The confusion is that a 16-bit unit is referred to as a character code,
but it is not. It's a character element code (to my way of thinking),
and one can construct a character code from one or more character
element codes. It's sort of semi-atomic, only not, i.e. not unitary
and complete unto itself. And the contextual business muddies it.
It just so happens that most character codes have a single element,
but the (necessary?) inconsistency complicates matters in precisely
the ways I'd been hoping Unicode would simplify. Well, it does
simplify it; just not as far as one would wish.
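
To make the element-versus-character distinction concrete, here is a minimal sketch in Python of how one character code maps to one or two 16-bit units in UTF-16, and back again. The function names utf16_units and code_points are made up for illustration; they are not taken from the standard or from any library. Only the surrogate arithmetic itself follows the UTF-16 definition.

```python
def utf16_units(code_point):
    """Return the 16-bit units that encode one character code in UTF-16."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("outside the range UTF-16 can represent")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("a surrogate value is not a character on its own")
    if code_point < 0x10000:
        # The common case: one unit is the whole character code.
        return [code_point]
    # Above the first 64K: split the 20-bit offset across two units.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return [high, low]


def code_points(units):
    """Rebuild character codes from a sequence of 16-bit units."""
    result = []
    i = 0
    while i < len(units):
        unit = units[i]
        is_high = 0xD800 <= unit <= 0xDBFF
        has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        if is_high and has_low:
            # Two units make one character code.
            low = units[i + 1]
            result.append(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00))
            i += 2
        else:
            # One unit is one character code (an unpaired surrogate is
            # passed through unchanged in this sketch).
            result.append(unit)
            i += 1
    return result
```

So utf16_units(0x41) yields a single unit, while utf16_units(0x10000) yields two, and a process that counts units is not necessarily counting characters.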
Dum spiro, spero.
John G. Otto Nisus Software, Engineering
www.infoclick.com www.mathhelp.com www.nisus.com software4usa.com
EasyAlarms PowerSleuth NisusEMail NisusWriter MailKeeper QUED/M
My opinions are probably not those of Nisus Software, Inc.