Re: Perception that Unicode is 16-bit (was: Re: Surrogate space in Unicode)

From: Paul Keinanen (keinanen@sci.fi)
Date: Tue Feb 20 2001 - 17:54:03 EST

Next message: Kenneth Whistler: "Re: Perception that Unicode is 16-bit (was: Re: Surrogate space in Unicode)"
Previous message: Tex Texin: "Re: collations: Czech vs. Croat vs. Slovak"
Maybe in reply to: DougEwell2@cs.com: "Perception that Unicode is 16-bit (was: Re: Surrogate space in Unicode)"
Next in thread: Kenneth Whistler: "Re: Perception that Unicode is 16-bit (was: Re: Surrogate space in Unicode)"

On Tue, 20 Feb 2001 10:29:17 -0800 (GMT-0800), Peter_Constable@sil.org
wrote:

>
>On 02/20/2001 11:18:40 AM Tobias Hunger wrote:
>
>>Looks like David was quoting me. I am working on Babylon and wanted to make
>>clear that it is not Unicode conformant, as its API uses 32-bit wide
>>characters, which violates clause 1 of Section 3.1.
>
>This is something that UTC should clean up because C1 is obsolete. In fact,
>UTC just took that action when they met a couple of weeks ago:
>
>[86-M8] Motion: Amend Unicode 3.1 to change the Chapter 3, C1 conformance
>clause to read "A process shall interpret Unicode code units (values) in
>accordance with the Unicode transformation format used." (passed)

While this wording makes it possible to treat any 32-bit character
API implementation as UTF-32, it does not make it any easier to
implement one on processors with an exotic word length. Depending on
how "process" is defined, a character API implementation on a 24-bit
computer using one word per character could be non-conformant, even
though 24 bits (or even 21 bits :-) would be more than sufficient to
cover the 0 .. 10FFFF range.
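To make the bit arithmetic concrete, here is a minimal sketch that simulates a 24-bit machine word with a Python int. The names `WORD_BITS`, `store_in_word` and `bits_needed` are hypothetical and only illustrate the point: the largest code point, 0x10FFFF, needs 21 bits, so one code point per 24-bit word loses nothing.

```python
# Hypothetical illustration: a 24-bit DSP word simulated with a Python int.
WORD_BITS = 24
WORD_MASK = (1 << WORD_BITS) - 1  # 0xFFFFFF

MAX_CODE_POINT = 0x10FFFF  # top of the Unicode code space


def bits_needed(value: int) -> int:
    """Number of bits needed to represent a non-negative integer."""
    return value.bit_length()


def store_in_word(code_point: int) -> int:
    """Store one code point per 24-bit word, rejecting values outside 0..10FFFF."""
    if not 0 <= code_point <= MAX_CODE_POINT:
        raise ValueError(f"U+{code_point:X} is outside the Unicode code space")
    word = code_point & WORD_MASK
    # Masking to 24 bits never changes a value that only needs 21 bits.
    assert word == code_point
    return word


print(bits_needed(MAX_CODE_POINT))  # 21
print(hex(store_in_word(0x1D11E)))  # 0x1d11e (MUSICAL SYMBOL G CLEF)
```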

I have not seen BCD computers or 24-bit computers recently, but the
24-bit word length is at least common in digital signal processors
(DSPs).

It would have been clearer if C1 only required that code points in
the 0 .. 10FFFF range be supported, which would allow character API
implementations (such as dynamically loadable libraries shipped as
separate products) for processors with exotic word lengths, and if a
separate clause then addressed the transformation formats.
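As a rough sketch of that split (my own illustration, not anything the UTC has proposed), one rule checks only that values lie in 0 .. 10FFFF, and separate functions map those values to a transformation format's code units. The function names are hypothetical; the UTF-16 surrogate arithmetic is the standard one.

```python
MAX_CODE_POINT = 0x10FFFF


def check_code_point(cp: int) -> int:
    """Range rule (hypothetical): any code point in 0..10FFFF, whatever the word size."""
    if not 0 <= cp <= MAX_CODE_POINT:
        raise ValueError(f"{cp:#x} is outside 0..10FFFF")
    return cp


def to_utf32(cps):
    """Transformation-format rule: one 32-bit code unit per code point."""
    return [check_code_point(cp) for cp in cps]


def to_utf16(cps):
    """Transformation-format rule: supplementary code points become surrogate pairs."""
    units = []
    for cp in map(check_code_point, cps):
        if cp < 0x10000:
            units.append(cp)
        else:
            v = cp - 0x10000                    # 20-bit offset into planes 1..16
            units.append(0xD800 + (v >> 10))   # high surrogate: top 10 bits
            units.append(0xDC00 + (v & 0x3FF))  # low surrogate: bottom 10 bits
    return units


text = [0x41, 0x1D11E]  # "A" followed by MUSICAL SYMBOL G CLEF
print([hex(u) for u in to_utf32(text)])  # ['0x41', '0x1d11e']
print([hex(u) for u in to_utf16(text)])  # ['0x41', '0xd834', '0xdd1e']
```

A library on a 24-bit DSP could hold `text` one code point per word and call only the range check internally, leaving UTF-16 or UTF-32 to the interface where data is exchanged.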

Paul Keinänen

>
>So, when TUS3.1 is published later this year, you will not have any
>problems with conformance with that version of the Standard. (C1 was really
>obsolete back in version 2.0 when UTF-8 was first adopted into the
>Standard, but it took a while for that to get fixed.)
>
>
>
>- Peter
>
>
>---------------------------------------------------------------------------
>Peter Constable
>
>Non-Roman Script Initiative, SIL International
>7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
>Tel: +1 972 708 7485
>E-mail: <peter_constable@sil.org>
>


This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:19 EDT