Re: Unicode in VFAT file system

Date: Fri Jul 21 2000 - 08:00:21 EDT

On 07/21/2000 04:42:05 AM <> wrote:

>Unicode is the code, which is based on 16 bit chunks of ether or whatever,
>UTF-8 is a biased transformation format...

That's too simple to capture the current reality, as others have been
indicating. The full story is available in UTR17, and *everybody* on this
list ought to read and digest it - of all the UTRs, it's probably the one
most useful to the broadest audience.

In a nutshell, Unicode started life as a fixed-width 16-bit encoding, but
the need to extend it and merge with ISO 10646 made life more complicated.
At this point, there is no real option but to say that Unicode is a 21 (or
20.1) bit* character set combined with various encoding forms and schemes
based on 8, 16 or 32 bit data types.
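
To make the "encoding forms based on 8, 16 or 32 bit data types" point
concrete, here is a small sketch in Python. The code point 10400 is just an
arbitrary value above FFFF picked for illustration, and the variable names
are mine; the codec names are the ones Python's standard library provides.

```python
# Illustrative sketch: one code point above FFFF in three encoding forms.
cp = 0x10400          # arbitrary example value beyond the 16-bit range
ch = chr(cp)

utf8 = ch.encode("utf-8")        # sequence of 8-bit code units
utf16 = ch.encode("utf-16-be")   # sequence of 16-bit code units (big-endian bytes)
utf32 = ch.encode("utf-32-be")   # one 32-bit code unit (big-endian bytes)

print(utf8.hex(), len(utf8), "code units")        # f0909080 4 code units
print(utf16.hex(), len(utf16) // 2, "code units")  # d801dc00 2 code units
print(utf32.hex(), len(utf32) // 4, "code unit")   # 00010400 1 code unit
```

The same character takes four 8-bit units, two 16-bit units (a surrogate
pair) or one 32-bit unit, which is why "16 bit chunks" no longer describes
the whole picture.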

* The codespace for the encoded character set takes a little explanation.
The simplification is that it's 0 - 10FFFF (which takes 21 bits to
represent but doesn't go as far as 21 bits would allow - that would be
1FFFFF). Actually, you have to remove from this D800 - DFFF and 34 values
that match the pattern nnFFFE and nnFFFF.
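
The arithmetic behind the footnote can be checked with a few lines of
Python. This is only a sketch: the names are mine, and it assumes the 34
excluded values are the last two values (FFFE and FFFF) in each of the 17
planes 00 through 10.

```python
import math

CODESPACE = 0x10FFFF + 1            # 0 .. 10FFFF inclusive
SURROGATES = 0xDFFF - 0xD800 + 1    # D800 .. DFFF
# Assumption: the 34 excluded values are nnFFFE and nnFFFF for planes 00..10.
NONCHARS = [plane << 16 | low
            for plane in range(17)
            for low in (0xFFFE, 0xFFFF)]

bits = math.log2(CODESPACE)         # where the "20.1" comes from
usable = CODESPACE - SURROGATES - len(NONCHARS)

print(CODESPACE)        # 1114112
print(round(bits, 2))   # 20.09
print(SURROGATES)       # 2048
print(len(NONCHARS))    # 34
print(usable)           # 1112030
```

So the codespace is 17 * 65536 values, a little over 20 bits' worth, and
removing the surrogate range and those 34 values leaves 1112030.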

- Peter

Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: <>

This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:06 EDT