Re: Limitation of 0x10FFFF (about UTF-32)

From: Masahiko Maedera (Masahiko_Maedera@notesgw2.lotus.co.jp)
Date: Wed Jul 28 1999 - 02:08:48 EDT


Thank you for your reply.

I can classify Unicode's encodings as in the following table.

(These names are for convenience, not formal terms.)
(8bit)        (16bit)        (32bit)       (code range)
UTF-8-limit1  UTF-16-limit1  UCS-4-limit1  0x00000000-0x0000FFFF
UTF-8-limit2  UTF-16-limit2  UCS-4-limit2  0x00000000-0x0010FFFF
UTF-8-limit3  UTF-16-limit3  UCS-4-limit3  0x00000000-0x7FFFFFFF

Now I fill in the above table with formal terms (UCS-? = ISO-10646-UCS-?):
(8bit)            (16bit)    (32bit)       (code range)
(undef.)(*1)      UCS-2(*1)  (undef.)(*1)  0x00000000-0x0000FFFF
((*1)undef.)(*2)  UTF-16     UTF-32        0x00000000-0x0010FFFF
UTF-8             (N/A)      UCS-4         0x00000000-0x7FFFFFFF
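
As a rough illustration, the table above can be written as a small lookup. This is only a sketch: the names FORMAL_TERMS and formal_terms are hypothetical, and None stands in for the "(undef.)" and "(N/A)" cells.

```python
# Rows of the table above: (upper limit, 8-bit name, 16-bit name, 32-bit name).
# None marks a cell that the table lists as "(undef.)" or "(N/A)".
FORMAL_TERMS = [
    (0x0000FFFF, None, "UCS-2", None),
    (0x0010FFFF, None, "UTF-16", "UTF-32"),
    (0x7FFFFFFF, "UTF-8", None, "UCS-4"),
]


def formal_terms(code_point):
    """Return (8-bit, 16-bit, 32-bit) names for the tightest range holding code_point."""
    if code_point < 0:
        raise ValueError("code point must be non-negative")
    for upper, name8, name16, name32 in FORMAL_TERMS:
        if code_point <= upper:
            return (name8, name16, name32)
    raise ValueError("code point exceeds 0x7FFFFFFF")
```

For example, formal_terms(0x10000) returns the second row, whose 8-bit cell is the unnamed "(*2)" entry.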

The entries marked "(*1)" will become obsolete, so we don't need to worry about them.
But "(*2)" will cause serious problems (for example, data loss)
unless we define it.
I have no complaint about the UTF-8 encoding itself, however.
I would like to have some signature name for "(*2)".
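
To make the data-loss concern concrete, here is a minimal sketch, assuming the UTF-8 byte layout with sequences of up to six bytes (covering values up to 0x7FFFFFFF) and the usual surrogate-pair arithmetic for UTF-16. The function names are hypothetical, and surrogate code points themselves are not given any special handling.

```python
def utf8_encode_original(cp):
    """Encode cp with the UTF-8 layout that allows up to six bytes (max 0x7FFFFFFF)."""
    if not 0 <= cp <= 0x7FFFFFFF:
        raise ValueError("outside 0x00000000-0x7FFFFFFF")
    if cp < 0x80:
        return bytes([cp])
    # (upper limit, lead-byte prefix, number of continuation bytes)
    layouts = (
        (0x7FF, 0xC0, 1),
        (0xFFFF, 0xE0, 2),
        (0x1FFFFF, 0xF0, 3),
        (0x3FFFFFF, 0xF8, 4),
        (0x7FFFFFFF, 0xFC, 5),
    )
    for upper, lead, n in layouts:
        if cp <= upper:
            out = [lead | (cp >> (6 * n))]
            for i in range(n - 1, -1, -1):
                out.append(0x80 | ((cp >> (6 * i)) & 0x3F))
            return bytes(out)


def utf16_encode(cp):
    """Encode cp as a list of 16-bit code units; fails above 0x10FFFF."""
    if cp < 0:
        raise ValueError("code point must be non-negative")
    if cp <= 0xFFFF:
        return [cp]
    if cp <= 0x10FFFF:
        v = cp - 0x10000
        return [0xD800 | (v >> 10), 0xDC00 | (v & 0x3FF)]
    raise ValueError("not representable in UTF-16")
```

With these, utf8_encode_original(0x110000) succeeds, while utf16_encode(0x110000) raises ValueError: a value that the 8-bit form accepts has no 16-bit counterpart, which is where the loss would occur.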

Best regards,
  Masahiko.


