From: Markus Scherer (markus.scherer@jtcsv.com)
Date: Tue Jan 25 2005 - 18:56:09 CST
Hans Aberg wrote:
> Should not the in effect empty Unicode points, U+D800 to U+DFFF, as well as
> U+FFFE and U+FFFF, be filled with characters? The current construction gives
> a misleading impression that the Unicode character set and character
> numbering have anything to do with the encoding UTF-16.
You are trying to rewrite history... These code points are designated as they are, and changing them
would wreak havoc. Even though they have no _characters_ assigned, they do have designations (as
surrogates and noncharacters, respectively), and existing software relies on these designations.
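
To illustrate the kind of check that software builds in, here is a minimal Python sketch. The function name designation() and its return strings are hypothetical, chosen only for this example. The ranges are the standard ones: surrogates at U+D800..U+DFFF, and noncharacters at U+FDD0..U+FDEF plus the last two code points of every plane.

```python
def designation(cp: int) -> str:
    """Classify a code point by its fixed designation (illustrative helper)."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point: %#x" % cp)
    # Surrogate code points, reserved for use by UTF-16.
    if 0xD800 <= cp <= 0xDFFF:
        return "surrogate"
    # Noncharacters: U+FDD0..U+FDEF, plus U+nFFFE and U+nFFFF in each plane.
    if 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE:
        return "noncharacter"
    return "other"


for cp in (0x0041, 0xD800, 0xDFFF, 0xFFFE, 0xFFFF, 0x10FFFF):
    print("U+%04X: %s" % (cp, designation(cp)))
```

Code written along these lines treats the designations as permanent, which is why assigning characters to those code points would break it.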
> One might also design a new set of encodings for k-bit words, ...
Since this offers no advantage over what is already available and widely implemented, I don't see it getting anywhere.
> Call, ad hoc, this encoding UE-k. Then UE-16 has the capacity of holding 27
> bits in a two-word. UE-8 is the same as UTF-8. And UTF-32 is the same as
> UE-32.
No one needs, or wants, more than 20.1 bits for Unicode code points.
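
For reference, the 20.1 figure follows from the size of the code point range, U+0000..U+10FFFF. A quick Python sketch of the arithmetic (the variable names are just for this example):

```python
import math

# Code points run from U+0000 through U+10FFFF inclusive.
CODE_POINTS = 0x10FFFF + 1

# The BMP (2**16 code points) plus everything reachable through
# surrogate pairs (1024 lead values * 1024 trail values).
assert CODE_POINTS == 0x10000 + 0x400 * 0x400

# Bits needed to number every code point.
bits = math.log2(CODE_POINTS)
print(round(bits, 1))
```

The range is 17 planes of 65,536 code points each, so it needs slightly more than 20 bits and fits in 21.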
markus
-- Opinions expressed here may not reflect my company's positions unless otherwise noted.