Re: Surrogate points

From: Markus Scherer (markus.scherer@jtcsv.com)
Date: Tue Jan 25 2005 - 18:56:09 CST

Next message: Christopher Fynn: "Re: New Balinese proposal available."

Previous message: Hans Aberg: "Re: <<NONCHAR>> for flex"
In reply to: Hans Aberg: "Surrogate points"
Next in thread: Hans Aberg: "Re: Surrogate points"

Hans Aberg wrote:
> Should not the in effect empty Unicode points, U+D800 to U+DFFF, as well as
> U+FFFE and U+FFFF, be filled with characters? The current construction gives
> a misleading impression that the Unicode character set and character
> numbering have anything to do with the encoding UTF-16.

You are trying to rewrite history. These code points are designated as they are, and changing them
would wreak havoc. Even though they have no _characters_ assigned, they do have designations (as
surrogates and noncharacters, respectively), and software relies on these designations.
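
To make the designations concrete, here is a minimal sketch of the kind of check that software builds in. The function name designation() and its return strings are made up for illustration; the ranges are the surrogate block U+D800..U+DFFF, the noncharacters U+FDD0..U+FDEF, and the last two code points of every plane (which includes U+FFFE and U+FFFF).

```python
def designation(cp: int) -> str:
    """Coarse designation of a Unicode code point (hypothetical helper)."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("outside the Unicode code space")
    if 0xD800 <= cp <= 0xDBFF:
        return "high surrogate"
    if 0xDC00 <= cp <= 0xDFFF:
        return "low surrogate"
    # U+FDD0..U+FDEF, plus U+nFFFE and U+nFFFF at the end of each plane.
    if 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE:
        return "noncharacter"
    return "other"


for cp in (0x0041, 0xD800, 0xDFFF, 0xFFFE, 0xFFFF, 0x10FFFF):
    print(f"U+{cp:04X}: {designation(cp)}")
```

Code written against these ranges would misbehave if characters were ever assigned there, which is the havoc referred to above.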

> One might also design a new set of encodings for k-bit words, ...

With no advantage over what is already available and widely implemented, I don't see this getting anywhere.

> Call, ad hoc, this encoding UE-k. Then UE-16 has the capacity of holding 27
> bits in a two-word. UE-8 is the same as UTF-8. And UTF-32 is the same as
> UE-32.

No one needs, or wants, more than about 20.1 bits for Unicode code points (the code space ends at U+10FFFF).
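
For reference, the 20.1 figure can be checked with a little arithmetic. The sketch below is illustrative only; the names CODE_SPACE, bits_needed() and to_surrogates() are made up here. It counts the code points reachable with UTF-16 (the BMP plus 1024 x 1024 surrogate pairs) and shows the standard surrogate-pair split for a supplementary code point.

```python
import math

BMP_SIZE = 0x10000                # U+0000..U+FFFF
HIGH_COUNT = 0xDC00 - 0xD800      # 1024 high surrogates
LOW_COUNT = 0xE000 - 0xDC00       # 1024 low surrogates
CODE_SPACE = BMP_SIZE + HIGH_COUNT * LOW_COUNT   # 0x110000 code points


def bits_needed(count: int) -> float:
    """Bits needed to number `count` distinct values (fractional)."""
    return math.log2(count)


def to_surrogates(cp: int) -> tuple[int, int]:
    """Split a supplementary code point into a UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    v = cp - 0x10000              # 20-bit value
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)


print(hex(CODE_SPACE), round(bits_needed(CODE_SPACE), 1))
print([hex(u) for u in to_surrogates(0x10FFFF)])
```

Since 0x110000 is 2^20 + 2^16, the logarithm comes out slightly above 20, which is where the fractional bit count comes from.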

markus

-- 
Opinions expressed here may not reflect my company's positions unless otherwise noted.

This archive was generated by hypermail 2.1.5 : Tue Jan 25 2005 - 19:03:17 CST