Re: Surrogate points

From: Markus Scherer (
Date: Tue Jan 25 2005 - 18:56:09 CST


    Hans Aberg wrote:
    > Should not the in effect empty Unicode points, U+D800 to U+DFFF, as well as
    > U+FFFE and U+FFFF, be filled with characters? The current construction gives
    > a misleading impression that the Unicode character set and character
    > numbering have anything to do with the encoding UTF-16.

    You are trying to rewrite history. These code points are designated as they are, and changing them
    would wreak havoc. Even though they don't have _characters_ assigned, they do have designations (as
    surrogates and noncharacters, respectively), and software assumes these designations.
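
    As a rough illustration of those designations, here is a minimal Python sketch. The function
    names designation() and encodes_as_utf8() are made up for this example; the ranges are the
    surrogate block U+D800..U+DFFF and the noncharacters (U+FDD0..U+FDEF plus the last two code
    points of every plane, which includes U+FFFE and U+FFFF). The second function shows one piece
    of existing software, Python's own UTF-8 codec, relying on the surrogate designation.

```python
def designation(cp: int) -> str:
    """Classify a code point by the designations discussed above (illustrative only)."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("outside the Unicode code space")
    if 0xD800 <= cp <= 0xDBFF:
        return "high surrogate"
    if 0xDC00 <= cp <= 0xDFFF:
        return "low surrogate"
    # Noncharacters: U+FDD0..U+FDEF, plus U+nFFFE and U+nFFFF in every plane.
    if 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE:
        return "noncharacter"
    return "other"


def encodes_as_utf8(cp: int) -> bool:
    """Report whether Python's UTF-8 codec accepts this code point."""
    try:
        chr(cp).encode("utf-8")
        return True
    except UnicodeEncodeError:
        # Python's codec refuses lone surrogates by default.
        return False
```

    With this sketch, designation(0xD800) gives "high surrogate" and designation(0xFFFE) gives
    "noncharacter"; encodes_as_utf8(0xD800) is False, while encodes_as_utf8(0xFFFE) is True.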

    > One might also design a new set of encodings for k-bit words, ...

    With no advantage over what is already available and widely implemented, I don't see this getting anywhere.

    > Call, ad hoc, this encoding UE-k. Then UE-16 has the capacity of holding 27
    > bits in a two-word. UE-8 is the same as UTF-8. And UTF-32 is the same as
    > UE-32.

    No one needs, or wants, more than 20.1 bits for Unicode code points.
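
    For anyone wondering where the "20.1" comes from, a quick Python check, assuming the code
    space U+0000..U+10FFFF (the variable names are only for this example). It also confirms that
    this is exactly the range UTF-16 can reach with single units plus surrogate pairs.

```python
import math

# Size of the Unicode code space: U+0000 through U+10FFFF inclusive.
CODE_POINTS = 0x10FFFF + 1

# Bits needed to number every code point, as a real number.
bits = math.log2(CODE_POINTS)

# UTF-16 reach: 0x10000 single-unit values (surrogates included in the count)
# plus 1024 high surrogates times 1024 low surrogates for supplementary code points.
utf16_reach = 0x10000 + 1024 * 1024

print(f"{CODE_POINTS} code points, about {bits:.1f} bits")
```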


    Opinions expressed here may not reflect my company's positions unless otherwise noted.
