Re: Surrogate points

From: Markus Scherer (markus.scherer@jtcsv.com)
Date: Tue Jan 25 2005 - 18:56:09 CST


    Hans Aberg wrote:
    > Should not the in effect empty Unicode points, U+D800 to U+DFFF, as well as
    > U+FFFE and U+FFFF, be filled with characters? The current construction gives
    > a misleading impression that the Unicode character set and character
    > numbering have anything to do with the encoding UTF-16.

    You are trying to rewrite history... these code points are designated as they are, and changing them
    would wreak havoc. Even though they don't have _characters_ assigned, they do have designations
    (as surrogates and noncharacters, respectively), and software relies on these designations.
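
To illustrate the kind of check that software builds on these designations, here is a minimal Python sketch. It is not taken from any particular library; the function name `classify` and its return labels are made up for this example. It assumes the ranges defined by the Unicode Standard: surrogates at U+D800..U+DFFF, and noncharacters at U+FDD0..U+FDEF plus the last two code points of every plane.

```python
def classify(cp: int) -> str:
    """Return the designation of a code point (hypothetical helper, illustration only)."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if 0xD800 <= cp <= 0xDBFF:
        return "high surrogate"
    if 0xDC00 <= cp <= 0xDFFF:
        return "low surrogate"
    # Noncharacters: U+FDD0..U+FDEF, plus U+nFFFE and U+nFFFF in each plane.
    if 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE:
        return "noncharacter"
    return "other"


for cp in (0x0041, 0xD800, 0xDFFF, 0xFFFE, 0xFFFF):
    print(f"U+{cp:04X}: {classify(cp)}")
```

Code that validates, converts, or transmits text typically contains a test of this shape, which is why reassigning these code points would break existing implementations.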

    > One might also design a new set of encodings for k-bit words, ...

    Since this offers no advantage over the encodings that are already available and widely implemented,
    I don't see it getting anywhere.

    > Call, ad hoc, this encoding UE-k. Then UE-16 has the capacity of holding 27
    > bits in a two-word. UE-8 is the same as UTF-8. And UTF-32 is the same as
    > UE-32.

    No one needs, or wants, more than 20.1 bits for Unicode code points.
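
The "20.1 bits" figure can be checked with a little arithmetic. The sketch below assumes the code space U+0000..U+10FFFF (0x110000 code points) and the standard UTF-16 surrogate-pair mapping; the names `CODE_POINTS` and `to_surrogates` are chosen for this example only.

```python
import math

# Size of the Unicode code space: U+0000..U+10FFFF.
CODE_POINTS = 0x110000


def to_surrogates(cp: int) -> tuple[int, int]:
    """Split a supplementary code point into a UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    v = cp - 0x10000  # 20-bit value: 10 bits per surrogate
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)


# Bits needed to number every code point, roughly 20.1.
print(round(math.log2(CODE_POINTS), 1))

high, low = to_surrogates(0x1D11E)
print(f"U+1D11E -> {high:04X} {low:04X}")
```

A surrogate pair carries exactly 20 bits (2^20 supplementary code points); adding the 2^16 code points of the Basic Multilingual Plane gives the 0x110000 total, hence slightly more than 20 bits.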

    markus

    -- 
    Opinions expressed here may not reflect my company's positions unless otherwise noted.
    

