Re: Surrogate points

From: Hans Aberg
Date: Mon Jan 31 2005 - 12:24:23 CST

    At 11:09 +0000 2005/01/31, Peter Kirk wrote:
    >Doug, I think you have missed Hans' point, ...


    >...which is surely that if
    >Unicode had been designed from the start as a 21-bit space or whatever,
    >it is unlikely that this surrogate pair mechanism would have been used
    >to encode characters beyond the first 64K, and there would not have
    >been a need to reserve this large block of code points.

    Well, you are getting closer. There is still no need to reserve the
    "surrogate" code points and 0xFFFE-0xFFFF, even given UTF-16: just move
    them elsewhere in a modified UTF-16. Nobody expects Unicode character
    numbers to come anywhere near filling the range UTF-16 can address, so
    they can simply be put somewhere that is expected to stay free.
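
    For reference, here is a minimal sketch of the surrogate arithmetic
    under discussion. It shows standard UTF-16 as currently defined, not
    the modified scheme proposed above; the function and constant names
    are illustrative assumptions, not part of any library. The point to
    note is that the 2048 values 0xD800-0xDFFF serve as code units, which
    is why they are reserved as character numbers.

```python
# Standard UTF-16 surrogate arithmetic (illustrative names, not a library API).
HIGH_START = 0xD800          # first high (leading) surrogate
LOW_START = 0xDC00           # first low (trailing) surrogate
SURROGATE_END = 0xDFFF       # last low surrogate
SUPPLEMENTARY_BASE = 0x10000 # first code point beyond the first 64K
MAX_CODE_POINT = 0x10FFFF


def to_utf16_units(code_point):
    """Return the list of 16-bit code units encoding one code point."""
    if not 0 <= code_point <= MAX_CODE_POINT:
        raise ValueError("outside the Unicode code space")
    if HIGH_START <= code_point <= SURROGATE_END:
        # These values are reserved: they only have meaning as code units.
        raise ValueError("surrogate code points cannot be encoded")
    if code_point < SUPPLEMENTARY_BASE:
        return [code_point]
    offset = code_point - SUPPLEMENTARY_BASE  # a 20-bit value
    # Top 10 bits go into the high surrogate, bottom 10 into the low one.
    return [HIGH_START + (offset >> 10), LOW_START + (offset & 0x3FF)]


def from_surrogate_pair(high, low):
    """Recombine a high/low surrogate pair into a single code point."""
    if not HIGH_START <= high < LOW_START:
        raise ValueError("not a high surrogate")
    if not LOW_START <= low <= SURROGATE_END:
        raise ValueError("not a low surrogate")
    return SUPPLEMENTARY_BASE + ((high - HIGH_START) << 10) + (low - LOW_START)
```

    The 1024 high and 1024 low surrogates combine to address 1,048,576
    supplementary code points, that is, 16 further planes of 64K each.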

    >So, Hans, all of this is theoretical as Doug has made clear. Even if we
    >can all agree post facto on an improved encoding, there is far too much
    >investment in UTF-16 for it ever to be changed. And UTF-16, which cannot
    >be deprecated, requires these code points to be reserved. But there is
    >no shortage of code points, so what's the problem?

    A relatively minor change to UTF-16 would make that requirement go
    away. Current UTF-16 implementations would merely need to be aware
    that these character numbers may be used, and be altered accordingly.
    This is not an urgent change, as these character numbers will not be
    filled very soon.

    And UTF-16, even though heavily invested in, is in this respect no different
    from the ISO-Latin and similar encodings. So it is still easy to introduce a
    new encoding scheme that will eventually replace UTF-16.

    The problem is that the respondents do not want a change, not that a change
    cannot be made. At the same time, the Unicode character set is so complicated
    that a successor will have to be developed eventually. It is in dire need of
    all the help it can get in order to be straightened out and simplified. All you
    are saying is that this is not going to happen within the scope of the
    Unicode Consortium, but that someone else, more qualified, should do it.

      Hans Aberg
