Re: Surrogate points

From: Peter Kirk
Date: Mon Jan 31 2005 - 05:09:47 CST


    On 30/01/2005 22:18, Doug Ewell wrote:

    >Hans Aberg <haberg at math dot su dot se> wrote:
    >>The numbers 0xD800-0xDFFF, 0xFFFE-0xFFFF are not associated with
    >>character, but included as place holders, never to be used, because
    >>one has failed to give the encoding UTF-16 a proper design. So an
    >>unrelated problem, choice of character encoding, is allowed to
    >>influence the logical core, the character set description.
    >In any case, it is incorrect to state that the choice of this block was
    >due to "failure to give UTF-16 a proper design." Other blocks, such as
    >the "obvious" 0xF800 through 0xFFFF, were already occupied.
    Doug, I think you have missed Hans' point, which is surely that if
    Unicode had been designed from the start as a 21-bit space or whatever,
    it is unlikely that this surrogate pair mechanism would have been used
    to encode characters beyond the first 64K, and there would not have
    been a need to reserve this large block of code points. Instead I would
    guess that a mechanism more like UTF-8 would have been introduced, in
    which perhaps every character above U+C000 would have been represented,
    in an alternative to UTF-16, as a pair of 16-bit units, the first with
    the top three bits 110 and the second with the top three bits 111,
    leaving 26 bits for indicating a character. But this kind of mechanism
    could not be introduced after the fact, after a late decision to extend
    Unicode from 16 bits to 21 bits, because of the need or decision to
    remain compatible with existing UCS-2 encodings of some characters in your
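
    A minimal sketch of the two schemes, for comparison. utf16_surrogates
    follows the actual UTF-16 surrogate arithmetic; hypo_encode and
    hypo_decode are hypothetical names for the purely theoretical 110/111
    pairing described above, and treating U+C000 itself as paired (so that
    lead units stay unambiguous) is an assumption, not something the
    description settles.

```python
def utf16_surrogates(cp):
    """Real UTF-16: split a supplementary code point into a surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    v = cp - 0x10000                          # 20 payload bits
    return (0xD800 | (v >> 10), 0xDC00 | (v & 0x3FF))


# Assumption: everything from U+C000 upwards (inclusive) is sent as a pair.
PAIR_START = 0xC000


def hypo_encode(cp):
    """Hypothetical scheme: one 16-bit unit below U+C000, else a 110/111 pair."""
    if not 0 <= cp < (1 << 26):
        raise ValueError("only 26 payload bits are available")
    if cp < PAIR_START:
        return [cp]
    lead = 0xC000 | (cp >> 13)                # top bits 110 + high 13 bits
    trail = 0xE000 | (cp & 0x1FFF)            # top bits 111 + low 13 bits
    return [lead, trail]


def hypo_decode(units):
    """Inverse of hypo_encode for a sequence of 16-bit units."""
    out = []
    it = iter(units)
    for u in it:
        if u < 0xC000:                        # stand-alone unit
            out.append(u)
        elif u < 0xE000:                      # lead unit (110...)
            trail = next(it, None)
            if trail is None or not 0xE000 <= trail <= 0xFFFF:
                raise ValueError("lead unit without a trailing unit")
            out.append(((u & 0x1FFF) << 13) | (trail & 0x1FFF))
        else:                                 # trailing unit (111...) on its own
            raise ValueError("unexpected trailing unit")
    return out
```

    Under these assumptions the hypothetical scheme sets aside the unit
    values 0xC000 through 0xFFFF (16384 of them) to carry 26 bits, whereas
    UTF-16 reserves the 2048 surrogate values 0xD800 through 0xDFFF to carry
    20 bits.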

    So, Hans, all of this is theoretical as Doug has made clear. Even if we
    can all agree post facto on an improved encoding, there is far too much
    investment in UTF-16 for it ever to be changed. And UTF-16, which cannot
    be deprecated, requires these code points to be reserved. But there is
    no shortage of code points, so what's the problem?

    Peter Kirk
