Re: AW: ASCII and Unicode lifespan

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Thu May 19 2005 - 11:20:40 CDT


    From: "Dominikus Scherkl" <lyratelle@gmx.de>
    > Naa, I talked about the far, far future where all 17 planes become
    > too full (so, how can a new plane be opened when there is no
    > space left to define new surrogate pairs in the BMP?)

    There are still some free columns in the BMP where new surrogates could be
    added if ever needed: just between the existing Hangul syllables block and
    the existing surrogates (D7B0 to D7FF: that's 80 code units).

    Using them would require changing UTF-16, or designing a new UTF: these
    additional super-surrogates could become the leaders of a 3-surrogate
    sequence, each followed by the two existing surrogate code units. Each
    super-surrogate would then address 16 planes (1024 x 1024 codepoints), so
    this would extend the code space with 1280 additional planes (if all the 80
    positions are allocated for this usage), and the new codespace would have
    a total of 1297 planes (codepoints up to U+510FFFF).
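
As a rough illustration of the three-unit idea, here is a minimal Python sketch. The post only says that a super-surrogate would lead the two existing surrogates; the exact bit layout, the names `encode_extended` and `decode_extended`, and the choice to start the extended range right after U+10FFFF are all assumptions made for this example, not part of any standard.

```python
# Hypothetical 3-unit extension of UTF-16 (illustration only, not a standard).
SUPER_BASE = 0xD7B0      # first assumed "super-surrogate" (D7B0..D7FF)
SUPER_COUNT = 80         # number of positions in D7B0..D7FF
HIGH_BASE = 0xD800       # existing high (lead) surrogates
LOW_BASE = 0xDC00        # existing low (trail) surrogates
EXT_START = 0x110000     # first code point beyond the current 17 planes

# Each super-surrogate selects a block of 2**20 code points (16 planes).
MAX_EXTENDED = EXT_START + SUPER_COUNT * 0x100000 - 1
TOTAL_PLANES = (MAX_EXTENDED + 1) // 0x10000


def encode_extended(cp):
    """Return three 16-bit code units for a code point above U+10FFFF."""
    if not EXT_START <= cp <= MAX_EXTENDED:
        raise ValueError("code point outside the assumed extended range")
    lead, rest = divmod(cp - EXT_START, 0x100000)
    high, low = divmod(rest, 0x400)
    return [SUPER_BASE + lead, HIGH_BASE + high, LOW_BASE + low]


def decode_extended(units):
    """Inverse of encode_extended; expects exactly three code units."""
    sup, high, low = units
    if not SUPER_BASE <= sup < SUPER_BASE + SUPER_COUNT:
        raise ValueError("first unit is not an assumed super-surrogate")
    if not HIGH_BASE <= high < LOW_BASE or not LOW_BASE <= low <= 0xDFFF:
        raise ValueError("trailing units are not a valid surrogate pair")
    return (EXT_START
            + ((sup - SUPER_BASE) << 20)
            + ((high - HIGH_BASE) << 10)
            + (low - LOW_BASE))


if __name__ == "__main__":
    print(TOTAL_PLANES, hex(MAX_EXTENDED))
    print([hex(u) for u in encode_extended(MAX_EXTENDED)])
```

With this layout the 80 lead values give 80 x 16 = 1280 extra planes, and the last addressable code point is whatever `MAX_EXTENDED` evaluates to.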

    The other way of extending the codepoint space would be to allocate the new
    surrogates in existing supplementary planes (probably in the SSP, i.e. in
    plane 14), and then use the existing UTF-16 rules to map them to pairs
    of surrogates in the BMP. This approach would be compatible with existing
    UTF-16 decoders, even though they would decode the whole sequence of 4
    surrogates as two codepoints in the SSP, instead of one codepoint in the
    extra planes.
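
The compatibility point can be checked with an ordinary UTF-16 codec. In the sketch below, U+E1000 and U+E1400 are arbitrary plane-14 values picked only for illustration (the post does not say which code points would be allocated); Python's built-in `utf-16-be` codec stands in for "an existing decoder".

```python
import struct

# Hypothetical "super-surrogate" code points in plane 14 (assumed values).
HYPOTHETICAL_HIGH = 0xE1000
HYPOTHETICAL_LOW = 0xE1400

# One extended character would be written as these two plane-14 code points.
text = chr(HYPOTHETICAL_HIGH) + chr(HYPOTHETICAL_LOW)

# Standard UTF-16 turns each of them into a surrogate pair in the BMP.
raw = text.encode("utf-16-be")
units = struct.unpack(">%dH" % (len(raw) // 2), raw)

# An unmodified decoder sees two code points in plane 14, not one.
decoded = raw.decode("utf-16-be")

if __name__ == "__main__":
    print(len(raw), "bytes,", len(units), "code units")
    print([hex(u) for u in units])
    print([hex(ord(c)) for c in decoded])
```

The sequence is four 16-bit units (8 bytes), and the stock decoder round-trips it as two plane-14 code points without error; only a decoder aware of the extension would combine them into one.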

    The bad thing about this approach is the total encoding length (4 UTF-16
    surrogates, i.e. 8 bytes for a single codepoint that would be at most 31
    bits wide), but as we speak about a far, far future, will it really be a
    problem on future platforms, and won't UTF-16 have long since been
    superseded by UTF-32?

    At the same time, UTF-8 and UTF-32 would need to be extended and given new
    identifiers (like UXF-8, UXF-16, UXF-32) to cope with the extended code
    point space.
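
For UTF-8, one conceivable extension is simply the older bit pattern from before the code space was capped at U+10FFFF (the RFC 2279 form, which went up to 6 bytes and 31 bits). The sketch below is only that: the function name `encode_uxf8` borrows the "UXF-8" label above as a placeholder, and nothing says a real extension would take this shape. It does not reject surrogate code points, which a real encoder would.

```python
def encode_uxf8(cp):
    """Encode a code point with the old 1..6 byte UTF-8 pattern (31 bits)."""
    if not 0 <= cp <= 0x7FFFFFFF:
        raise ValueError("code point does not fit in 31 bits")
    if cp < 0x80:
        return bytes([cp])
    # (largest code point, lead-byte marker, number of continuation bytes)
    forms = (
        (0x7FF, 0xC0, 1),
        (0xFFFF, 0xE0, 2),
        (0x1FFFFF, 0xF0, 3),
        (0x3FFFFFF, 0xF8, 4),
        (0x7FFFFFFF, 0xFC, 5),
    )
    for limit, marker, n in forms:
        if cp <= limit:
            out = [marker | (cp >> (6 * n))]
            # Each continuation byte carries 6 payload bits, high bits first.
            for i in range(n - 1, -1, -1):
                out.append(0x80 | ((cp >> (6 * i)) & 0x3F))
            return bytes(out)


if __name__ == "__main__":
    for cp in (0x20AC, 0x10FFFF, 0x510FFFF):
        print(hex(cp), encode_uxf8(cp).hex())
```

For code points up to U+10FFFF this gives the same bytes as today's UTF-8; a code point near the top of a 1297-plane code space would need the 6-byte form.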

    So I don't think we have real limitations for now, and we have ample space
    for the future...


