Re: Regulating PUA.

From: Mike (mike-list@pobox.com)
Date: Sun Jan 21 2007 - 17:07:15 CST


    >> When I implemented collation, I needed to define code points for
    >> the various contractions that can occur. To avoid clashing with
    >> any private use code points, I chose to start allocating the
    >> contractions at 0x110000. This has worked quite nicely.
    >
    > One problem with that solution is that it may work if you're working
    > with extensions of UTF-8 or extensions of UTF-32, but just doesn't work
    > with UTF-16. The other is that with the other two, especially extending
    > UTF-8, you are quite likely to fall foul of defensive code guarding
    > against impossible codepoints. It's a shame, for I had been about to
    > suggest it.
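
    The UTF-16 limitation mentioned in the quoted text can be checked with a
    little arithmetic. The sketch below is only an illustration; the helper
    name utf16_units is hypothetical and does not come from anyone's code in
    this thread. A surrogate pair carries 20 bits on top of 0x10000, so the
    largest value it can express is 0x10FFFF, and 0x110000 has no encoding.

```python
# Hypothetical helper (not from the thread): encode one code point as UTF-16 code units.
def utf16_units(cp):
    if cp < 0 or cp > 0x10FFFF:
        # Nothing above 0x10FFFF fits into a surrogate pair.
        raise ValueError("not representable in UTF-16: 0x%X" % cp)
    if 0xD800 <= cp <= 0xDFFF:
        # Surrogate code points are reserved for the pairs themselves.
        raise ValueError("surrogate code point: 0x%X" % cp)
    if cp < 0x10000:
        return [cp]                      # one 16-bit unit
    v = cp - 0x10000                     # 20-bit offset into the supplementary planes
    return [0xD800 + (v >> 10),          # high surrogate: top 10 bits
            0xDC00 + (v & 0x3FF)]        # low surrogate: bottom 10 bits

# Largest value a pair can express: 10 + 10 bits, all set, plus the 0x10000 base.
MAX_PAIR = 0x10000 + ((0x3FF << 10) | 0x3FF)
```

    Here utf16_units(0x10FFFF) yields the last possible pair, while
    utf16_units(0x110000) raises ValueError, which is also the kind of
    defensive check the quoted text warns about.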

    The values 0x110000 and higher are only used internally to keep
    track of contractions, and they never leak out into normal character
    data. If they did, they'd be converted to 0xFFFD by my
    Char class anyway.
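
    A minimal sketch of the scheme, written in Python for brevity. All names
    here (ContractionTable, id_for, sanitize) are hypothetical, and this is
    not the actual Char class; it only assumes what is stated above, namely
    that contraction IDs start at 0x110000 and that any such value reaching
    normal character data is replaced with 0xFFFD.

```python
FIRST_INTERNAL = 0x110000   # first value above the Unicode code space
REPLACEMENT = 0xFFFD        # U+FFFD REPLACEMENT CHARACTER


class ContractionTable:
    """Hands out internal-only IDs for contractions (hypothetical sketch)."""

    def __init__(self):
        self._ids = {}
        self._next = FIRST_INTERNAL

    def id_for(self, contraction):
        # Allocate a new ID the first time a contraction is seen,
        # and return the same ID on every later lookup.
        if contraction not in self._ids:
            self._ids[contraction] = self._next
            self._next += 1
        return self._ids[contraction]


def sanitize(cp):
    # Assumed boundary check: anything outside 0..0x10FFFF becomes U+FFFD.
    # How surrogates are treated is not covered here.
    if 0 <= cp <= 0x10FFFF:
        return cp
    return REPLACEMENT
```

    With this, the first contraction registered gets 0x110000, the next one
    0x110001, and passing either through sanitize gives 0xFFFD, so the
    internal values cannot clash with private-use code points or escape
    into text.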

    Mike


