From: Richard Wordingham (email@example.com)
Date: Sun Jan 21 2007 - 15:39:19 CST
Mike wrote on Sunday, January 21, 2007 6:56 PM
> When I implemented collation, I needed to define code points for
> the various contractions that can occur. To avoid clashing with
> any private use code points, I chose to start allocating the
> contractions at 0x110000. This has worked quite nicely.
One problem with that solution is that it may work if you're working with
extensions of UTF-8 or extensions of UTF-32, but it simply cannot work with
UTF-16, whose surrogate pairs reach no further than U+10FFFF. The other is
that with the other two, especially an extended UTF-8, you are quite likely
to fall foul of defensive code guarding against impossible codepoints. It's
a shame, for I had been about to suggest it.
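
To make the UTF-16 limit concrete, here is a minimal sketch in Python. The
function name utf16_encode and the constant UNICODE_MAX are my own, chosen
for illustration; this is not Mike's implementation or any library's API. It
shows that the largest value a surrogate pair can carry is U+10FFFF, and the
range check is the sort of defensive guard that would reject 0x110000.

```python
UNICODE_MAX = 0x10FFFF  # last code point of the Unicode code space


def utf16_encode(cp):
    """Return the UTF-16 code units for cp as a list of ints.

    Hypothetical helper for illustration only. Raises ValueError for
    values that UTF-16 cannot represent.
    """
    # Defensive guard: out-of-range values and lone surrogates are rejected.
    if cp < 0 or cp > UNICODE_MAX or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not encodable in UTF-16: {cp:#x}")
    if cp < 0x10000:
        return [cp]  # BMP code point: a single code unit
    v = cp - 0x10000  # 20-bit offset into the supplementary planes
    return [0xD800 | (v >> 10), 0xDC00 | (v & 0x3FF)]


# The highest value a surrogate pair can express: both halves carry
# 10 bits, so the offset tops out at 0xFFFFF.
highest = 0x10000 + ((0x3FF << 10) | 0x3FF)
```

Here highest works out to 0x10FFFF, so a contraction allocated at 0x110000
has no UTF-16 form at all. Python's own chr() applies the same kind of guard
and raises ValueError for chr(0x110000).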
There actually already is a division of the PUA in the BMP - the low end is
for end users and the high end is for system vendors and software
developers. What is lacking is a definition of where the boundary lies.
This principle seems to be generally followed. Of course, there is nothing
to stop end-users clashing - they will depend on fonts to keep the character
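
As a rough illustration of the low-end/high-end convention, the sketch below
hands out BMP private use code points (U+E000..U+F8FF) from both ends. The
class PuaAllocator and its method names are hypothetical, and the policy of
letting the two ends grow until they meet is my assumption; since no boundary
is defined, nothing in the standard prescribes it.

```python
PUA_FIRST = 0xE000  # first BMP private use code point
PUA_LAST = 0xF8FF   # last BMP private use code point


class PuaAllocator:
    """Hypothetical allocator: end users grow upward from the low end,
    vendors grow downward from the high end. With no defined boundary,
    the only hard limit is where the two meet."""

    def __init__(self):
        self.next_user = PUA_FIRST
        self.next_vendor = PUA_LAST

    def _check(self):
        # Once the cursors cross, every code point has been handed out.
        if self.next_user > self.next_vendor:
            raise RuntimeError("BMP private use area exhausted")

    def alloc_user(self):
        self._check()
        cp = self.next_user
        self.next_user += 1
        return cp

    def alloc_vendor(self):
        self._check()
        cp = self.next_vendor
        self.next_vendor -= 1
        return cp
```

This only avoids clashes within one program that controls both ends. It does
nothing about two independent end users who both start at U+E000.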
The big problem is 'agreements' which are more like offers one cannot refuse.
There is probably no way round this.