From: Adam Twardoch (firstname.lastname@example.org)
Date: Sun Jan 21 2007 - 20:35:02 CST
> Are you saying utf16 doesn't support plane 16? This would make utf16
> only part of unicode.
No John, that's not the point. The point is that while a 32-bit encoding
space can in theory hold 2^32 values (0 through 0xFFFFFFFF), values above
0x0010FFFF are not valid Unicode code points. Mike uses such values for
internal purposes: they are not Unicode code points, but they can still
serve as "non-codepoints". Mike's software seems to filter out these
non-codepoints when storing actual text, but note that the bit patterns
of both UTF-8 and UTF-32 could physically store them, even though
conformant UTF-8 and UTF-32 forbid it. UTF-16, which reaches beyond the
BMP only through surrogate pairs, gives you no way to store them at all.
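
To make the difference concrete, here is a minimal Python sketch of the arithmetic involved. The function names are hypothetical and not taken from Mike's software; the "extended UTF-8" function just applies the usual 4-byte UTF-8 bit pattern past its legal limit, which is an assumption about what such an extension might look like.

```python
MAX_CODE_POINT = 0x10FFFF  # highest valid Unicode code point


def utf16_surrogates(value):
    """Return the (high, low) surrogate pair for a supplementary value.

    A surrogate pair carries 10 + 10 = 20 payload bits, so it covers
    exactly 0x10000..0x10FFFF and nothing above that.
    """
    if not 0x10000 <= value <= MAX_CODE_POINT:
        raise ValueError(f"U+{value:X} has no UTF-16 surrogate pair")
    offset = value - 0x10000            # 20-bit offset into planes 1..16
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


def utf32_unit(value):
    """Pack any 32-bit value into one big-endian 32-bit unit.

    The container does not care whether the value is a valid code point.
    """
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("value does not fit in 32 bits")
    return value.to_bytes(4, "big")


def extended_utf8_4byte(value):
    """Apply the 4-byte UTF-8 bit pattern (21 payload bits) to a value.

    Assumption: this is an illustrative extension, not conformant UTF-8.
    The pattern itself reaches 0x1FFFFF; the standard stops at 0x10FFFF.
    """
    if not 0x10000 <= value <= 0x1FFFFF:
        raise ValueError("value does not fit the 4-byte pattern")
    return bytes([
        0xF0 | (value >> 18),
        0x80 | ((value >> 12) & 0x3F),
        0x80 | ((value >> 6) & 0x3F),
        0x80 | (value & 0x3F),
    ])
```

With these definitions, `utf32_unit(0x110000)` and `extended_utf8_4byte(0x110000)` both produce bytes, while `utf16_surrogates(0x110000)` raises `ValueError`. A strict decoder, such as Python's built-in UTF-8 codec, rejects the extended bytes, which is the kind of defensive code Richard mentions in the message quoted below.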
> John Knightley (Linux, utf8 user)
> Quoting Richard Wordingham <email@example.com>:
>> Mike wrote on Sunday, January 21, 2007 6:56 PM
>>> When I implemented collation, I needed to define code points for
>>> the various contractions that can occur. To avoid clashing with
>>> any private use code points, I chose to start allocating the con-
>>> tractions at 0x110000. This has worked quite nicely.
>> One problem with that solution is that it may work if you're working
>> with extensions of UTF-8 or extensions of UTF-32, but just doesn't work
>> with UTF-16. The other is that with the other two, especially
>> extending UTF-8, you are quite likely to fall foul of defensive code
>> guarding against impossible codepoints. It's a shame, for I had been
>> about to suggest it.
-- Adam Twardoch | Language Typography Unicode Fonts OpenType | twardoch.com | silesian.com | fontlab.net
This archive was generated by hypermail 2.1.5 : Sun Jan 21 2007 - 20:36:51 CST