From: Jon Hanna (jon@hackcraft.net)
Date: Mon Jan 17 2005 - 12:47:31 CST
> Are there any good reasons for UTF-32 to exclude the 32nd bit of an
> encoded 4-byte? I.e., the 6-byte combinations
> 111111xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
> where the first x = 1.
Because there is no such character. Even the 5 and 6-octet combinations
allowable by ISO 10646 won't identify a Unicode character (or an assigned
ISO 10646 character).
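
As a rough illustration of the point above, the sketch below tabulates how many payload bits each sequence length carries in the original 1-to-6-octet UTF-8 scheme and compares that with the Unicode code space, which ends at U+10FFFF. The names PAYLOAD_BITS, octets_needed and is_unicode_code_point are made up for this example and are not from any library. The standard 6-octet lead byte is 1111110x, which gives 31 bits in total; the 111111xx form in the question with the first x set to 1 would be an extension beyond that.

```python
# Payload bits per sequence length in the original 1-to-6-octet UTF-8
# scheme (the form described in ISO 10646 / RFC 2279). Illustrative table;
# the name is hypothetical.
PAYLOAD_BITS = {1: 7, 2: 11, 3: 16, 4: 21, 5: 26, 6: 31}

UNICODE_MAX = 0x10FFFF  # highest Unicode code point


def octets_needed(value: int) -> int:
    """Return the sequence length the 1-to-6-octet scheme needs for value."""
    if value < 0:
        raise ValueError("value must be non-negative")
    for length in sorted(PAYLOAD_BITS):
        if value < (1 << PAYLOAD_BITS[length]):
            return length
    # A 32nd bit does not fit in the standard 6-octet form.
    raise ValueError("value needs more than 31 bits")


def is_unicode_code_point(value: int) -> bool:
    """True only for values inside the Unicode code space."""
    return 0 <= value <= UNICODE_MAX


if __name__ == "__main__":
    for v in (0x7F, 0x10FFFF, 0x200000, 0x7FFFFFFF):
        print(hex(v), octets_needed(v), is_unicode_code_point(v))
```

Every Unicode code point fits in at most four octets, so anything that needs a 5- or 6-octet sequence is already outside the set of values that can name a character.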
> With a full 32-bit encoding, one can also use UTF-8 to
> encode binary data.
I really find it hard to see the advantage to this.
> It also simplifies somewhat the implementation of Unicode in lexer
> generators (such as Flex):
Not as much as basing the lexer on characters rather than octets does.
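
A minimal sketch of what basing the lexer on characters can look like, assuming the input is valid UTF-8 and using a made-up tokenize function with three illustrative token kinds (IDENT, NUMBER, PUNCT). The octets are decoded once, up front, and every rule after that is written in terms of characters, so no rule needs to know about lead or continuation bytes.

```python
def tokenize(data: bytes):
    """Decode UTF-8 once, then lex over characters rather than octets.

    Hypothetical example lexer: the token kinds and rules are for
    illustration only.
    """
    text = data.decode("utf-8")  # raises UnicodeDecodeError on bad input
    tokens = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
        elif ch.isalpha() or ch == "_":
            # Identifier: a letter or underscore, then letters/digits/underscores.
            j = i + 1
            while j < len(text) and (text[j].isalnum() or text[j] == "_"):
                j += 1
            tokens.append(("IDENT", text[i:j]))
            i = j
        elif ch.isdigit():
            # Number: a run of digit characters.
            j = i + 1
            while j < len(text) and text[j].isdigit():
                j += 1
            tokens.append(("NUMBER", text[i:j]))
            i = j
        else:
            # Anything else is a single-character punctuation token.
            tokens.append(("PUNCT", ch))
            i += 1
    return tokens


if __name__ == "__main__":
    print(tokenize("naïve = 42".encode("utf-8")))
```

The non-ASCII letter in the identifier is handled by the same rule as any ASCII letter, which is the simplification a character-based lexer buys.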
Regards,
Jon Hanna
Work: <http://www.selkieweb.com/>
Play: <http://www.hackcraft.net/>
Chat: <irc://irc.freenode.net/selkie>