RE: 32'nd bit & UTF-8

From: Jon Hanna (jon@hackcraft.net)
Date: Mon Jan 17 2005 - 12:47:31 CST

Next message: Doug Ewell: "Re: 32'nd bit & UTF-8"

Previous message: Hans Aberg: "32'nd bit & UTF-8"
In reply to: Hans Aberg: "32'nd bit & UTF-8"
Next in thread: Doug Ewell: "Re: 32'nd bit & UTF-8"

> Are there any good reasons for UTF-32 to exclude the 32'nd bit of an
> encoded 4-byte? I.e, the 6-byte combinations
> 111111xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
> where the first x = 1.

Because there is no such character. Even the 5- and 6-octet combinations
allowable by ISO 10646 won't identify a Unicode character (or an assigned
ISO 10646 character).
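
To put rough numbers on that, here is a minimal Python sketch of the bit
arithmetic. The names payload_bits and octets_needed are made up for
illustration and come from no library. It assumes the original ISO 10646
style of UTF-8, where an n-octet sequence (n >= 2) has 7 - n free bits in
its lead octet and 6 in each continuation octet.

```python
MAX_UNICODE = 0x10FFFF  # highest Unicode code point


def payload_bits(n_octets: int) -> int:
    """Number of code point bits an n-octet sequence can carry."""
    if n_octets == 1:
        return 7  # 0xxxxxxx
    # Lead octet has 7 - n free bits; each 10xxxxxx octet adds 6.
    return (7 - n_octets) + 6 * (n_octets - 1)


def octets_needed(value: int) -> int:
    """Shortest sequence length (1..6) able to hold the given value."""
    for n in range(1, 7):
        if value < (1 << payload_bits(n)):
            return n
    raise ValueError("value needs more than 31 bits")
```

Under these assumptions the 6-octet form tops out at 31 bits, so a value
with the 32nd bit set does not fit at all, while the largest Unicode code
point, U+10FFFF, already fits in 4 octets.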

> With a full 32-bit encoding, one can also use UTF-8 to
> encode binary data.

I really find it hard to see the advantage to this.

> It also simplifies somewhat the implementation of Unicode in lexer
> generators (such as Flex):

Not as much as basing the lexer on characters rather than octets does.
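
As a rough illustration of what "basing the lexer on characters" could
look like, here is a small hand-written Python sketch. It is not Flex and
not anyone's actual implementation; the tokenize function and its token
names are hypothetical. The octets are decoded once up front, and every
rule after that is written against characters.

```python
def tokenize(data: bytes):
    """Decode UTF-8 first, then classify characters rather than octets."""
    # Decoding raises UnicodeDecodeError on ill-formed input, which
    # includes the old 5- and 6-octet forms.
    text = data.decode("utf-8")
    tokens = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
        elif ch.isalpha() or ch == "_":
            j = i + 1
            while j < len(text) and (text[j].isalnum() or text[j] == "_"):
                j += 1
            tokens.append(("IDENT", text[i:j]))
            i = j
        elif ch.isdigit():
            j = i + 1
            while j < len(text) and text[j].isdigit():
                j += 1
            tokens.append(("NUMBER", text[i:j]))
            i = j
        else:
            tokens.append(("PUNCT", ch))
            i += 1
    return tokens
```

With this shape the rules never mention lead or continuation octets, so
an identifier such as "naïve" is matched by the same rule as "naive".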

Regards,
Jon Hanna
Work: <http://www.selkieweb.com/>
Play: <http://www.hackcraft.net/>
Chat: <irc://irc.freenode.net/selkie>

This archive was generated by hypermail 2.1.5 : Mon Jan 17 2005 - 12:52:34 CST