32'nd bit & UTF-8

From: Hans Aberg (haberg@math.su.se)
Date: Mon Jan 17 2005 - 12:06:39 CST

Next message: Jon Hanna: "RE: 32'nd bit & UTF-8"

Previous message: Peter Kirk: "Re: [hebrew] Re: Hebrew combining classes (was ISO 10646 compliance and EU law)"
Next in thread: Jon Hanna: "RE: 32'nd bit & UTF-8"
Reply: Jon Hanna: "RE: 32'nd bit & UTF-8"
Reply: Doug Ewell: "Re: 32'nd bit & UTF-8"
Maybe reply: Kenneth Whistler: "Re: 32'nd bit & UTF-8"
Maybe reply: Hans Aberg: "Re: 32'nd bit & UTF-8"
Maybe reply: Hans Aberg: "RE: 32'nd bit & UTF-8"
Maybe reply: Hans Aberg: "Re: 32'nd bit & UTF-8"
Maybe reply: Hans Aberg: "Re: 32'nd bit & UTF-8"
Reply: Antoine Leca: "Re: 32'nd bit & UTF-8"
Maybe reply: Arcane Jill: "RE: 32'nd bit & UTF-8"
Maybe reply: Kenneth Whistler: "Re: 32'nd bit & UTF-8"
Maybe reply: Kenneth Whistler: "Re: 32'nd bit & UTF-8"
Maybe reply: Hans Aberg: "Re: 32'nd bit & UTF-8"
Maybe reply: Kenneth Whistler: "Re: 32'nd bit & UTF-8"
Maybe reply: Arcane Jill: "Re: 32'nd bit & UTF-8"
Maybe reply: Hans Aberg: "Re: 32'nd bit & UTF-8"
Maybe reply: Lars Kristan: "RE: 32'nd bit & UTF-8"
Maybe reply: Arcane Jill: "Re: 32'nd bit & UTF-8"
Maybe reply: Lars Kristan: "RE: 32'nd bit & UTF-8"
Maybe reply: Peter Constable: "RE: 32'nd bit & UTF-8"
Maybe reply: Arcane Jill: "Re: 32'nd bit & UTF-8"
Maybe reply: Arcane Jill: "Re: 32'nd bit & UTF-8"

Are there any good reasons for UTF-32 to exclude the 32nd bit of an encoded
4-byte value? I.e., the 6-byte combinations
111111xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
where the first x = 1.
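
To make the bit layout concrete, here is a minimal Python sketch of the 6-byte form described above. It is only an illustration under stated assumptions: the function names encode_6byte and decode_6byte are hypothetical, and the scheme is the extension being asked about, not something standard UTF-8 permits.

```python
def encode_6byte(value: int) -> bytes:
    """Pack a 32-bit value as 111111xx followed by five 10xxxxxx bytes.

    Hypothetical helper: this is the extended form discussed in the post,
    not standard UTF-8.
    """
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("value must fit in 32 bits")
    # The lead byte carries the top 2 bits; the first of them is the 32nd bit.
    lead = 0b11111100 | (value >> 30)
    # Each trailing byte carries 6 bits, most significant group first.
    tail = [0x80 | ((value >> shift) & 0x3F) for shift in (24, 18, 12, 6, 0)]
    return bytes([lead] + tail)


def decode_6byte(data: bytes) -> int:
    """Inverse of encode_6byte; rejects anything not in the 6-byte form."""
    if len(data) != 6 or (data[0] & 0xFC) != 0xFC:
        raise ValueError("not a 6-byte sequence with a 111111xx lead byte")
    if any((b & 0xC0) != 0x80 for b in data[1:]):
        raise ValueError("trailing bytes must have the form 10xxxxxx")
    value = data[0] & 0x03
    for b in data[1:]:
        value = (value << 6) | (b & 0x3F)
    return value
```

With this layout, values up to 0x7FFFFFFF get a lead byte of 0xFC or 0xFD, and only values with the 32nd bit set produce the lead bytes 0xFE and 0xFF.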

With a full 32-bit encoding, one can also use UTF-8 to encode binary data.
It also somewhat simplifies the implementation of Unicode in lexer
generators (such as Flex): the leading byte then covers all 256
combinations. All 2^32 numbers should probably be there for generating
proper lexer error messages.
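
As a rough illustration of the lead-byte point, the following Python sketch classifies every possible byte value under the extended scheme. The function name sequence_length is hypothetical, and the table follows the original 1- to 6-byte UTF-8 layout plus the extra 1111111x lead bytes asked about here; it is not how a conforming UTF-8 decoder behaves.

```python
def sequence_length(byte: int) -> int:
    """Return the sequence length implied by a lead byte, or 0 for a
    continuation byte (10xxxxxx), under the extended scheme sketched here."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a single byte value")
    if byte < 0x80:
        return 1  # 0xxxxxxx
    if byte < 0xC0:
        return 0  # 10xxxxxx, continuation byte
    if byte < 0xE0:
        return 2  # 110xxxxx
    if byte < 0xF0:
        return 3  # 1110xxxx
    if byte < 0xF8:
        return 4  # 11110xxx
    if byte < 0xFC:
        return 5  # 111110xx
    return 6      # 111111xx, which includes 0xFE and 0xFF


# Every one of the 256 byte values gets a defined role.
table = [sequence_length(b) for b in range(256)]
```

A lexer generator could use such a table to dispatch on the first byte without a special error case for 0xFE and 0xFF.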

Hans Aberg


This archive was generated by hypermail 2.1.5 : Mon Jan 17 2005 - 12:16:41 CST