From: Hans Aberg (haberg@math.su.se)
Date: Mon Jan 17 2005 - 12:06:39 CST
Are there any good reasons for UTF-8 to exclude the 32nd bit of an encoded
4-byte (UTF-32) value? I.e., the 6-byte combinations
111111xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
where the first x = 1, which means the lead bytes 0xFE and 0xFF.
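As a minimal sketch of the proposal (Python chosen only for illustration), the encoder below follows the original RFC 2279 layout, where the 6-byte lead byte 1111110x carries 1 payload bit and sequences reach 31 bits. It changes one thing: the 6-byte lead byte becomes 111111xx, so it carries 2 bits and the form covers all 32. The name `encode_ext_utf8` is hypothetical, and this extended form is not valid under the current UTF-8 definition (RFC 3629), which stops at U+10FFFF and 4-byte sequences.

```python
# Hypothetical encoder: RFC 2279-style UTF-8 extended so that the 6-byte
# form (lead byte 111111xx) carries 2 + 5*6 = 32 payload bits.
# Not standard UTF-8.

def encode_ext_utf8(value: int) -> bytes:
    """Encode a 32-bit value using the shortest (non-overlong) form."""
    if not 0 <= value < 2**32:
        raise ValueError("value must fit in 32 bits")
    if value < 0x80:
        return bytes([value])  # 0xxxxxxx
    # (exclusive upper limit, sequence length, lead-byte marker bits)
    for limit, n, marker in ((1 << 11, 2, 0xC0),   # 110xxxxx
                             (1 << 16, 3, 0xE0),   # 1110xxxx
                             (1 << 21, 4, 0xF0),   # 11110xxx
                             (1 << 26, 5, 0xF8),   # 111110xx
                             (1 << 32, 6, 0xFC)):  # 111111xx (extended)
        if value < limit:
            break
    out = []
    # Emit continuation bytes 10xxxxxx from the least significant 6 bits up.
    for _ in range(n - 1):
        out.append(0x80 | (value & 0x3F))
        value >>= 6
    # Whatever bits remain go into the lead byte.
    out.append(marker | value)
    return bytes(reversed(out))


print(encode_ext_utf8(0xFFFFFFFF).hex(" "))  # ff bf bf bf bf bf
```

Only values of 2^31 and above produce the lead bytes 0xFE or 0xFF; for smaller values the output is the same as under RFC 2279.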
With a full 32-bit encoding, one can also use UTF-8 to encode binary data.
It also somewhat simplifies the implementation of Unicode in lexer
generators (such as Flex): the leading byte then covers all 256
combinations. All 2^32 values should probably be representable, so that
the lexer can generate proper error messages.
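The lexer-side argument can be sketched with a matching decoder, again assuming the hypothetical 32-bit extension. Under it, every byte value has a defined role: 0x00 to 0x7F are single bytes, 0x80 to 0xBF are continuation bytes, and 0xC0 to 0xFF all announce a sequence length, so a byte-driven lexer table has no undefined lead bytes, and the decoder can hand back the full value for an error message. The names `lead_length`, `LEAD_MASK` and `decode_ext_utf8` are illustrative only, and for brevity the sketch does not reject overlong forms.

```python
# Hypothetical decoder for UTF-8 extended to 32 bits (lead bytes 0xFC-0xFF,
# i.e. 111111xx, start 6-byte sequences). Not standard UTF-8; overlong
# forms are not rejected in this sketch.

# Payload bits kept from the lead byte, indexed by sequence length.
LEAD_MASK = {1: 0x7F, 2: 0x1F, 3: 0x0F, 4: 0x07, 5: 0x03, 6: 0x03}


def lead_length(b: int) -> int:
    """Sequence length announced by byte b; 0 marks a continuation byte."""
    if b < 0x80:
        return 1   # 0xxxxxxx
    if b < 0xC0:
        return 0   # 10xxxxxx
    if b < 0xE0:
        return 2   # 110xxxxx
    if b < 0xF0:
        return 3   # 1110xxxx
    if b < 0xF8:
        return 4   # 11110xxx
    if b < 0xFC:
        return 5   # 111110xx
    return 6       # 111111xx, including 0xFE/0xFF for the 32nd bit


def decode_ext_utf8(data: bytes) -> list:
    """Decode bytes into 32-bit values, raising ValueError on malformed input."""
    values = []
    i = 0
    while i < len(data):
        n = lead_length(data[i])
        if n == 0:
            raise ValueError(f"stray continuation byte 0x{data[i]:02X} at offset {i}")
        if i + n > len(data):
            raise ValueError(f"truncated {n}-byte sequence at offset {i}")
        value = data[i] & LEAD_MASK[n]
        for j in range(i + 1, i + n):
            if data[j] & 0xC0 != 0x80:
                raise ValueError(f"expected continuation byte at offset {j}")
            value = (value << 6) | (data[j] & 0x3F)
        values.append(value)
        i += n
    return values


# Every byte value has a role: 192 can start a sequence, 64 are continuations.
starters = sum(1 for b in range(256) if lead_length(b) > 0)
print(starters)  # 192
print(decode_ext_utf8(bytes([0xFF, 0xBF, 0xBF, 0xBF, 0xBF, 0xBF])))  # [4294967295]
```

With the decoded value in hand, a lexer could report, for example, that 0xFFFFFFFF lies above the Unicode limit 0x10FFFF, rather than only that byte 0xFF is invalid.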
Hans Aberg