32'nd bit & UTF-8

From: Hans Aberg (haberg@math.su.se)
Date: Mon Jan 17 2005 - 12:06:39 CST


    Are there any good reasons for UTF-8 to exclude the 32nd bit of an encoded
    4-byte value, i.e., the 6-byte combinations
        111111xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
    where the first x is 1?
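
    To make the proposed 6-byte form concrete, here is a minimal encoder sketch.
    It assumes the original (RFC 2279 style, up to six bytes) UTF-8 layout, with
    the 6-byte lead byte widened from 1111110x to 111111xx so that it carries
    two payload bits; the function name encode_utf8_32 is hypothetical. Values
    of 2^31 and above are exactly those that need the lead bytes 0xFE and 0xFF.

```python
def encode_utf8_32(value: int) -> bytes:
    """Encode a 32-bit value in a hypothetical UTF-8 extension.

    Assumption: the RFC 2279 multi-byte layout, except that the 6-byte
    lead byte is 111111xx (2 payload bits) instead of 1111110x (1 bit).
    """
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("value must fit in 32 bits")
    if value < 0x80:
        return bytes([value])  # 0xxxxxxx: plain ASCII
    # (exclusive upper bound, number of continuation bytes, lead-byte prefix)
    forms = (
        (1 << 11, 1, 0xC0),  # 110xxxxx
        (1 << 16, 2, 0xE0),  # 1110xxxx
        (1 << 21, 3, 0xF0),  # 11110xxx
        (1 << 26, 4, 0xF8),  # 111110xx
        (1 << 32, 5, 0xFC),  # 111111xx: 2 + 5 * 6 = 32 payload bits
    )
    n_cont, prefix = next((n, p) for bound, n, p in forms if value < bound)
    tail = []
    for _ in range(n_cont):
        tail.append(0x80 | (value & 0x3F))  # 10xxxxxx holds the low 6 bits
        value >>= 6
    # The remaining high bits (at most 2 for the 6-byte form) go in the lead byte.
    return bytes([prefix | value] + tail[::-1])
```

    For code points up to U+10FFFF the output coincides with ordinary UTF-8;
    for example, encode_utf8_32(0x80000000) yields FE 80 80 80 80 80, a
    sequence that 31-bit UTF-8 never produces.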

    With a full 32-bit encoding, one can also use UTF-8 to encode arbitrary
    binary data, since every 32-bit word then has an encoding. It also somewhat
    simplifies the implementation of Unicode in lexer generators (such as Flex):
    every one of the 256 byte values then has a defined role, because 0xFE and
    0xFF become lead bytes instead of never occurring. All 2^32 values should
    probably be representable, so that the lexer can generate proper error
    messages.
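
    On the lexer side, the point can be sketched as a lead-byte classifier
    plus a decoder, under the same assumed 111111xx extension. The names
    lead_byte_length, decode_utf8_32 and LEAD_MASK are hypothetical. With 0xFE
    and 0xFF assigned to the 6-byte form, every byte value maps either to a
    sequence length or to "continuation byte", so the only decoding errors left
    are structural ones (stray continuation, truncation). This sketch does not
    reject overlong forms.

```python
# Payload mask for the lead byte of an n-byte sequence. Under the assumed
# 32-bit extension the 6-byte lead is 111111xx, so it keeps 2 bits, not 1.
LEAD_MASK = {1: 0x7F, 2: 0x1F, 3: 0x0F, 4: 0x07, 5: 0x03, 6: 0x03}


def lead_byte_length(b: int) -> int:
    """Sequence length announced by byte b; 0 means a continuation byte."""
    if b < 0x80:
        return 1  # 0xxxxxxx
    if b < 0xC0:
        return 0  # 10xxxxxx
    if b < 0xE0:
        return 2  # 110xxxxx
    if b < 0xF0:
        return 3  # 1110xxxx
    if b < 0xF8:
        return 4  # 11110xxx
    if b < 0xFC:
        return 5  # 111110xx
    return 6      # 111111xx, including 0xFE and 0xFF


def decode_utf8_32(data: bytes) -> list[int]:
    """Decode bytes into 32-bit values (overlong forms are not rejected)."""
    values = []
    i = 0
    while i < len(data):
        n = lead_byte_length(data[i])
        if n == 0:
            raise ValueError(f"unexpected continuation byte at offset {i}")
        if i + n > len(data):
            raise ValueError(f"truncated sequence at offset {i}")
        value = data[i] & LEAD_MASK[n]
        for j in range(i + 1, i + n):
            if data[j] & 0xC0 != 0x80:
                raise ValueError(f"expected continuation byte at offset {j}")
            value = (value << 6) | (data[j] & 0x3F)
        values.append(value)
        i += n
    return values


# A lexer's first-byte table: all 256 entries get a role, none is "invalid".
table = [lead_byte_length(b) for b in range(256)]
```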

      Hans Aberg
