Re: problems in Public Review 33 UTF Conversion Code Update

From: Philippe Verdy (
Date: Wed May 19 2004 - 18:20:57 CDT


    Frank Yung-Fong Tang wrote:

    > It should be:
    > Legal UTF-8 sequences are:
    > 1st---- 2nd---- 3rd---- 4th---- Codepoints-----
    > 00-7F                             0000-  007F
    > C2-DF   80-BF                     0080-  07FF
    > E0      A0-BF   80-BF             0800-  0FFF
    > E1-EC   80-BF   80-BF             1000-  CFFF
    > ED      80-9F   80-BF             D000-  D7FF
    > EE-EF   80-BF   80-BF             E000-  FFFF
    > F0      90-BF   80-BF   80-BF    10000- 3FFFF
    > F1-F3   80-BF   80-BF   80-BF    40000- FFFFF
    > F4      80-8F   80-BF   80-BF   100000-10FFFF
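
The quoted table can be read directly as a validator. What follows is only a minimal sketch in Python, not the ConvertUTF code under review; the names `LEGAL_ROWS` and `is_legal_utf8` are made up for illustration. Each row holds the lead-byte range, the allowed range for the second byte, and the number of trailing bytes; third and fourth bytes are always 80-BF.

```python
# Each row: (lead_lo, lead_hi, second_lo, second_hi, trailing_byte_count).
# The rows mirror the quoted table of legal UTF-8 sequences, top to bottom.
LEGAL_ROWS = [
    (0x00, 0x7F, 0x00, 0x00, 0),  # second-byte range unused for 1-byte form
    (0xC2, 0xDF, 0x80, 0xBF, 1),
    (0xE0, 0xE0, 0xA0, 0xBF, 2),
    (0xE1, 0xEC, 0x80, 0xBF, 2),
    (0xED, 0xED, 0x80, 0x9F, 2),
    (0xEE, 0xEF, 0x80, 0xBF, 2),
    (0xF0, 0xF0, 0x90, 0xBF, 3),
    (0xF1, 0xF3, 0x80, 0xBF, 3),
    (0xF4, 0xF4, 0x80, 0x8F, 3),
]


def is_legal_utf8(data: bytes) -> bool:
    """Return True if every sequence in data matches a row of the table."""
    i, n = 0, len(data)
    while i < n:
        lead = data[i]
        row = next((r for r in LEGAL_ROWS if r[0] <= lead <= r[1]), None)
        if row is None:
            return False  # 80-C1 and F5-FF never start a legal sequence
        _, _, second_lo, second_hi, trail = row
        if i + trail >= n:
            return False  # sequence is cut off before its last byte
        if trail >= 1 and not (second_lo <= data[i + 1] <= second_hi):
            return False  # second byte outside the range for this lead byte
        for k in range(2, trail + 1):
            if not (0x80 <= data[i + k] <= 0xBF):
                return False  # third and fourth bytes are always 80-BF
        i += trail + 1
    return True
```

Note that, as the table stands, this sketch accepts EF BF BE (U+FFFE) while rejecting ED A0 80 (a surrogate) and F4 90 80 80 (beyond 10FFFF).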

    However, I feel it is not legal (or at least not recommended) to encode the non-character code points xFFFE-xFFFF, where x is any plane number. So the rules need to be a bit more detailed to exclude them.
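
To make concrete which code points are meant, here is a small sketch in Python. The names `is_plane_end_noncharacter` and `PLANE_END_NONCHARACTERS` are hypothetical, and the check covers only the xFFFE-xFFFF pairs discussed here, not any other non-character ranges. It tests for the pattern; it does not settle whether a converter should reject them.

```python
def is_plane_end_noncharacter(cp: int) -> bool:
    """True for xFFFE and xFFFF in any of the 17 planes (x = 00..10 hex)."""
    # The low 16 bits must be FFFE or FFFF; masking with FFFE folds both cases.
    return 0 <= cp <= 0x10FFFF and (cp & 0xFFFE) == 0xFFFE


# All 34 such code points, two per plane, listed for illustration.
PLANE_END_NONCHARACTERS = [
    (plane << 16) | low
    for plane in range(0x11)
    for low in (0xFFFE, 0xFFFF)
]
```

A stricter set of rules could apply a check like this to each decoded code point, since excluding these values through byte ranges alone would mean splitting several rows of the table.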

    Are these permanently assigned non-characters encodable in any UTF or in CESU-8?

    This archive was generated by hypermail 2.1.5 : Wed May 19 2004 - 18:21:45 CDT