RE: Subject: Re: 32'nd bit & UTF-8

From: Jon Hanna (jon@hackcraft.net)
Date: Tue Jan 18 2005 - 10:34:29 CST


    > 0x00...0x7F: 0xxxxxxx
    > 0x80...0x7FF: 110xxxxx 10xxxxxx
    > 0x800...0xFFFF: 1110xxxx 10xxxxxx 10xxxxxx
    > 0x10000...0x1FFFFF: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    > 0x200000...0x3FFFFFF: 111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
    > 0x4000000...0x7FFFFFFF: 1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
    > 0x80000000...0xFFFFFFFFF: 11111110 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
    > 0x1000000000...0x3FFFFFFFFFF: 11111111 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx

    Of course, this loses the property that UTF-8 data will never contain the bytes 0xFE or 0xFF. Because of that property, UTF-16 with a BOM (the byte pair FE FF or FF FE) will never be confused with UTF-8, which is important to XML parsers, to name one application. And all it really gives you is an inefficient way of encoding binary data as binary data (if we really have to, we can already Base64-encode the data and then use UTF-8 on the result).
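
    To make the point concrete, here is a minimal Python sketch, not anything from the proposal itself. The names encode_extended and utf8_uses_fe_or_ff are hypothetical, made up for this illustration. The first function follows the quoted bit patterns as I read them; the second asks Python's own UTF-8 encoder whether any Unicode scalar value ever produces 0xFE or 0xFF.

```python
def encode_extended(value: int) -> bytes:
    """Encode an integer using the quoted 1-to-8-byte bit patterns.

    Hypothetical helper: this follows the table quoted above, not any standard.
    """
    if value < 0:
        raise ValueError("value must be non-negative")
    if value < 0x80:
        return bytes([value])  # 0xxxxxxx
    for length in range(2, 9):
        lead_bits = max(7 - length, 0)           # payload bits left in the lead byte
        total_bits = lead_bits + 6 * (length - 1)
        if value < (1 << total_bits):
            break
    else:
        raise ValueError("value needs more than 42 bits")
    prefix = (0xFF << (8 - length)) & 0xFF       # 0xC0, 0xE0, ..., 0xFE, 0xFF
    tail = []
    for _ in range(length - 1):
        tail.append(0x80 | (value & 0x3F))       # 10xxxxxx continuation byte
        value >>= 6
    return bytes([prefix | value] + tail[::-1])


def utf8_uses_fe_or_ff() -> bool:
    """Return True if any Unicode scalar value encodes to 0xFE or 0xFF in UTF-8."""
    for cp in range(0x110000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogates are not encodable as UTF-8
        encoded = chr(cp).encode("utf-8")
        if 0xFE in encoded or 0xFF in encoded:
            return True
    return False
```

    For values in the ordinary range the sketch agrees with real UTF-8, for example encode_extended(0x20AC) gives the same three bytes as "€".encode("utf-8"). From 0x80000000 upward the lead byte becomes 0xFE, and from 0x1000000000 upward it becomes 0xFF, which are exactly the byte values that standard UTF-8 never emits.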

    Regards,
    Jon Hanna
    Work: <http://www.selkieweb.com/>
    Play: <http://www.hackcraft.net/>
    Chat: <irc://irc.freenode.net/selkie>


