From: Christoph Päper (christoph.paeper@crissov.de)
Date: Sun Feb 20 2011 - 13:48:56 CST
Thomas Cropley:
> <UTF-c.htm>
It’s a fair idea to be backwards compatible with (most of) ISO 8859-1 by encoding U+00C0–00FF as C0h (11000000b) through FFh (11111111b), and to reuse the bytes 80h (10000000b) through BFh (10111111b), rather than only through 9Fh, for encoding higher codepoints. U+00A0–00BF are missing anyhow, and I will not consider codepage switching with quasi-BOMs at all, because it seems like a bad idea. I don’t think it’s a good idea to also use 11......b in multibyte code sequences, though.
UTF-8: ASCII and 3–5bit/2bit prefixes
 0....... isolation prefix,
 110..... initial prefix,
 1110.... initial prefix,
 11110... initial prefix,
 11111... illegal prefix;
 10...... medial and final prefix.
  7  0xxxxxxx
 11  110yyyxx 10xxxxxx
 16  1110yyyy 10yyyyxx 10xxxxxx
 21  11110zzz 10zzyyyy 10yyyyxx 10xxxxxx
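The bit counts in the left column can be checked mechanically. What follows is only a small sketch, assuming the notation used in this message: `0` and `1` are fixed prefix bits, letters are payload bits, and `°` marks an unused bit. The helper name `payload_bits` is made up for illustration.

```python
def payload_bits(pattern: str) -> int:
    """Count payload bits in a layout such as '110yyyxx 10xxxxxx'.

    Assumed notation: '0'/'1' are fixed prefix bits, letters carry
    code point bits, '°' is an unused bit, spaces separate bytes.
    """
    return sum(1 for ch in pattern if ch.isalpha())


utf8_rows = [
    "0xxxxxxx",
    "110yyyxx 10xxxxxx",
    "1110yyyy 10yyyyxx 10xxxxxx",
    "11110zzz 10zzyyyy 10yyyyxx 10xxxxxx",
]
print([payload_bits(row) for row in utf8_rows])  # [7, 11, 16, 21]
```

The same helper works for any of the layout rows in this message, since they all use the same notation.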
UTF-c: ASCII and 2bit prefixes
 0....... isolation prefix,
 10...... initial and final prefix,
 11...... medial and isolation prefix.
  7  0xxxxxxx
  6  11xxxxxx
 12  10yyyyxx 10xxxxxx
 18  10zzyyyy 11yyyyxx 10xxxxxx
 21  10°°°zzz 11zzyyyy 11yyyyxx 10xxxxxx
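To make the UTF-c rows concrete, here is a rough encoder sketch. It follows only the table as written in this message, not the original UTF-c document, and it ignores codepage switching entirely. It also assumes that a lone `11xxxxxx` byte stands for U+00C0–00FF. The function name `encode_utfc_layout` is hypothetical.

```python
def encode_utfc_layout(cp: int) -> bytes:
    """Encode one code point using the UTF-c rows from the table above.

    Assumption: a lone 11xxxxxx byte means U+00C0..U+00FF, so those
    code points (and ASCII) are stored as a single identical byte.
    """
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if cp < 0x80 or 0xC0 <= cp <= 0xFF:
        return bytes([cp])
    if cp < 1 << 12:  # 10yyyyxx 10xxxxxx
        return bytes([0x80 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 1 << 18:  # 10zzyyyy 11yyyyxx 10xxxxxx
        return bytes([
            0x80 | (cp >> 12),
            0xC0 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ])
    # 10°°°zzz 11zzyyyy 11yyyyxx 10xxxxxx
    return bytes([
        0x80 | (cp >> 18),
        0xC0 | ((cp >> 12) & 0x3F),
        0xC0 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])


print(encode_utfc_layout(0xE9).hex())    # e9
print(encode_utfc_layout(0xA0).hex())    # 82a0
print(encode_utfc_layout(0x20AC).hex())  # 82c2ac
```

Note how the medial byte of U+20AC is C2h, which on its own would read as a Latin-1 letter. That overlap is the use of 11......b inside multibyte sequences that the layouts below avoid.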
Type 1: ASCII and 4bit prefix
 0....... isolation prefix,
 11...... isolation prefix,
 10##.... initial prefixes with following-bytes count,
 1000.... medial and final prefix.
  7  0xxxxxxx
  6  11xxxxxx
  8  1001xxxx 1000xxxx
 12  1010yyyy 1000xxxx 1000xxxx
 16  1011yyyy 1000yyyy 1000xxxx 1000xxxx
 
=> incomplete coverage.
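Why the coverage is incomplete can be shown with a little arithmetic: the highest code point, U+10FFFF, needs 21 bits, while the longest Type 1 sequence carries only 16. A minimal sketch (the variable names are illustrative only):

```python
MAX_CODE_POINT = 0x10FFFF

# Longest sequence the 2-bit byte count of Type 1 allows.
type1_longest = "1011yyyy 1000yyyy 1000xxxx 1000xxxx"

needed = MAX_CODE_POINT.bit_length()
available = sum(1 for ch in type1_longest if ch.isalpha())
largest = (1 << available) - 1

print(needed, available, hex(largest))  # 21 16 0xffff
print(largest >= MAX_CODE_POINT)        # False
```

In other words, Type 1 as laid out here stops at U+FFFF.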
Type 2: ASCII and 5bit/3bit prefix
 0....... isolation prefix,
 11...... isolation prefix,
 101##... initial prefixes with following-bytes count (+1),
 100..... medial and final prefix.
  7  0xxxxxxx
  6  11xxxxxx
  8  10100xxx 100xxxxx
 13  10101yyy 100yyxxx 100xxxxx
 18  10110zzy 100yyyyy 100yyxxx 100xxxxx
 21  10111°°z 100zzzzy 100yyyyy 100yyxxx 100xxxxx
Type 3.1: ASCII and 3bit prefix
 0....... isolation prefix,
 11...... isolation prefix,
 101..... initial and medial byte prefix,
 100..... final byte prefix.
  7  0xxxxxxx
  6  11xxxxxx
 10  101yyxxx 100xxxxx
 15  101yyyyy 101yyxxx 100xxxxx
 20  101zzzzy 101yyyyy 101yyxxx 100xxxxx
 21  101°°°°z 101zzzzy 101yyyyy 101yyxxx 100xxxxx
Type 3.2: ASCII and 3bit prefix
 0....... isolation prefix,
 11...... isolation prefix,
 101..... initial and final prefix,
 100..... medial prefix.
  7  0xxxxxxx
  6  11xxxxxx
 10  101yyxxx 101xxxxx
 15  101yyyyy 100yyxxx 101xxxxx
 20  101zzzzy 100yyyyy 100yyxxx 101xxxxx
 21  101°°°°z 100zzzzy 100yyyyy 100yyxxx 101xxxxx
Type 3.3: ASCII and 3bit prefix
 0....... isolation prefix,
 11...... isolation prefix,
 101..... initial prefix,
 100..... medial and final prefix.
  7  0xxxxxxx
  6  11xxxxxx
 10  101yyxxx 100xxxxx
 15  101yyyyy 100yyxxx 100xxxxx
 20  101zzzzy 100yyyyy 100yyxxx 100xxxxx
 21  101°°°°z 100zzzzy 100yyyyy 100yyxxx 100xxxxx
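Type 3.3 is the closest of these to UTF-8 in structure: one prefix starts a sequence and another continues it. A hedged round-trip sketch follows. It assumes that lone `11xxxxxx` bytes stand for U+00C0–00FF and that the shortest possible form is always used; it does not reject overlong forms. `encode_type33` and `decode_type33` are hypothetical names.

```python
def encode_type33(cp: int) -> bytes:
    """Encode one code point with the Type 3.3 layout (5 bits per byte)."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if cp < 0x80 or 0xC0 <= cp <= 0xFF:
        return bytes([cp])  # isolation prefixes 0....... and 11......
    n = 2
    while cp >> (5 * n):  # grow until all bits fit in n groups of 5
        n += 1
    groups = [(cp >> (5 * i)) & 0x1F for i in reversed(range(n))]
    # 101..... starts the sequence, 100..... continues and ends it.
    return bytes([0xA0 | groups[0]] + [0x80 | g for g in groups[1:]])


def decode_type33(data: bytes) -> list[int]:
    """Decode a Type 3.3 byte string into a list of code points."""
    out, i = [], 0
    while i < len(data):
        b = data[i]
        i += 1
        if b < 0x80 or b >= 0xC0:
            out.append(b)
            continue
        if b & 0xE0 != 0xA0:
            raise ValueError("continuation byte without initial byte")
        cp, start = b & 0x1F, i
        while i < len(data) and data[i] & 0xE0 == 0x80:
            cp = (cp << 5) | (data[i] & 0x1F)
            i += 1
        if i == start or cp > 0x10FFFF:
            raise ValueError("malformed sequence")
        out.append(cp)
    return out


sample = [0x41, 0xE9, 0xA0, 0x20AC, 0x10FFFF]
encoded = b"".join(encode_type33(cp) for cp in sample)
print(decode_type33(encoded) == sample)  # True
```

Because the initial prefix differs from the continuation prefix, a decoder dropped into the middle of a stream can skip `100.....` bytes until it reaches the next boundary, much as with UTF-8.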
Type 4: Latin1 and 4bit prefix
 0....... isolation prefix,
 101..... isolation prefix,
 11...... isolation prefix,
 1001.... initial prefix,
 1000.... medial and final prefix.
  7  0xxxxxxx
  6  11xxxxxx
  5  101xxxxx
  8  1001xxxx 1000xxxx
 12  1001yyyy 1000xxxx 1000xxxx
 16  1001yyyy 1000yyyy 1000xxxx 1000xxxx
 20  1001zzzz 1000yyyy 1000yyyy 1000xxxx 1000xxxx
 21  1001°°°z 1000zzzz 1000yyyy 1000yyyy 1000xxxx 1000xxxx
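The price of keeping all of A0h–FFh as single bytes is that multibyte sequences carry only 4 bits per byte. A small sketch comparing sequence lengths with UTF-8, assuming lone `101xxxxx` and `11xxxxxx` bytes stand for U+00A0–00FF and that the shortest form is used (`type4_length` is a made-up helper):

```python
def type4_length(cp: int) -> int:
    """Number of bytes a code point would need in the Type 4 layout."""
    if cp < 0x80 or 0xA0 <= cp <= 0xFF:
        return 1  # ASCII or the upper half of Latin-1
    nibbles = (cp.bit_length() + 3) // 4
    return max(2, nibbles)  # multibyte forms start at two bytes


for cp in (0x41, 0xE9, 0x85, 0x3B1, 0x20AC, 0x10FFFF):
    utf8_len = len(chr(cp).encode("utf-8"))
    print(f"U+{cp:04X}: Type 4 = {type4_length(cp)}, UTF-8 = {utf8_len}")
```

Under these assumptions U+00E9 shrinks from two bytes to one, while U+20AC grows from three to four and U+10FFFF from four to six.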
Type 5: Latin1 and 6bit/4bit prefix
 0....... isolation prefix,
 101..... isolation prefix,
 11...... isolation prefix,
 1001##.. initial prefix with following-bytes count (+1),
 1000.... medial and final prefix.
  7  0xxxxxxx
  6  11xxxxxx
  5  101xxxxx
  6  100100xx 1000xxxx
 10  100101yy 1000xxxx 1000xxxx
 14  100110yy 1000yyyy 1000xxxx 1000xxxx
 18  100111zz 1000yyyy 1000yyyy 1000xxxx 1000xxxx
=> incomplete coverage.
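With the byte count stored in the initial byte, a decoder knows the sequence length after reading one byte. A minimal sketch of that lookup, reading the `##` field as the number of following bytes minus one, as the rows above suggest (`type5_sequence_length` is a hypothetical name):

```python
def type5_sequence_length(lead: int) -> int:
    """Total sequence length implied by the first byte in Type 5."""
    if not 0 <= lead <= 0xFF:
        raise ValueError("not a byte value")
    if lead < 0x80 or lead >= 0xA0:
        return 1  # isolation prefixes 0......., 101....., 11......
    if lead & 0xF0 == 0x90:
        # 1001##..: ## holds (following bytes - 1)
        return 2 + ((lead >> 2) & 0b11)
    raise ValueError("1000.... is a medial/final byte, not an initial one")


print(type5_sequence_length(0x90))  # 2
print(type5_sequence_length(0x9C))  # 5
```

The longest form has 2 payload bits in the initial byte plus 4 in each of 4 following bytes, i.e. 18 bits, which is short of the 21 needed for U+10FFFF.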