FW: Updated UTF-8 Magic Pocket Encoder

From: Mike Ayers (mike.ayers@tumbleweed.com)
Date: Tue Jul 06 2004 - 19:37:44 CDT


    Corrected the version information (I had it wrong on the front side).
    Thanks to an alert reader for pointing it out. No substantive change.
    Enjoy.

    Side 1 (print and cut out):

    +------------+-------+-----------------------+------+
    | U+0000     | yy zz | Cima's UTF-8 Magic    | Hex= |
    | U+007F     |  !  ! |    Pocket Encoder     | B-4  |
    |     YZ     |  .  . |                       |      |
    +------------+-------+-------+ Vers. 1.2     | 0=00 |
    | U+0080     | 3x xy | 2y zz | 06 July 2004  | 1=01 |
    | U+07FF     | 3. .. | 2.  ! |               | 2=02 |
    |    XYZ     |  .  . |  .  . |        M.C.   | 3=03 |
    +------------+-------+-------+-------+       | 4=10 |
    | U+0800     | 32 ww | 2x xy | 2y zz |       | 5=11 |
    | U+FFFF     |  !  ! | 2. .. | 2.  ! |       | 6=12 |
    |   WXYZ     |  E  . |  .  . |  .  . |       | 7=13 |
    +------------+-------+-------+-------+-------+ 8=20 |
    | U-00010000 | 33 0v | 2v ww | 2x xy | 2y zz | 9=21 |
    | U-000FFFFF |  ! 0. | 2.  ! | 2. .. | 2.  ! | A=22 |
    |      VWXYZ |  F  . |  .  . |  .  . |  .  . | B=23 |
    +------------+-------+-------+-------+-------+ C=30 |
    | U-00100000 | 33 10 | 20 ww | 2x xy | 2y zz | D=31 |
    | U-0010FFFF |  !  ! |  !  ! | 2. .. | 2.  ! | E=32 |
    |       WXYZ |  F  4 |  8  . |  .  . |  .  . | F=33 |
    +------------+-------+-------+-------+-------+------+
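The card can be read as data. Each row gives a range, the hex digits to process, and one base-4 template per octet. Below is a minimal Python sketch of that reading. It is not Marco's code, and the names ROWS and card_encode are hypothetical. It assumes a letter's first occurrence in a row takes the high base-4 digit of its hex digit and the second occurrence takes the low one, which matches every row of the card. The rejection of surrogates (U+D800..U+DFFF) is an extra check added here, because those are not scalar values.

```python
# Hypothetical encoding of the card's left and central columns (a sketch).
# Each row: (min, max, number of hex digits processed, octet templates).
# In a template, 0-3 are hard-coded base-4 digits; a letter stands for the
# next base-4 digit of the matching hex digit (first occurrence = high digit,
# second occurrence = low digit).
ROWS = [
    (0x000000, 0x00007F, 2, ["yyzz"]),
    (0x000080, 0x0007FF, 3, ["3xxy", "2yzz"]),
    (0x000800, 0x00FFFF, 4, ["32ww", "2xxy", "2yzz"]),
    (0x010000, 0x0FFFFF, 5, ["330v", "2vww", "2xxy", "2yzz"]),
    (0x100000, 0x10FFFF, 4, ["3310", "20ww", "2xxy", "2yzz"]),
]
LETTERS = "vwxyz"


def card_encode(cp: int) -> bytes:
    """Encode one Unicode scalar value the way the pocket encoder does."""
    if 0xD800 <= cp <= 0xDFFF:
        # Assumption added here: surrogates are not scalar values.
        raise ValueError(f"U+{cp:04X} is a surrogate, not a scalar value")
    for lo, hi, ndigits, templates in ROWS:
        if lo <= cp <= hi:
            break
    else:
        raise ValueError(f"{cp:#x} is outside U+0000..U+10FFFF")
    # The hex digits marked V..Z are the rightmost ndigits of the value.
    hex_digits = f"{cp:0{ndigits}X}"[-ndigits:]
    names = LETTERS[-ndigits:]
    # Right column of the card: each hex digit becomes two base-4 digits.
    pending = {n: [int(h, 16) // 4, int(h, 16) % 4]
               for n, h in zip(names, hex_digits)}
    octets = bytearray()
    for template in templates:
        # Fill the dots: letters consume their base-4 digits in order,
        # digits 0-3 are copied as hard-coded.
        b4 = [pending[c].pop(0) if c in pending else int(c) for c in template]
        # Four base-4 digits make one octet (two hex digits).
        octets.append(((b4[0] * 4 + b4[1]) * 4 + b4[2]) * 4 + b4[3])
    return bytes(octets)


print(card_encode(0x20AC).hex())   # e282ac
print(card_encode(0x1F600).hex())  # f09f9880
```

Keeping the templates as strings copied from the card makes it easy to check the sketch against the printed rows. Comparing card_encode with Python's own chr(cp).encode("utf-8") is a quick way to confirm the card's templates.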

    Side 2 (print, cut out, and glue on back of side 1):

    +---------------------------------------------------+
    | Cima's UTF-8 Magic Pocket Encoder - User's Manual |
    | (vers. 1.2, 06 July 2004, by Marco Cimarosti)     |
    |                                                   |
    | - Left column: min and max Unicode scalar values; |
    |   pick the row whose range contains the code      |
    |   point you want to convert to UTF-8. Letters     |
    |   V..Z mark the hexadecimal digits that have to   |
    |   be processed.                                   |
    | - Right column: hexadecimal to base-4 table.      |
    | - Central columns: work area to compute each      |
    |   octet (1 to 4) of the UTF-8 sequence. Convert   |
    |   each hex digit marked by V..Z to base 4, and    |
    |   write the two base-4 digits on the dots under   |
    |   the matching lowercase letters v..z. Then       |
    |   convert each 2-digit base-4 number back to one  |
    |   hex digit and write it on the dot in the        |
    |   bottom row. That is your UTF-8 sequence in hex! |
    |   Exclamation marks show steps that may be        |
    |   skipped, either because the digit is hard-coded |
    |   or because it can be copied directly from the   |
    |   scalar value.                                   |
    +---------------------------------------------------+
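The manual's steps can be traced on one value. The hedged Python sketch below covers only the U+0800..U+FFFF row. HEX_TO_B4 and work_area_3 are hypothetical names, not part of the card. The function returns the base-4 digits written on the middle-row dots and the hex digits written on the bottom row, here for U+20AC EURO SIGN.

```python
# Hypothetical helper following the manual, for the U+0800..U+FFFF row only.
# Right column of the card: hex digit -> two base-4 digits.
HEX_TO_B4 = {h: f"{int(h, 16) // 4}{int(h, 16) % 4}" for h in "0123456789ABCDEF"}


def work_area_3(cp: int):
    """Return (middle row, bottom row) of the card for a WXYZ code point."""
    if not 0x0800 <= cp <= 0xFFFF:
        raise ValueError("this sketch only covers the U+0800..U+FFFF row")
    w, x, y, z = (HEX_TO_B4[h] for h in f"{cp:04X}")
    # Write the base-4 digits on the dots under w..z (templates 32ww 2xxy 2yzz).
    middle = ["32" + w, "2" + x + y[0], "2" + y[1] + z]
    # Read each pair of base-4 digits back as one hex digit.
    bottom = ["".join(f"{int(q[i:i + 2], 4):X}" for i in (0, 2)) for q in middle]
    return middle, bottom


middle, bottom = work_area_3(0x20AC)  # EURO SIGN
print(" ".join(middle))  # 3202 2002 2230
print(" ".join(bottom))  # E2 82 AC
```

The first octet also shows the "!" shortcuts. Its high hex digit is always E (base-4 32), and its low hex digit is W copied unchanged, so neither needs the base-4 step.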


