RE: UTF to unicode conversion

From: Mike Ayers (mike.ayers@tumbleweed.com)
Date: Tue Jun 29 2004 - 13:50:11 CDT


    From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org] On
    Behalf Of johncy inbaraj
    Sent: Tuesday, June 29, 2004 6:07 AM

    I need conversion logic that converts a UTF character to a Unicode
    character. If there is any, please tell me.

    Here's my favorite, gleaned from the archives, courtesy of Marco
    Cimarosti. It doesn't have instructions for working backwards, but once you
    figure out how to work forwards, reversing the operation is pretty
    straightforward. Make sure to use a nonproportional font so everything
    lines up.

    <SNIP>

    Who said that Unicode is high-tech?
    Here is a device to generate UTF-8 that employs traditional tools such as
    ASCII art, paper, scissors, glue, brain.

    Side 1 (print and cut out):

    +------------+-------+-----------------------+------+
    | U+0000     | yy zz | Cima's UTF-8 Magic    | Hex= |
    | U+007F     | !  !  | Pocket Encoder        | B-4  |
    | YZ         | .  .  |                       |      |
    +------------+-------+-------+ Vers. 1.0     | 0=00 |
    | U+0080     | 3x xy | 2y zz | 16 March 2000 | 1=01 |
    | U+07FF     | 3. .. | 2. !  |               | 2=02 |
    | XYZ        | .  .  | .  .  |          M.C. | 3=03 |
    +------------+-------+-------+-------+       | 4=10 |
    | U+0800     | 32 ww | 2x xy | 2y zz |       | 5=11 |
    | U+FFFF     | !  !  | 2. .. | 2. !  |       | 6=12 |
    | WXYZ       | E  .  | .  .  | .  .  |       | 7=13 |
    +------------+-------+-------+-------+-------+ 8=20 |
    | U-00010000 | 33 0v | 2v ww | 2x xy | 2y zz | 9=21 |
    | U-000FFFFF | !  0. | 2. !  | 2. .. | 2. !  | A=22 |
    | VWXYZ      | F  .  | .  .  | .  .  | .  .  | B=23 |
    +------------+-------+-------+-------+-------+ C=30 |
    | U-00100000 | 33 1v | 2v ww | 2x xy | 2y zz | D=31 |
    | U-0010FFFF | !  1. | 2. !  | 2. .. | 2. !  | E=32 |
    | VWXYZ      | F  .  | .  .  | .  .  | .  .  | F=33 |
    +------------+-------+-------+-------+-------+------+

    Side 2 (print, cut out, and glue on back of side 1):

    +---------------------------------------------------+
    | Cima's UTF-8 Magic Pocket Encoder - User's Manual |
    |  (vers. 1.0, 16 March 2000, by Marco Cimarosti)   |
    |                                                   |
    | - Left column: min and max Unicode scalar values: |
    |   pick the row that applies to the code point you |
    |   want to convert to UTF-8. Letters V..Z mark the |
    |   hexadecimal digits that have to be processed.   |
    | - Right column: hexadecimal to base-4 table.      |
    | - Central columns: work area to compute each octet|
    |   (1 to 4) that constitute UTF-8 octet sequences. |
    |   Convert each digit marked by V..Z from hex. to  |
    |   b.-4. Write b.-4 digits on the dots placed under|
    |   letters v..z (two b.-4 digits per hex. digit).  |
    |   Convert 2-digit base-4 number to hex. digits and|
    |   write them on the dots on the line. That is your|
    |   UTF-8 sequence in hex.! Exclamation marks show  |
    |   passages that may be skipped, either because the|
    |   digit is hard-coded, or because it may be copied|
    |   directly from the scalar value.                 |
    +---------------------------------------------------+
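
For readers who would rather check the card against a computer than against scissors and glue, the manual's steps can be sketched in Python roughly as follows. This is only an illustration and not part of the original card: the names HEX_TO_B4, B4_TO_HEX, ROWS and encode_with_card are made up for this sketch, the templates are the card's work-area patterns with the spaces removed, and the rejection of surrogates is an added assumption based on the card's mention of "scalar values".

```python
# Right-hand column of the card: each hex digit as two base-4 digits.
HEX_TO_B4 = {f"{n:X}": f"{n // 4}{n % 4}" for n in range(16)}
B4_TO_HEX = {b4: hx for hx, b4 in HEX_TO_B4.items()}

# One entry per card row (hypothetical layout for this sketch):
# (largest scalar value in the row, hex digits to process, work-area template).
# Lowercase letters are slots for base-4 digits; 0..3 are hard-coded digits.
ROWS = [
    (0x7F, "YZ", "yyzz"),
    (0x7FF, "XYZ", "3xxy2yzz"),
    (0xFFFF, "WXYZ", "32ww2xxy2yzz"),
    (0xFFFFF, "VWXYZ", "330v2vww2xxy2yzz"),
    (0x10FFFF, "VWXYZ", "331v2vww2xxy2yzz"),
]


def encode_with_card(cp: int) -> str:
    """Return the UTF-8 encoding of code point cp as an uppercase hex string."""
    # Assumption of this sketch: surrogates are rejected, since they are
    # not Unicode scalar values.
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    # Pick the first row whose maximum covers the code point.
    letters, template = next((l, t) for top, l, t in ROWS if cp <= top)
    # Keep only the hex digits marked V..Z in that row.
    digits = f"{cp:0{len(letters)}X}"[-len(letters):]
    # Convert each marked hex digit to its two base-4 digits.
    pools = {l.lower(): list(HEX_TO_B4[h]) for l, h in zip(letters, digits)}
    # Write the base-4 digits into the slots, left to right.
    b4 = "".join(pools[c].pop(0) if c in pools else c for c in template)
    # Convert every pair of base-4 digits back to one hex digit.
    return "".join(B4_TO_HEX[b4[i:i + 2]] for i in range(0, len(b4), 2))


print(encode_with_card(0x20AC))  # E282AC
```

Working the euro sign U+20AC by hand on the third row gives the same answer: W=2, X=0, Y=A, Z=C become 02, 00, 22, 30 in base 4, the work area reads 32 02 / 20 02 / 22 30, and that converts to E2 82 AC.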

    Enjoy!

    Marco

    </SNIP>
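
As for working backwards, here is one way the reversal might look in Python: turn the UTF-8 hex digits into base-4, lay them over a row's work-area pattern, and read the V..Z digits back out. Treat it as a sketch under stated assumptions rather than anything from Marco's card: the names HEX_TO_B4, B4_TO_HEX, ROWS and decode_with_card are invented here, the input is assumed to be the hex form of exactly one UTF-8 sequence, and the range check that rejects overlong forms and surrogates is an addition of this sketch.

```python
# Hex digit <-> two base-4 digits, as in the card's right-hand column.
HEX_TO_B4 = {f"{n:X}": f"{n // 4}{n % 4}" for n in range(16)}
B4_TO_HEX = {b4: hx for hx, b4 in HEX_TO_B4.items()}

# One entry per card row (hypothetical layout for this sketch):
# (min, max, fixed leading hex digit, letters, work-area template).
ROWS = [
    (0x0000, 0x7F, "", "YZ", "yyzz"),
    (0x0080, 0x7FF, "", "XYZ", "3xxy2yzz"),
    (0x0800, 0xFFFF, "", "WXYZ", "32ww2xxy2yzz"),
    (0x10000, 0xFFFFF, "", "VWXYZ", "330v2vww2xxy2yzz"),
    (0x100000, 0x10FFFF, "1", "VWXYZ", "331v2vww2xxy2yzz"),
]


def decode_with_card(utf8_hex: str) -> int:
    """Return the code point encoded by one UTF-8 sequence given in hex."""
    text = utf8_hex.replace(" ", "").upper()
    if any(h not in HEX_TO_B4 for h in text):
        raise ValueError("not a hex string")
    # Step 1: every hex digit of the UTF-8 sequence becomes two base-4 digits.
    b4 = "".join(HEX_TO_B4[h] for h in text)
    for low, high, lead, letters, template in ROWS:
        if len(template) != len(b4):
            continue
        # Step 2: lay the digits over the row's pattern; hard-coded digits
        # must match, letter slots collect the digits of V..Z.
        pools = {l.lower(): "" for l in letters}
        for got, want in zip(b4, template):
            if want in pools:
                pools[want] += got
            elif got != want:
                break
        else:
            # Step 3: each collected pair of base-4 digits is one hex digit.
            digits = "".join(B4_TO_HEX[pools[l.lower()]] for l in letters)
            cp = int(lead + digits, 16)
            # Added check: the result must lie in the row's range, which
            # rejects overlong forms; surrogates are rejected as well.
            if low <= cp <= high and not 0xD800 <= cp <= 0xDFFF:
                return cp
    raise ValueError("not a single well-formed UTF-8 sequence")


print(hex(decode_with_card("E2 82 AC")))  # 0x20ac
```

The range check matters because a pattern match alone is not enough: C0 80 fits the two-octet pattern but yields U+0000, which belongs to the first row, so the sketch refuses it.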


