RE: UTF to unicode conversion

From: Mike Ayers (mike.ayers@tumbleweed.com)
Date: Tue Jun 29 2004 - 13:50:11 CDT

Next message: Kenneth Whistler: "Re: what combining diacritical mark suits d and l with stroke ?"

Previous message: Chris Jacobs: "Re: UTF to unicode conversion"
Maybe in reply to: johncy inbaraj: "UTF to unicode conversion"
Next in thread: Marco Cimarosti: "RE: UTF to unicode conversion"
Maybe reply: Philippe VERDY: "Re: RE: UTF to unicode conversion"

From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org]On
Behalf Of johncy inbaraj
Sent: Tuesday, June 29, 2004 6:07 AM

I need conversion logic that converts a UTF character to a
Unicode character. If there is any, please tell me.

Here's my favorite, gleaned from the archives, courtesy of Marco
Cimarosti. It doesn't have instructions for working backwards, but once you
figure out how to work forwards, reversing the operation is pretty
straightforward. Make sure to use a nonproportional font so everything
lines up.
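As a rough illustration of working backwards, here is a minimal Python sketch that turns UTF-8 bytes into code points. It reads the marker bits of each lead byte to learn the sequence length, then collects 6 payload bits from each continuation byte. It assumes "UTF" in the question means UTF-8; `decode_utf8` is a hypothetical name, and real Python code would simply call `bytes.decode("utf-8")`.

```python
def decode_utf8(data: bytes) -> list[int]:
    """Decode UTF-8 bytes into Unicode scalar values (hypothetical helper).

    Rejects bad lead or continuation bytes, truncated and overlong
    sequences, surrogates, and values above U+10FFFF.
    """
    code_points = []
    i = 0
    while i < len(data):
        lead = data[i]
        # The lead byte's high bits give the length; the rest is payload.
        if lead < 0x80:               # 0xxxxxxx
            cp, extra, minimum = lead, 0, 0x0
        elif 0xC0 <= lead < 0xE0:     # 110xxxxx + 1 continuation byte
            cp, extra, minimum = lead & 0x1F, 1, 0x80
        elif 0xE0 <= lead < 0xF0:     # 1110xxxx + 2 continuation bytes
            cp, extra, minimum = lead & 0x0F, 2, 0x800
        elif 0xF0 <= lead < 0xF8:     # 11110xxx + 3 continuation bytes
            cp, extra, minimum = lead & 0x07, 3, 0x10000
        else:
            raise ValueError(f"invalid lead byte {lead:#04x} at offset {i}")
        if i + extra >= len(data):
            raise ValueError(f"truncated sequence at offset {i}")
        for k in range(i + 1, i + extra + 1):
            if (data[k] & 0xC0) != 0x80:        # continuation bytes are 10xxxxxx
                raise ValueError(f"bad continuation byte at offset {k}")
            cp = (cp << 6) | (data[k] & 0x3F)   # append 6 payload bits
        if cp < minimum or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"overlong or invalid sequence at offset {i}")
        code_points.append(cp)
        i += extra + 1
    return code_points
```

For example, decode_utf8(b"\xc3\xa9") returns [233], i.e. U+00E9.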

<SNIP>

Who said that Unicode is high-tech?
Here is a device to generate UTF-8 that employs traditional tools such as
ASCII art, paper, scissors, glue, and brain.

Side 1 (print and cut out):

+------------+-------+-----------------------+------+
| U+0000     | yy zz | Cima's UTF-8 Magic    | Hex= |
| U+007F     | !  !  | Pocket Encoder        | B-4  |
| YZ         | .  .  |                       |      |
+------------+-------+-------+ Vers. 1.0     | 0=00 |
| U+0080     | 3x xy | 2y zz | 16 March 2000 | 1=01 |
| U+07FF     | 3. .. | 2. !  |               | 2=02 |
| XYZ        | .  .  | .  .  | M.C.          | 3=03 |
+------------+-------+-------+-------+       | 4=10 |
| U+0800     | 32 ww | 2x xy | 2y zz |       | 5=11 |
| U+FFFF     | !  !  | 2. .. | 2. !  |       | 6=12 |
| WXYZ       | E  .  | .  .  | .  .  |       | 7=13 |
+------------+-------+-------+-------+-------+ 8=20 |
| U-00010000 | 33 0v | 2v ww | 2x xy | 2y zz | 9=21 |
| U-000FFFFF | !  0. | 2. !  | 2. .. | 2. !  | A=22 |
| VWXYZ      | F  .  | .  .  | .  .  | .  .  | B=23 |
+------------+-------+-------+-------+-------+ C=30 |
| U-00100000 | 33 1v | 2v ww | 2x xy | 2y zz | D=31 |
| U-0010FFFF | !  1. | 2. !  | 2. .. | 2. !  | E=32 |
| VWXYZ      | F  .  | .  .  | .  .  | .  .  | F=33 |
+------------+-------+-------+-------+-------+------+
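Each row of the card corresponds to one of the usual UTF-8 bit patterns (0xxxxxxx; 110xxxxx 10xxxxxx; and so on, with the last two rows sharing the four-octet pattern). Under that reading, the same encoding can be sketched with shifts and masks instead of base-4 arithmetic. `encode_utf8` is a hypothetical helper, not part of the original post, and the surrogate check is an extra that the card itself does not make.

```python
def encode_utf8(cp: int) -> bytes:
    """Encode one Unicode scalar value as UTF-8 (hypothetical helper)."""
    # Extra check not on the card: reject out-of-range values and surrogates.
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#X}")
    if cp <= 0x7F:      # row U+0000..U+007F: 0xxxxxxx
        return bytes([cp])
    if cp <= 0x7FF:     # row U+0080..U+07FF: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:    # row U+0800..U+FFFF: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # rows U-00010000..U-0010FFFF: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])
```

For U+20AC EURO SIGN it returns b"\xe2\x82\xac", the same as chr(0x20AC).encode("utf-8"); in practice the built-in str.encode is the tool to use.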

Side 2 (print, cut out, and glue on back of side 1):

+---------------------------------------------------+
| Cima's UTF-8 Magic Pocket Encoder - User's Manual |
| (vers. 1.0, 16 March 2000, by Marco Cimarosti)    |
|                                                   |
| - Left column: min and max Unicode scalar values: |
|   pick the row that applies to the code point you |
|   want to convert to UTF-8. Letters V..Z mark the |
|   hexadecimal digits that have to be processed.   |
| - Right column: hexadecimal to base-4 table.      |
| - Central columns: work area for computing the    |
|   octets (1 to 4) of the UTF-8 sequence.          |
|   Convert each hex digit marked V..Z to base 4,   |
|   and write the two base-4 digits on the dots     |
|   placed under the matching letters v..z. Then    |
|   convert each 2-digit base-4 number back to one  |
|   hex digit and write it on the dots of the       |
|   bottom line. That is your UTF-8 sequence in     |
|   hex! Exclamation marks show steps that may be   |
|   skipped, either because the digit is hard-coded |
|   or because it can be copied directly from the   |
|   scalar value.                                   |
+---------------------------------------------------+
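The manual's paper procedure can also be emulated literally: transcribe each octet on Side 1 as four base-4 digits, fill in the letters from the code point's hex digits, and read each pair of base-4 digits back as one hex digit. The Python sketch below assumes that transcription; `ROWS`, `HEX_TO_BASE4`, `SHIFTS` and `pocket_encode` are hypothetical names, not part of Marco's card, and like the card it does not reject surrogate code points.

```python
# Side 1 transcribed as templates (hypothetical encoding of the card):
# every octet is four base-4 digits; '0'..'3' are digits printed on the card,
# and a letter v..z takes the next base-4 digit of hex digit V..Z.
ROWS = [
    (0x00007F, "YZ",    ["yyzz"]),
    (0x0007FF, "XYZ",   ["3xxy", "2yzz"]),
    (0x00FFFF, "WXYZ",  ["32ww", "2xxy", "2yzz"]),
    (0x0FFFFF, "VWXYZ", ["330v", "2vww", "2xxy", "2yzz"]),
    (0x10FFFF, "VWXYZ", ["331v", "2vww", "2xxy", "2yzz"]),
]

# Right column of the card: hex digit -> two base-4 digits (0=00 ... F=33).
HEX_TO_BASE4 = {d: f"{d // 4}{d % 4}" for d in range(16)}

# Bit offset of each marked hex digit; Z is the lowest digit of the value.
SHIFTS = {"V": 16, "W": 12, "X": 8, "Y": 4, "Z": 0}


def pocket_encode(cp: int) -> str:
    """Return the UTF-8 octets of cp as hex pairs, e.g. 'C3 A9'."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"outside the card's range: {cp:#X}")
    # Left column: pick the first row whose maximum covers cp.
    _, letters, templates = next(row for row in ROWS if cp <= row[0])
    # Write two base-4 digits under each letter (consumed left to right).
    digits = {letter.lower(): list(HEX_TO_BASE4[(cp >> SHIFTS[letter]) & 0xF])
              for letter in letters}
    octets = []
    for template in templates:
        # Hard-coded digits are copied; letters pop their next base-4 digit.
        b4 = [int(ch) if ch in "0123" else int(digits[ch].pop(0))
              for ch in template]
        # Each 2-digit base-4 number becomes one hex digit.
        octets.append(f"{b4[0] * 4 + b4[1]:X}{b4[2] * 4 + b4[3]:X}")
    return " ".join(octets)
```

For example, pocket_encode(0xE9) gives "C3 A9", the UTF-8 octets for U+00E9, and pocket_encode(0x10FFFF) gives "F4 8F BF BF".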

Enjoy!

Marco

</SNIP>


This archive was generated by hypermail 2.1.5 : Tue Jun 29 2004 - 13:51:38 CDT