C.U.M.P.E. vers. 1.0 (was: RE: Unicode to UTF-8)

From: Marco.Cimarosti@icl.com
Date: Thu Mar 16 2000 - 12:07:21 EST


Who said that Unicode is high-tech?

Here is a device to generate UTF-8 that employs traditional tools such as
ASCII art, paper, scissors, glue, brain.

        :-)

Side 1 (print and cut out):

+------------+-------+-----------------------+------+
| U+0000     | yy zz |  Cima's UTF-8 Magic   | Hex= |
| U+007F     | !  !  |    Pocket Encoder     | B-4  |
|     YZ     | .  .  |                       |      |
+------------+-------+-------+   Vers. 1.0   | 0=00 |
| U+0080     | 3x xy | 2y zz | 16 March 2000 | 1=01 |
| U+07FF     | 3. .. | 2. !  |               | 2=02 |
|    XYZ     | .  .  | .  .  |     M.C.      | 3=03 |
+------------+-------+-------+-------+       | 4=10 |
| U+0800     | 32 ww | 2x xy | 2y zz |       | 5=11 |
| U+FFFF     | !  !  | 2. .. | 2. !  |       | 6=12 |
|   WXYZ     | E  .  | .  .  | .  .  |       | 7=13 |
+------------+-------+-------+-------+-------+ 8=20 |
| U-00010000 | 33 0v | 2v ww | 2x xy | 2y zz | 9=21 |
| U-000FFFFF | !  0. | 2. !  | 2. .. | 2. !  | A=22 |
|      VWXYZ | F  .  | .  .  | .  .  | .  .  | B=23 |
+------------+-------+-------+-------+-------+ C=30 |
| U-00100000 | 33 1v | 2v ww | 2x xy | 2y zz | D=31 |
| U-0010FFFF | !  1. | 2. !  | 2. .. | 2. !  | E=32 |
|      VWXYZ | F  .  | .  .  | .  .  | .  .  | F=33 |
+------------+-------+-------+-------+-------+------+
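
For anyone who would rather check the right-hand column by machine than by
brain, here is a minimal Python sketch of the hex to base-4 table. The names
HEX_DIGITS, hex_to_base4, base4_to_hex and HEX_TO_BASE4 are made up for this
illustration and are not part of the card; only the table values come from it.

```python
# Hypothetical helper names; only the table values come from the card.
HEX_DIGITS = "0123456789ABCDEF"


def hex_to_base4(h: str) -> str:
    """Return the two base-4 digits for one hex digit, e.g. 'E' -> '32'."""
    value = HEX_DIGITS.index(h.upper())
    return f"{value // 4}{value % 4}"


def base4_to_hex(pair: str) -> str:
    """Inverse lookup: two base-4 digits back to one hex digit."""
    return HEX_DIGITS[int(pair[0]) * 4 + int(pair[1])]


# Rebuild the right-hand column of the card as a dictionary.
HEX_TO_BASE4 = {h: hex_to_base4(h) for h in HEX_DIGITS}
```

Each hex digit is four bits and each base-4 digit is two bits, which is why
every hex digit maps to exactly two base-4 digits.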

Side 2 (print, cut out, and glue on back of side 1):

+---------------------------------------------------+
| Cima's UTF-8 Magic Pocket Encoder - User's Manual |
|  (vers. 1.0, 16 March 2000, by Marco Cimarosti)   |
|                                                   |
| - Left column: min and max Unicode scalar values: |
|   pick the row that applies to the code point you |
|   want to convert to UTF-8. Letters V..Z mark the |
|   hexadecimal digits that have to be processed.   |
| - Right column: hexadecimal to base-4 table.      |
| - Central columns: work area to compute each of   |
|   the 1 to 4 octets of a UTF-8 sequence. Convert  |
|   each digit marked by V..Z from hex. to base 4   |
|   and write the two base-4 digits on the dots     |
|   under letters v..z. Convert each 2-digit base-4 |
|   number to a hex. digit and write it on the dot  |
|   on the bottom line. That is your UTF-8 sequence |
|   in hex.! Exclamation marks show steps that may  |
|   be skipped, either because the digit is         |
|   hard-coded or because it may be copied directly |
|   from the scalar value.                          |
+---------------------------------------------------+
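
If scissors and glue are not at hand, the procedure on the card can be
sketched in Python roughly as follows. This is only an illustration of the
base-4 trick, not a replacement for a real encoder: the names to_base4,
octet_to_hex and pocket_encode are invented here, and rejecting surrogates
and out-of-range values is an assumption of the sketch (the card only speaks
of Unicode scalar values).

```python
HEX = "0123456789ABCDEF"


def to_base4(value: int, hex_digits: int) -> str:
    """Base-4 digits of the low `hex_digits` hex digits (two per hex digit)."""
    digits = ""
    for _ in range(2 * hex_digits):
        digits = str(value % 4) + digits
        value //= 4
    return digits


def octet_to_hex(quad: str) -> str:
    """One octet written as four base-4 digits -> two hex digits."""
    return HEX[int(quad[:2], 4)] + HEX[int(quad[2:], 4)]


def pocket_encode(cp: int) -> str:
    """Return the UTF-8 octets of code point `cp` as hex, e.g. 'E2 82 AC'."""
    # Assumption of this sketch: anything that is not a scalar value is refused.
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if cp <= 0x7F:                        # row 1: yy zz
        octets = [to_base4(cp, 2)]
    elif cp <= 0x7FF:                     # row 2: 3x xy | 2y zz
        d = to_base4(cp, 3)
        octets = ["3" + d[0:3], "2" + d[3:6]]
    elif cp <= 0xFFFF:                    # row 3: 32 ww | 2x xy | 2y zz
        d = to_base4(cp, 4)
        octets = ["32" + d[0:2], "2" + d[2:5], "2" + d[5:8]]
    else:                                 # rows 4-5: 33 0v or 33 1v | 2v ww | ...
        d = to_base4(cp, 5)               # only the digits V W X Y Z
        lead = "0" if cp <= 0xFFFFF else "1"
        octets = ["33" + lead + d[0], "2" + d[1:4], "2" + d[4:7], "2" + d[7:10]]
    return " ".join(octet_to_hex(o) for o in octets)
```

For example, pocket_encode(0x20AC) walks row 3 of the card: 20AC is
02 00 22 30 in base 4, the three octets are 3202, 2002 and 2230, and the
result is "E2 82 AC".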

Enjoy!
Marco


