Re: The Atomic Theory of Unicode

From: John Cowan (cowan@locke.ccil.org)
Date: Sun Jul 11 1999 - 03:49:55 EDT


Jonathan Coxhead scripsit:

> This would at least allow "the story" of each character---for
> example, the decomposition <palatalized hook> + LATIN SMALL LETTER T
> for \u01AB---to appear in some form in the standard.
>
> However, I'm done. "Thank you, and good night."

Not so fast, my friend. People who are willing to work get put to
work. In this case, I believe that what you have done, though not
fitting into the structure of Unicode *decompositions*, has excellent
work to do in the realm of *backward compatibility*.

As documents start to appear directly encoded in Unicode, there will
be a need for some time to come for downward-compatible conversions.
Your tables serve as an excellent start for such a thing: with some
tweaking, they allow the construction of ASCII plain text or of
ASCII-based HTML from Unicode plain text. It will be a considerable
service to the Unicode community to provide a file that maps
EQUAL TO BY DEFINITION into "=def", or DOUBLE-STRUCK CAPITAL R into
"R", or SUPERSCRIPT FIVE into "<sup>5</sup>". The current default
method of mapping Unicode into 8-bit character sets tends to be
"replace everything that doesn't fit with '?'", a procedure which
can and must be improved on.
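
A minimal sketch of how such a table might be used, in Python. The
table layout, the names APPROXIMATIONS and approximate, and the "^5"
plain-text fallback are all illustrative assumptions, not part of any
published file; only the three mappings to "=def", "R", and
"<sup>5</sup>" come from the text above.

```python
import html

# Hypothetical approximate-conversion table. Each entry maps one Unicode
# character to a pair: (plain ASCII fallback, ASCII-based HTML fallback).
APPROXIMATIONS = {
    "\u225d": ("=def", "=def"),        # EQUAL TO BY DEFINITION
    "\u211d": ("R", "R"),              # DOUBLE-STRUCK CAPITAL R
    "\u2075": ("^5", "<sup>5</sup>"),  # SUPERSCRIPT FIVE ("^5" is an assumed choice)
}


def approximate(text, as_html=False):
    """Return an ASCII-only approximation of text.

    Characters missing from the table still fall back to '?', but only
    as a last resort rather than as the default for everything.
    """
    out = []
    for ch in text:
        if ord(ch) < 128:
            # ASCII passes through; in HTML mode, escape '<', '>' and '&'
            # so they are not confused with the generated markup.
            out.append(html.escape(ch, quote=False) if as_html else ch)
        elif ch in APPROXIMATIONS:
            plain, markup = APPROXIMATIONS[ch]
            out.append(markup if as_html else plain)
        else:
            out.append("?")
    return "".join(out)
```

With this table, approximate("\u211d\u2075") yields "R^5", and the same
call with as_html=True yields "R<sup>5</sup>". A real table would need
far more entries, and probably per-target variants, which is where the
"some tweaking" comes in.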

My recommendation: don't worry about modifying the Unicode Standard,
but go forth and create a detailed table for "approximate conversions"
and publish it. All of us will be indebted to you.

-- 
John Cowan                                   cowan@ccil.org
       I am a member of a civilization. --David Brin


