Re: The Atomic Theory of Unicode

From: John Cowan (cowan@locke.ccil.org)
Date: Sun Jul 11 1999 - 03:49:55 EDT

Next message: Edward Cherlin: "MIME text/plain (was Re: Plain Text)"
Previous message: Edward Cherlin: "Re: dotless j"
Maybe in reply to: Jonathan Coxhead: "The Atomic Theory of Unicode"

Jonathan Coxhead scripsit:

> This would at least allow "the story" of each character---for
> example, the decomposition <palatalized hook> + LATIN SMALL LETTER T
> for \u01AB---to appear in some form in the standard.
>
> However, I'm done. "Thank you, and good night."

Not so fast, my friend. People who are willing to work get put to
work. In this case, I believe that what you have done, though not
fitting into the structure of Unicode *decompositions*, has excellent
work to do in the realm of *backward compatibility*.

As documents start to appear directly encoded in Unicode, there will
be a need for some time to come for downward-compatible conversions.
Your tables serve as an excellent start for such a thing: with some
tweaking, they allow the construction of ASCII plain text or of
ASCII-based HTML from Unicode plain text. It will be a considerable
service to the Unicode community to provide a file that maps
EQUAL TO BY DEFINITION into "=def", or DOUBLE-STRUCK CAPITAL R into
"R", or SUPERSCRIPT FIVE into "<sup>5</sup>". The current default
method of mapping Unicode into 8-bit character sets tends to be
"replace everything that doesn't fit with '?'", a procedure which
can and must be improved on.
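
To make the idea concrete, here is a minimal sketch in Python of what
such an "approximate conversion" table might look like in use. The
names APPROX and to_ascii are hypothetical, and the table holds only
the three examples mentioned above; a real file would need many more
entries and probably separate columns for plain-text and HTML targets.

```python
# Hypothetical approximation table: Unicode character -> ASCII stand-in.
# Only the three examples from the message are included.
APPROX = {
    "\u225D": "=def",          # EQUAL TO BY DEFINITION
    "\u211D": "R",             # DOUBLE-STRUCK CAPITAL R
    "\u2075": "<sup>5</sup>",  # SUPERSCRIPT FIVE (HTML-flavoured stand-in)
}


def to_ascii(text, table=APPROX, fallback="?"):
    """Convert text to ASCII, consulting the table before giving up."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)          # already ASCII, keep as-is
        elif ch in table:
            out.append(table[ch])   # approximate conversion
        else:
            out.append(fallback)    # no entry: fall back to '?'
    return "".join(out)


sample = "x \u225D y\u2075, x in \u211D"

# The "replace everything with '?'" default, via the standard codec handler.
lossy = sample.encode("ascii", "replace").decode("ascii")

# The table-driven conversion.
approx = to_ascii(sample)

print(lossy)
print(approx)
```

The first line printed shows the lossy default, with every non-ASCII
character turned into "?"; the second shows the table-driven result.
Characters with no table entry still fall back to "?", so the table
can grow incrementally.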

My recommendation: don't worry about modifying the Unicode Standard,
but go forth and create a detailed table for "approximate conversions"
and publish it. All of us will be indebted to you.

-- 
John Cowan                                   cowan@ccil.org
       I am a member of a civilization. --David Brin

This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:48 EDT