Re: Mixed up priorities

From: John Cowan (cowan@locke.ccil.org)
Date: Fri Oct 22 1999 - 10:47:01 EDT


Reynolds, Gregg scripsit:

> Why is the burden of proof on the users of the language? I would turn the
> question around: is it really _necessary_ to leave slovak/czech "ch" out of
> Unicode?

A cost is imposed every time a new character is created, especially within
an existing script. Is this "ch" to be used only when composing
Slovak or Czech data? What about generic conversion from 8859-2; do
we have to mark it up to say what is Czech or Slovak (and uses the
new characters) and what is Polish (and uses plain old "c" plus "h")?
Or are we to allow for multiple spellings (in character code) of
Czech and Slovak words, increasing the burden on spelling checkers
and other tools?
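
To make the conversion problem concrete, here is a minimal Python sketch. It assumes a purely hypothetical single "ch" character, represented by the Private Use code point U+E000 (not a real or proposed assignment), and a toy one-word dictionary standing in for a spelling checker.

```python
# Hypothetical stand-in for a single "ch" character. U+E000 is a Private Use
# code point chosen only for illustration; it is not a real assignment.
HYPOTHETICAL_CH = "\ue000"

# The same letters as they would appear in existing ISO 8859-2 data.
czech_bytes = "chata".encode("iso8859_2")   # a Czech word
polish_bytes = "chata".encode("iso8859_2")  # a Polish word, spelled identically

# A generic converter sees only bytes, with no language information,
# so both decode to plain "c" + "h".
decoded = czech_bytes.decode("iso8859_2")

# If some Czech text used the new character instead, the same word
# would have a second spelling at the code point level.
alternate = HYPOTHETICAL_CH + "ata"

# Toy spelling checker: a set of known words in the existing spelling.
dictionary = {"chata"}
found_plain = decoded in dictionary
found_alternate = alternate in dictionary
```

The byte sequences are identical, so nothing in the data tells a converter which spelling to emit, and the toy dictionary misses the alternate spelling unless every tool learns to fold the two together.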

We already have more than enough canonically equivalent characters.
Every one of them represents a compromise that Unicode applications
have to work extra hard to handle. We don't need any more.
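
As an illustration of that extra work, here is a small Python sketch using the standard unicodedata module. The choice of "c with caron" as the example and the helper name canon_equal are mine, not anything from the standard.

```python
import unicodedata

# Two canonically equivalent spellings of the Czech letter "č".
precomposed = "\u010d"   # LATIN SMALL LETTER C WITH CARON
decomposed = "c\u030c"   # "c" followed by COMBINING CARON

# A raw code point comparison treats them as different strings.
raw_equal = precomposed == decomposed


def canon_equal(a, b):
    """Compare two strings after normalizing both to NFC.

    Hypothetical helper name; any single normalization form would do,
    as long as both sides use the same one.
    """
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


normalized_equal = canon_equal(precomposed, decomposed)
```

Every comparison, search, or dictionary lookup has to go through something like canon_equal (or normalize its input up front) to treat the two spellings as the same text.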

>
> I don't see what plaintext, sorting and hyphenation have to do with it.
> Slovak and Czech literates have this thing within their culture, and they
> use "ch" to denote it.

Unicode is not an encoding standard for "things", but for language-neutral
abstractions, internally called "characters".

> So if plaintext doesn't accommodate "ch", then it must
> not be plain text for Slovaks and Czechs.

It does accommodate "ch" as "c" + "h", precisely as in their existing
8859-2 standard. The only time "ch" has to be treated as a unit is
in sorting and typesetting, and the international standard for sorting,
unlike that for coded characters, explicitly recognizes the necessity for
localization.
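
A rough Python sketch of what such localization can look like: the text stays "c" + "h", and only the sort key treats the pair as one unit placed after "h". This is a deliberately simplified illustration, not the algorithm of any sorting standard: the alphabet, the function name czech_key, and the sample words are my assumptions, and diacritics and case are ignored.

```python
# Simplified collation alphabet: basic Latin letters with the unit "ch"
# placed after "h". Assumption: diacritics and case levels are ignored.
ALPHABET = list("abcdefgh") + ["ch"] + list("ijklmnopqrstuvwxyz")
RANK = {unit: i for i, unit in enumerate(ALPHABET)}


def czech_key(word):
    """Build a sort key in which "c" + "h" counts as one collation unit."""
    s = word.lower()
    key = []
    i = 0
    while i < len(s):
        if s.startswith("ch", i):
            unit = "ch"
            i += 2
        else:
            unit = s[i]
            i += 1
        # Anything outside the toy alphabet sorts after all known units.
        key.append(RANK.get(unit, len(ALPHABET)))
    return key


words = ["cihla", "chata", "hrad", "idea", "cesta"]
plain = sorted(words)                     # code point order
localized = sorted(words, key=czech_key)  # "ch" sorts after "h"
```

In code point order "chata" falls between "cesta" and "cihla"; with the localized key it moves to after "hrad", and the stored text never needs a separate "ch" character.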

-- 
John Cowan                                   cowan@ccil.org
       I am a member of a civilization. --David Brin


