Re: Mixed up priorities

From: Kenneth Whistler (kenw@sybase.com)
Date: Fri Nov 05 1999 - 15:58:27 EST


Bob Rosenberg suggested:

>
> At 11:39 PM 10/24/1999 -0700, peter_constable@sil.org wrote:
>
> >What matters is whether we need to add an abstract character to represent
> >it. I and others have suggested that the assumed encoding of U+0063 U+0068
> >is adequate. The way to argue that a new abstract character is necessary
> >is to demonstrate that there are textual processes that people regularly
> >want to do in software for which the expected results cannot be achieved
> >(using reasonable means) without the proposed new character.
>
>
> IOW, show that the U+0063 U+0068 coding is not adequate since in Slovak it
> is possible to have words that have c followed by h which is not the
> "letter" 'ch'. If this difference is real, then a separate codepoint is
> needed to tell the difference between "ch" and "c" followed by "h".
>

I disagree with this recharacterization of Peter's statement. It
does not follow from the existence of minimal pairs of this sort
(which would likely involve a morpheme boundary if they do occur)
that a separate codepoint is needed to tell the difference. You
might be using language markup for embedded loanwords, or
morphological markup. Or you might be able to specify an easily
determinable morphological or other context that distinguishes the
two. Or you might use dictionaries. (Thus, for example, what you
need to do for text-to-speech to distinguish the "th" in "rathole"
from the "th" in "rather" in English.)
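
As a minimal sketch of the dictionary approach, assuming a hypothetical
exception table that records where a morpheme boundary falls inside a
word: the names MORPHEME_BOUNDARIES, DIGRAPHS and segment are invented
for illustration, and the single table entry is only the "rathole"
example from above, not real lexical data. A letter pair is treated as
one unit unless a recorded boundary splits it, so no separate codepoint
is involved.

```python
# Hypothetical exception dictionary: word -> set of morpheme-boundary offsets.
# "rathole" is rat + hole, so a boundary falls at offset 3, between "t" and "h".
MORPHEME_BOUNDARIES = {"rathole": {3}}

# Letter pairs to treat as a single unit by default (assumption for this sketch).
DIGRAPHS = {"th"}


def segment(word):
    """Split a word into units, keeping a digraph together unless a
    recorded morpheme boundary falls between its two letters."""
    boundaries = MORPHEME_BOUNDARIES.get(word.lower(), set())
    units = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2].lower()
        if pair in DIGRAPHS and (i + 1) not in boundaries:
            units.append(word[i:i + 2])  # digraph stays together
            i += 2
        else:
            units.append(word[i])  # plain letter, or a digraph split by a boundary
            i += 1
    return units
```

Here segment("rather") keeps "th" as one unit, while segment("rathole")
yields separate "t" and "h". The same shape would apply to "ch" given a
suitable table for the language in question.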

Peter's point is that it is not just a contrast that is at issue, but
a contrast where expected processing results cannot be reasonably
achieved without encoding a separate character.

--Ken


