Re: IPA and sorting

From: Kenneth Whistler (kenw@sybase.com)
Date: Tue Sep 23 1997 - 19:54:55 EDT


Michael Everson has suggested:
>
> In the Standard there are letters, used with the IPA like LATIN SMALL
> LETTER ALPHA which sorts with LATIN SMALL LETTER A -- but the current
> mappings to IPA also use GREEK SMALL LETTER BETA as a basic constituent of
> the IPA.
>
> This will cause havoc in sorting -- and one does sort IPA text, in
> glossaries etc. -- because two scripts are intermixed.
>
> Would there be support for adding the (few) Latin Greek-cloned characters
> to the standard? I know, unification was probably why they aren't there in
> the first place, but in this instance I think the unification will
> interfere with any kind of default ordering algorithm. The IPA _is_
> quintessentially Latin, and I would like to have a single ordering locale
> be able to handle sorting in either Latin (incl. IPA) or Greek. Currently
> we will not be able to do this.
>
> The question is precisely that characters have properties, and in this case
> the characters belong to more than one script -- which is in violation of
> the unification rules that kept ALPHA and A apart in the first place.
>
> Candidates for encoding off the top of my head are BETA, THETA, LAM(B)DA,
> and CHI. LATIN LETTERs ALPHA, OPEN E, GAMMA, IOTA, PHI, and UPSILON already
> exist.
>

The problem with this, as for many other "clone a character to make
the processing for XXX easier" proposals, is that it has a downside--
how to keep the two different characters straight once they are cloned.

Yes, it will be easy enough to specify the correct character in an
IPA input method (LATIN BETA) as opposed to a Greek input method
(GREEK BETA), but the two will get mixed up, sure as shootin'--
especially since the current state of affairs has already made use
of the Greek consonants in question for the IPA characters. And the
inevitable software evolution will require processes to end up
treating the two as the same sometimes, as different other times,
in response to customer-reported bugs of one sort or another. And
end-users will be confused by the inconsistent behavior of two
overtly identical characters.

In my opinion, the desirable goal of having IPA collate correctly by
default with a default collation that also works for generic Latin
and generic Greek is outweighed by the additional confusion and
degradation of software behavior that sets in with more and more
cloned characters.

A preferable solution is to define IPA collation distinctly from
the default collation for either Latin or Greek. That would allow
it to be defined more correctly for IPA specifically. This is really
no different from the special collation overrides required to get
correct collation for French, Swedish, Japanese, or whatever.
The default collation rules are just that: default. They don't
have to be perfect for everything, and in fact cannot be.
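The override idea above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a real collation implementation. Python's built-in code point order stands in for the default collation; a real script-ordered default behaves similarly here, since Greek letters form their own block after Latin. The names `IPA_TAILORING` and `ipa_key` are hypothetical. The placements, with each Greek letter sorting immediately after a chosen Latin base letter, are illustrative only and do not represent a published IPA ordering.

```python
# Minimal sketch of a tailored (IPA-specific) collation layered over a
# default ordering. Assumptions, labeled as such:
#   * Python's built-in code point order stands in for the default collation;
#   * IPA_TAILORING and ipa_key are hypothetical names, and the placements
#     (each Greek letter sorting right after a Latin base letter) are
#     illustrative, not a published IPA ordering.

IPA_TAILORING = {
    "\u03b2": "b",  # GREEK SMALL LETTER BETA  -> right after b
    "\u03b8": "t",  # GREEK SMALL LETTER THETA -> right after t
    "\u03bb": "l",  # GREEK SMALL LETTER LAMDA -> right after l
    "\u03c7": "x",  # GREEK SMALL LETTER CHI   -> right after x
}


def ipa_key(word: str) -> list[tuple[int, int]]:
    """Sort key in which each tailored character acts as a separate
    letter placed immediately after its base letter: it shares the base
    letter's position and breaks the tie with a sub-position of 1."""
    key = []
    for ch in word:
        base = IPA_TAILORING.get(ch)
        if base is None:
            key.append((ord(ch), 0))  # untailored: plain code point order
        else:
            key.append((ord(base), 1))  # tailored: just after its base letter
    return key


words = ["\u03b2a", "ba", "ca", "\u03b8a", "ta", "za"]

# Default (code point) order: the Greek letters all land after z.
print(sorted(words))               # ['ba', 'ca', 'ta', 'za', 'βa', 'θa']

# Tailored order: beta follows b, theta follows t.
print(sorted(words, key=ipa_key))  # ['ba', 'βa', 'ca', 'ta', 'θa', 'za']
```

The override lives only in `ipa_key`, not in the default order. As a result, generic Latin or Greek text sorted with the default is unaffected, and no cloned characters are needed.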

--Ken
