From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Mon Nov 22 2010 - 12:55:58 CST
On 11/22/2010 4:15 AM, Michael Everson wrote:
>> It boils down to this: just as there aren’t technical or usability reasons that make it problematic to represent IPA text using two Greek characters in an otherwise-Latin system,
> Yes there are. Sorting multilingual text including Greek and IPA transcriptions, for one. The glyph shape for IPA beta is practically unknown in Greek. Latin capital Chi is not the same as Greek capital chi.
>
>> so also there are no technical or usability reasons I’m aware of why it is problematic to represent this historic Janalif orthography using two Cyrillic characters.
> They are the same technical and usability reasons which led to the disunification of Cyrillic Ԛ and Ԝ from Latin Q and W.
The sorting problem I think I understand.
Because scripts are kept together in sorting, when you have a mixed
script list, you normally override just the sorting for the script to
which the (sort-)language belongs. A mixed French-Russian list would use
French ordering for the Latin characters, but the Russian words would
all appear together (and be sorted according to some generic sort order
for Cyrillic characters, except that for a bilingual list, sorting the
Cyrillic according to Russian rules might also make sense).
Same for a French-Greek list. The Greek characters will be together and
sorted either by a generic Greek (script) sort, or a specific Greek
(language) sort. When you sort a mixed list of IPA and Greek, the beta
and chi will now sort with the Latin characters, in whatever sort order
applies for IPA. That means Greek words containing beta or chi get
pulled out of place, and the order of the Greek words in the list gets
messed up. It will be neither a generic Greek (script) sort nor a
specific Greek (language) sort, because you can't tailor the same
characters two different ways in the same sort.
As I understand it, that is the problem behind the issue with the
Kurdish Q and W, and with the character pair proposed for
disunification for Janalif.
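To make the sorting argument concrete, here is a toy sketch in Python. It is not UCA or CLDR collation: the script-ranking key, the names `make_key` and `IPA_TAILORING`, and the word list are all hypothetical, chosen only to show what happens once beta and chi are tailored to sort as Latin letters for IPA.

```python
import unicodedata

# Hypothetical script ranks: Latin words sort before Greek words.
SCRIPT_RANK = {"LATIN": 0, "GREEK": 1}

# Hypothetical IPA tailoring: treat beta and chi as Latin letters that
# sort right after b and x ("\uffff" pushes them past the base letter).
IPA_TAILORING = {"β": (0, "b\uffff"), "χ": (0, "x\uffff")}


def script_of(ch):
    """Crude script detection from the first word of the Unicode character name."""
    return unicodedata.name(ch).split()[0]


def make_key(tailoring=None):
    """Build a sort key that groups letters by script, with optional per-character overrides."""
    tailoring = tailoring or {}

    def key(word):
        parts = []
        for ch in word.lower():
            if ch in tailoring:
                parts.append(tailoring[ch])  # tailored weight wins
            else:
                parts.append((SCRIPT_RANK.get(script_of(ch), 2), ch))
        return parts

    return key


words = ["xenon", "χαος", "cat", "βητα", "αλφα", "bara", "γαμμα"]

print(sorted(words, key=make_key()))
# ['bara', 'cat', 'xenon', 'αλφα', 'βητα', 'γαμμα', 'χαος']
print(sorted(words, key=make_key(IPA_TAILORING)))
# ['bara', 'βητα', 'cat', 'xenon', 'χαος', 'αλφα', 'γαμμα']
```

The first call keeps the Greek words together in Greek order; with the IPA tailoring, βητα and χαος land among the Latin words. A single key function can carry only one set of weights for β and χ, which is the point about not being able to tailor the same characters two ways in one sort.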
So it seems there are, after all, some technical problems that would
make support for such "mixed-script" orthographies less seamless than
for regular orthographies.
In that case, a decision would boil down to whether these technical
issues are significant enough, given the actual usage, to justify
disunification.
In other words, it becomes a cost-benefit analysis. Duplication of
characters (except where their glyphs have acquired a different
appearance in the other context) always has a cost in added
confusability. Users can select the wrong character accidentally;
spoofers can do so intentionally to try to cause harm. But Unicode was
never just a list of distinct glyphs, so duplication between Latin and
Greek, or between Latin and Cyrillic, is already widespread, especially
among the capitals.
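The confusability cost can be illustrated with the Kurdish pair mentioned in this thread. The Python sketch below uses hypothetical helper names, and its script test is a crude stand-in for the real Script property and the mixed-script checks described in UTS #39; it only shows that Latin Q and Cyrillic Ԛ are different code points, and how a mixed-script check might flag a spoofed string.

```python
import unicodedata

LATIN_Q = "Q"           # U+0051 LATIN CAPITAL LETTER Q
CYRILLIC_QA = "\u051A"  # U+051A CYRILLIC CAPITAL LETTER QA (encoded for Kurdish)


def scripts_in(text):
    """Crude set of scripts used by the letters in text, taken from Unicode character names."""
    return {unicodedata.name(ch).split()[0] for ch in text if ch.isalpha()}


def is_mixed_script(text):
    """Flag strings whose letters come from more than one script."""
    return len(scripts_in(text)) > 1


genuine = "Quick"
spoof = CYRILLIC_QA + "uick"  # may render much like "Quick" in many fonts

print(LATIN_Q == CYRILLIC_QA)    # False: distinct code points
print(genuine == spoof)          # False
print(is_mixed_script(genuine))  # False
print(is_mixed_script(spoof))    # True
```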
Unlike what Michael claims for IPA, the Janalif characters don't seem to
have a very different appearance, so there would be no glyph-related
technical or usability issue there. Minor glyph variations can be
handled by standard technologies, like OpenType, as long as the text
remains legible if its language binding gets lost.
That seems to be true for IPA as well - because already, if you use the
font binding for IPA, your a's and g's will not come out right, which
means you don't even have to worry about betas and chis.
IPA being a notation, I would not be surprised to learn that mixed lists
with both IPA and other terms are a rare thing. But for Janalif it would
seem that mixed Janalif/Cyrillic lists would be rather common, relative
to the size of the corpus, even if it's a dead (or currently out of use)
orthography.
I'd like to see this addressed a bit more in detail by those who support
the decision to keep the borrowed characters unified.
A./