Re: Are Latin and Cyrillic essentially the same script?

From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Mon Nov 22 2010 - 12:55:58 CST


    On 11/22/2010 4:15 AM, Michael Everson wrote:
    >> It boils down to this: just as there aren’t technical or usability reasons that make it problematic to represent IPA text using two Greek characters in an otherwise-Latin system,
    > Yes there are. Sorting multilingual text including Greek and IPA transcriptions, for one. The glyph shape for IPA beta is practically unknown in Greek. Latin capital Chi is not the same as Greek capital chi.
    >
    >> so also there are no technical or usability reasons I’m aware of why it is problematic to represent this historic Janalif orthography using two Cyrillic characters.
    > They are the same technical and usability reasons which led to the disunification of Cyrillic Ԛ and Ԝ from Latin Q and W.

    The sorting problem I think I understand.

    Because scripts are kept together in sorting, when you have a mixed-script
    list you normally override just the sorting for the script to which the
    (sort-)language belongs. A mixed French-Russian list would use French
    ordering for the Latin characters, but the Russian words would all
    appear together and be sorted according to some generic sort order for
    Cyrillic characters (although for a bilingual list, sorting the Cyrillic
    according to Russian rules might also make sense).
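    To make the grouping concrete, here is a toy sketch in Python. It is not
    how any real collation library works: the script of a letter is
    approximated from its Unicode character name, the script order (Latin,
    Greek, Cyrillic) is a hypothetical choice, and plain code-point order
    stands in for a real per-language sort.

```python
import unicodedata

# Assumption: approximate a letter's script by the first word of its Unicode
# character name ("LATIN SMALL LETTER A" -> "LATIN"). Real collation (UCA with
# CLDR tailorings) uses weight tables and script reordering, not names.
def script_of(ch: str) -> str:
    return unicodedata.name(ch, "UNKNOWN").split()[0]

# Hypothetical script order for this toy example: Latin, then Greek, then Cyrillic.
SCRIPT_ORDER = {"LATIN": 0, "GREEK": 1, "CYRILLIC": 2}

def sort_key(word: str):
    # Primary: the script group of the first letter, so each script stays together.
    # Secondary: plain code-point order, standing in for a real per-language sort.
    return (SCRIPT_ORDER.get(script_of(word[0]), 99), word)

words = ["pain", "мир", "amour", "дом", "chat", "банк"]
print(sorted(words, key=sort_key))
# ['amour', 'chat', 'pain', 'банк', 'дом', 'мир']
```

    The French words come out together, followed by the Russian words as a
    block; only the ordering within each block would be tailored.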

    Same for a French-Greek list. The Greek characters will be together and
    sorted either by a generic Greek (script) sort or by a specific Greek
    (language) sort. When you sort a mixed list of IPA and Greek using the
    IPA ordering, however, the beta and chi will now sort with the Latin
    characters, in whatever sort order applies for IPA. That means the order
    of all Greek words in the list will get messed up. It will be neither a
    generic Greek (script) sort nor a specific Greek (language) sort,
    because you can't tailor the same characters two different ways in the
    same sort.
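    A small sketch of why the same characters can't be tailored two ways. It
    assumes a made-up weight table (not DUCET) in which Latin letters get
    low weights and Greek letters higher ones, plus a hypothetical IPA
    tailoring that moves Greek beta and chi into the Latin range. The words
    are toy strings, not real Greek words.

```python
# Toy weight table, not DUCET: Latin letters get low weights and Greek letters
# get higher ones, so a script-grouped sort keeps each script together.
LATIN = "abcdefghijklmnopqrstuvwxyz"
GREEK = "αβγδεζηθικλμνξοπρστυφχψω"  # lowercase, no final sigma or accents

BASE = {c: i * 10 for i, c in enumerate(LATIN)}
BASE.update({c: 1000 + i * 10 for i, c in enumerate(GREEK)})

# Hypothetical IPA tailoring: Greek beta and chi, borrowed into IPA, are
# moved into the Latin range, right after "b" and "x".
IPA_TAILORING = {"β": BASE["b"] + 1, "χ": BASE["x"] + 1}

def make_key(table):
    # A word's key is the list of its letters' weights; each code point can
    # have only ONE weight per table, which is the heart of the problem.
    return lambda word: [table[c] for c in word]

# Toy strings (not real words): four Greek, one Latin.
words = ["χα", "γα", "αχ", "βα", "ca"]

generic = sorted(words, key=make_key(BASE))
tailored = sorted(words, key=make_key({**BASE, **IPA_TAILORING}))
print(generic)   # ['ca', 'αχ', 'βα', 'γα', 'χα']
print(tailored)  # ['βα', 'ca', 'χα', 'αχ', 'γα']
```

    With the IPA tailoring in effect, the Greek strings starting with beta
    and chi leave the Greek block and land among the Latin entries. No single
    table can give U+03B2 both its Greek and its IPA position, whereas
    separately encoded letters, such as Cyrillic Ԛ (U+051A) and Ԝ (U+051C),
    can each be tailored independently.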

    As I understand it, that is the problem behind the issue with the
    Kurdish Q and W, and with the character pair proposed for
    disunification for Janalif.

    So it seems there may, after all, be some technical problems that make
    support for such "mixed-script" orthographies less seamless than for
    regular orthographies.

    In that case, the decision would boil down to whether these technical
    issues are significant enough, given the usage, to justify
    disunification.

    In other words, it becomes a cost-benefit analysis. Duplication of
    characters (except where their glyphs have acquired a different
    appearance in the other context) always has a cost in added
    confusability: users can select the wrong character accidentally, and
    spoofers can do so intentionally to try to cause harm. But Unicode was
    never just a list of distinct glyphs, so duplication between Latin and
    Greek, or Latin and Cyrillic, is already widespread, especially among
    the capitals.
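    As an illustration of the confusability cost, the sketch below shows that
    Latin Q and W and the disunified Cyrillic Ԛ and Ԝ are distinct code
    points, and applies a crude mixed-script check of the kind used against
    spoofing. The script detection via character names and the helper names
    (scripts_in, is_mixed_script) are assumptions for illustration; real
    checks are specified in UTS #39 and use the Script property.

```python
import unicodedata

# Latin Q/W next to their disunified Cyrillic counterparts: similar glyphs,
# distinct code points, so they never compare equal.
for ch in "QW\u051A\u051C":
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
# U+0051 LATIN CAPITAL LETTER Q
# U+0057 LATIN CAPITAL LETTER W
# U+051A CYRILLIC CAPITAL LETTER QA
# U+051C CYRILLIC CAPITAL LETTER WE

def scripts_in(text: str) -> set:
    # Assumption: script approximated by the first word of the character name;
    # a real check would use the Script property (see UTS #39).
    return {unicodedata.name(c).split()[0] for c in text if c.isalpha()}

def is_mixed_script(label: str) -> bool:
    # Crude spoofing heuristic: flag any label whose letters span several scripts.
    return len(scripts_in(label)) > 1

print(is_mixed_script("paypal"))       # False
print(is_mixed_script("p\u0430ypal"))  # True: U+0430 CYRILLIC SMALL LETTER A
```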

    Unlike what Michael claims for IPA, the Janalif characters don't seem to
    have a very different appearance, so glyph shape would not be a technical
    or usability issue there. Minor glyph variations can be handled by
    standard technologies, like OpenType, as long as the overall appearance
    remains legible if the language binding of a text gets lost.

    That seems to be true for IPA as well: if you lose the font binding for
    IPA, your a's and g's will already not come out right, which means you
    don't even have to worry about betas and chis.

    IPA being a notation, I would not be surprised to learn that mixed lists
    containing both IPA and other terms are rare. But for Janalif, mixed
    Janalif/Cyrillic lists would seem to be rather common relative to the
    size of the corpus, even if it's a dead (or currently unused)
    orthography.

    I'd like to see this addressed in a bit more detail by those who support
    the decision to keep the borrowed characters unified.

    A./



    This archive was generated by hypermail 2.1.5 : Mon Nov 22 2010 - 12:59:17 CST