Re: Are Latin and Cyrillic essentially the same script?

From: Asmus Freytag
Date: Fri Nov 19 2010 - 02:22:04 CST

    On 11/18/2010 11:15 PM, Peter Constable wrote:
    > If you'd like a precedent, here's one:

    Yes, I think discussion of precedents is important - it leads to the
    formulation of encoding principles that can then (hopefully) result in
    more consistency in future encoding efforts.

    Let me add the caveat that I fully understand that character encoding
    doesn't work by applying cook-book style recipes, and that principles
    are better phrased as criteria for weighing a decision rather than as
    formulaic rules.

    With these caveats, then:
    > IPA is a widely-used system of transcription based primarily on the Latin script. In comparison to the Janalif orthography in question, there is far more existing data. Also, whereas that Janalif orthography is no longer in active use--hence there are not new texts to be represented (there are at best only new citations of existing texts), IPA is a writing system in active use with new texts being created daily; thus, the body of digitized data for IPA is growing much more than is data in the Janalif orthography. And while IPA is primarily based on Latin script, not all of its characters are Latin characters: bilabial and interdental fricative phonemes are represented using Greek letters beta and theta.

    IPA has other characteristics in both its usage and its encoding that
    you need to consider to make the comparison valid.

    First, IPA requires specialized fonts because it relies on glyphic
    distinctions that fonts not designed for IPA use will not guarantee.
    (Latin a with and without hook, g with hook vs. two stories are just two
    examples). It's also a notational system that requires specific training
    in its use, and it is caseless - in distinction to ordinary Latin script.
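
    To make the glyph-level distinction concrete, here is a minimal sketch
    using Python's standard unicodedata module. It assumes that the two
    examples above correspond to U+0251 and U+0261; that mapping, and the
    helper name `names`, are illustrative assumptions, not something stated
    in this thread.

```python
import unicodedata

# Assumed pairs: the ordinary Latin letter, then the separately encoded
# letter whose shape IPA depends on (this mapping is an assumption).
pairs = [("a", "\u0251"), ("g", "\u0261")]

def names(pair):
    """Return the Unicode character names for both members of a pair."""
    return tuple(unicodedata.name(ch) for ch in pair)

for pair in pairs:
    print(names(pair))
```

    Each pair has two distinct character names, which is the sense in which
    a shape difference is carried at the character level and not left to
    the font.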

    While several orthographies have been based on IPA, my understanding is
    that some of them saw the encoding of additional characters to make them
    work as orthographies.

    Finally, IPA, like other phonetic notations, uses distinctions between
    letter forms on the character level that would almost always be
    relegated to styling in ordinary text.

    Because of these special aspects of IPA, I would class it in its own
    category of writing systems which makes it less useful as a precedent
    against which to evaluate general Latin-based orthographies.

    > Given a precedent of a widely-used Latin writing system for which it is considered adequate to have characters of central importance represented using letters from a different script, Greek, it would seem reasonable if someone made the case that it's adequate to represent an historic Latin orthography using Cyrillic soft sign.

    I think the question can and should be asked, what is adequate for a
    historic orthography. (I don't know anything about the particulars of
    Janalif, beyond what I read here, so for now, I accept your
    categorization of it as if it were fact).

    The precedent for historic orthographies is a bit uneven in Unicode.
    Some scripts have extensive collections of characters (even duplicates or
    near duplicates) to cover historic usage. Other historic orthographies
    cannot be fully represented without markup. And some are now better
    supported than at the beginning because the encoding has plugged certain
    gaps.

    A helpful precedent in this case would be that of another minority or
    historic orthography, or historic minority orthography for which the use
    of Greek or Cyrillic characters with Latin was deemed acceptable. I
    don't think Janalif is totally unique (although the others may not be
    dead). I'm thinking of the Latin OU that was encoded based on a Greek
    ligature, and the perennial question of the Kurdish Q and W (Latin
    borrowings into Cyrillic - I believe these are now U+051A and U+051C).
    Again, these may be for living orthographies.
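
    The code points mentioned here can be checked against the Unicode
    Character Database. The following is a minimal sketch using Python's
    standard unicodedata module; the helper name `describe` is hypothetical,
    and taking U+0222 to be the "Latin OU" is an assumption about which
    character is meant.

```python
import unicodedata

def describe(code_point: int) -> str:
    """Format a code point as 'U+XXXX NAME' using Python's bundled UCD."""
    return f"U+{code_point:04X} {unicodedata.name(chr(code_point))}"

# Assumed to be the Latin letter OU referred to above.
print(describe(0x0222))

# The two Kurdish letters, encoded in a Cyrillic block.
print(describe(0x051A))
print(describe(0x051C))
```

    The character names show which script each borrowed letter was formally
    assigned to: OU is named as Latin, while the Kurdish pair is named as
    Cyrillic.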

        /Against this backdrop, it would help if WG2 (and UTC) could point
        to agreed upon criteria that spell out what circumstances should
        favor, and what circumstances should disfavor, formal encoding of
        borrowed characters, in the LGC script family or in the general case./

    That's the main point I'm trying to make here. I think it is not enough
    to somehow arrive at a decision for one orthography, but it is necessary
    for the encoding committees to grab hold of the reasoning behind that
    decision and work out how to apply consistent reasoning like that in
    future cases.

    This may still feel a little bit unsatisfactory for those whose proposal
    is thus becoming the test-case to settle a body of encoding principles,
    but to that I say, there's been ample precedent for doing it that way in
    Unicode and 10646.

    So let me ask these questions:

        A. What are the encoding principles that follow from the disposition
        of the Janalif proposal?

        B. What precedents are these based on, and what precedents are
        consciously established by this decision?

