Re: kurdish sorani

From: John Hudson (john@tiro.ca)
Date: Fri Sep 01 2006 - 23:43:24 CDT


    Andries Brouwer wrote:

    > If someone claims that the Latin alphabet has upper and lower case
    > then you could use the same reasoning to deny this and say that
    > such a distinction is invisible in a single-case-only font.
    > One has to find a way to distinguish "intrinsic" character properties
    > from "accidental" glyph properties, and the only way to learn about
    > intrinsic character properties is to examine the printed texts
    > in a large representative collection.

    To wrap up the argument from my side:

    I would not deny that the Latin alphabet has upper and lower case based on the lack of a
    visual distinction in a unicase or uncial font, and I don't think that is really parallel
    to my Urdu/Uighur logic. What I would say is that in the uncial style of the Latin script
    there is no visible distinction between the uppercase characters and corresponding
    lowercase characters; indeed, that this is one of the defining characteristics of that
    script style. In practice, this limits the suitability of the uncial style for all sorts
    of modern uses, and it also limits the suitability of e.g. Irish uncial style types to
    setting other languages; they are stylistically inappropriate for typesetting e.g.
    Turkish. But the uncial forms are not *incorrect* forms of the Latin script, whether they
    are used to represent uppercase characters or lowercase characters or both. Rather, they
    are style-specific forms and one must beware that this style is not appropriate for
    writing all Latin script languages.

    I think the variant shaping of U+06BE is parallel to this. In some writing styles it takes
    two forms, in some writing styles it takes one form. In practice this means that some
    styles are preferred over others for writing particular languages, or are at least more
    common. In actual shaping terms, i.e. in terms of the representation of the character by
    shaping engines and fonts, U+06BE always takes *four* forms, and this, not the particular
    shape of those forms, is the 'intrinsic character property'. The distinction between
    isolated, initial, medial and final forms that all look roughly the same in the nasta'liq
    style, on the one hand, and isolated and initial vs medial and final forms that look
    different in the naskh style, on the other hand, is itself a matter of 'accidental glyph
    properties'.
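    The positional selection described above can be sketched as follows. This is a minimal
    illustration under a simplified joining model, not how any particular shaping engine is
    implemented. The three-entry JOINING_TYPE table is a hand-copied, hypothetical subset of
    the Unicode data file ArabicShaping.txt. Transparent marks and join-causing characters
    are ignored, and the returned labels borrow the names of the OpenType features isol,
    init, medi and fina. The sketch shows that the choice among the four slots depends only
    on the neighbouring letters. What the glyph in each slot looks like (distinct shapes in
    naskh, near-identical ones in nasta'liq) is left entirely to the font.

```python
# Hypothetical, hand-copied subset of joining types for illustration only;
# real engines read the full table from the UCD file ArabicShaping.txt.
#   D = dual-joining, R = right-joining, U = non-joining
JOINING_TYPE = {
    "\u06BE": "D",  # ARABIC LETTER HEH DOACHASHMEE
    "\u0628": "D",  # ARABIC LETTER BEH
    "\u0627": "R",  # ARABIC LETTER ALEF
}


def joining_type(ch: str) -> str:
    # Anything not in the toy table is treated as non-joining (an assumption).
    return JOINING_TYPE.get(ch, "U")


def positional_forms(text: str) -> list:
    """Return the positional slot ('isol', 'init', 'medi', 'fina') chosen for
    each character in logical order, or None for non-joining characters.

    The slot depends only on the joining behaviour of the neighbours, never on
    what the glyph in that slot looks like in a given writing style.
    Transparent characters (e.g. harakat) are not handled in this sketch.
    """
    forms = []
    for i, ch in enumerate(text):
        jt = joining_type(ch)
        if jt == "U":
            forms.append(None)
            continue
        # Joins to the preceding letter if that letter can connect forward,
        # i.e. is dual-joining. (D and R letters both accept a join from before.)
        joins_prev = i > 0 and joining_type(text[i - 1]) == "D"
        # Joins to the following letter if this letter is dual-joining and the
        # next letter can accept a join (dual- or right-joining).
        joins_next = (
            jt == "D"
            and i + 1 < len(text)
            and joining_type(text[i + 1]) in ("D", "R")
        )
        if joins_prev and joins_next:
            forms.append("medi")
        elif joins_prev:
            forms.append("fina")
        elif joins_next:
            forms.append("init")
        else:
            forms.append("isol")
    return forms


if __name__ == "__main__":
    print(positional_forms("\u06BE"))              # ['isol']
    print(positional_forms("\u06BE\u06BE\u06BE"))  # ['init', 'medi', 'fina']
    print(positional_forms("\u0627\u06BE"))        # ['isol', 'isol']
```

    A font for either style then supplies one glyph per slot. Whether those four glyphs look
    alike or different is a property of the font, not of the character.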

    If one is going to examine a corpus of printed (and manuscript) texts to determine
    intrinsic character properties, then one also has to conduct that examination in the context
    of different writing styles, palaeographically, in order to be sure that one's analysis
    works across styles, and is not erroneously determined by a single style, even if that
    style is the dominant one for a given language.

    John Hudson

    -- 
    Tiro Typeworks        www.tiro.com
    Vancouver, BC         john@tiro.ca
    I am not yet so lost in lexicography, as to forget
    that words are the daughters of earth, and that things
    are the sons of heaven.  - Samuel Johnson
    

