Re: kurdish sorani

From: John Hudson (john@tiro.ca)
Date: Fri Sep 01 2006 - 23:43:24 CDT

Next message: Eric Muller: "UDHR in Unicode project"

Previous message: Andries Brouwer: "Re: kurdish sorani"
In reply to: Andries Brouwer: "Re: kurdish sorani"

Andries Brouwer wrote:

> If someone claims that the Latin alphabet has upper and lower case
> then you could use the same reasoning to deny this and say that
> such a distinction is invisible in a single-case-only font.
> One has to find a way to distinguish "intrinsic" character properties
> from "accidental" glyph properties, and the only way to learn about
> intrinsic character properties is to examine the printed texts
> in a large representative collection.

To wrap up the argument from my side:

I would not deny that the Latin alphabet has upper and lower case based on the lack of a
visual distinction in a unicase or uncial font, and I don't think that is really parallel
to my Urdu/Uighur logic. What I would say is that in the uncial style of the Latin script
there is no visible distinction between the uppercase characters and corresponding
lowercase characters; indeed, that this is one of the defining characteristics of that
script style. In practice, this limits the suitability of the uncial style for all sorts
of modern uses, and it also limits the suitability of e.g. Irish uncial style types to
setting other languages; they are stylistically inappropriate for typesetting e.g.
Turkish. But the uncial forms are not *incorrect* forms of the Latin script, whether they
are used to represent uppercase characters or lowercase characters or both. Rather, they
are style-specific forms and one must beware that this style is not appropriate for
writing all Latin script languages.

I think the variant shaping of U+06BE is parallel to this. In some writing styles it takes
two forms, in some writing styles it takes one form. In practice this means that some
styles are preferred over others for writing particular languages, or are at least more
common. In actual shaping terms, i.e. in terms of the representation of the character by
shaping engines and fonts, U+06BE always takes *four* forms, and this, not the particular
shape of those forms, is the 'intrinsic character property'. The distinction between
isolated, initial, medial and final forms that all look roughly the same in the nasta'liq
style, on the one hand, and isolated and initial vs medial and final forms that look
different in the naskh style, on the other hand, is itself a matter of 'accidental glyph
properties'.
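
The four-form point can be checked against the Unicode character data itself. What follows is only a rough sketch, assuming Python's standard unicodedata module; the function name positional_forms is made up for illustration. It looks up the compatibility presentation forms that decompose to U+06BE and lists one per joining position. Fonts and shaping engines normally select positional glyphs through their own substitution tables rather than through these code points, so this shows the character-level structure, not any particular engine's behaviour.

```python
import unicodedata

HEH_DOACHASHMEE = "\u06be"  # ARABIC LETTER HEH DOACHASHMEE


def positional_forms(base):
    """Map each positional tag (isolated, final, initial, medial) to the
    compatibility presentation-form code point that decomposes to `base`."""
    target = f"{ord(base):04X}"
    forms = {}
    # Scan the Arabic Presentation Forms-A and -B ranges.
    for block in (range(0xFB50, 0xFE00), range(0xFE70, 0xFF00)):
        for cp in block:
            parts = unicodedata.decomposition(chr(cp)).split()
            # Keep only single-character decompositions such as "<final> 06BE";
            # ligatures decompose to more than one code point and are skipped.
            if len(parts) == 2 and parts[1] == target and parts[0].startswith("<"):
                forms[parts[0].strip("<>")] = cp
    return forms


forms = positional_forms(HEH_DOACHASHMEE)
for tag in ("isolated", "initial", "medial", "final"):
    print(f"{tag:9} U+{forms[tag]:04X} {unicodedata.name(chr(forms[tag]))}")
```

The character data records four positional forms and says nothing about how alike or unlike they look; that is left entirely to the font and the writing style.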

If one is going to examine a corpus of printed (and manuscript) texts to determine
intrinsic character properties, then one also has to conduct that examination in the context
of different writing styles, palaeographically, in order to be sure that one's analysis
works across styles, and is not erroneously determined by a single style, even if that
style is the dominant one for a given language.

John Hudson

-- 
Tiro Typeworks        www.tiro.com
Vancouver, BC         john@tiro.ca
I am not yet so lost in lexicography, as to forget
that words are the daughters of earth, and that things
are the sons of heaven.  - Samuel Johnson


This archive was generated by hypermail 2.1.5 : Sat Sep 02 2006 - 01:46:11 CDT