From: Richard Wordingham (firstname.lastname@example.org)
Date: Thu Feb 01 2007 - 22:54:57 CST
----- Original Message -----
From: "Ruszlan Gaszanov" <ruszlan@ather.net>
Sent: Sunday, January 28, 2007 6:05 PM
Subject: RE: ZWJ, ZWNJ and VS in Latin and other Greek-derived scripts
> "Ligation required" and "ligation prohibited" are orthographical concepts
> and must be encoded in plain text.
> "Optional ligation", on the other hand, is a stylistic concept. Basically,
> it means that the specific orthography, as a rule, allows ligation of some
> character combinations in certain writing/typesetting styles. Therefore,
> "optional ligation" should be handled by higher level protocols based on
> language tagging and rich text styles applied to the text - not encoded at
> plain text level.
> Considering the above, ZWNJ should only be encoded in plain text for
> prohibiting ligation in exceptional cases for orthographies that allow
> "optional ligation". ZWJ, on the other hand, should be used for encoding
> orthographically significant ligatures, not stylistic ligation.
> Modern English typesetting practice is somewhat of an anomaly in this
> respect, since ligation is allowed only in certain words. However, those
> are in fact unaltered French or Latin words written in their native
> orthography and should be properly tagged as "French" or "Latin", rather
> than "English". So, if you apply a style that uses optional ligation for
> those languages, ligation would occur in such words, but not in the rest
> of text tagged as proper "Modern English" (since all modern English
> orthography variations do not allow ligation as a rule).
Let me see if I understand you on this. In HTML, in the middle of English
text I should write 'haemocoels' as '<span lang="la-GB">haemocoels</span>'
because ligation is optional and despite the fact that neither plural
'haemocoels' nor singular 'haemocoel' is a Latin word-form - and the word
was coined in modern times from Greek roots, but when giving the Latin
etymon of English 'aerial' I should cite
'<span lang="la-GB">a&#8204;erius</span>' to inhibit ligation because 'ae' is not a
diphthong in this word. Is this your view?
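For anyone checking the mark-up above: the decimal reference &#8204; is
U+200C ZERO WIDTH NON-JOINER, and ZWJ is the next code point, U+200D. A
minimal Python sketch of what the cited string decodes to follows. The
variable names are only illustrative, and it says nothing about whether a
given renderer or font actually honours the ZWNJ.

```python
import html

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200d"   # ZERO WIDTH JOINER

# The HTML text content cited for the Latin etymon of 'aerial'.
marked_up = "a&#8204;erius"

# Decode the numeric character reference into the actual character.
plain = html.unescape(marked_up)

# The decoded string is 'a', ZWNJ, then 'erius': the ZWNJ sits between
# 'a' and 'e', which is what asks the renderer not to form the ligature.
print(plain == "a" + ZWNJ + "erius")
print([f"U+{ord(ch):04X}" for ch in plain])
```

The plain 'haemocoels' case carries no joiner at all, so under the quoted
proposal any ligation there would come from the style applied to the
language tag rather than from the text itself.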
Aside: Dare I ask which variants of Latin used in England eschew ligation?
I don't recall ligation in the Latin textbooks we used at school.
I think the proper abstract linguistically-based mark-up would be to mark
words like 'haemocoel' as Latinate - this would cover old styles in which
native and Latinate words were printed in different fonts, or, going further
back, handwritten in different styles. I'm not sure how one would do this
in a general rather than ad-hoc fashion. (One could use 'class' and a
stylesheet in HTML to select an appropriate font, but the names of the
classes would be idiosyncratic.)
Perhaps I am wrong to try and separate 'spelling' and writing style.
There's a Northern Thai school of spelling that chooses the symbol for /a:/
in part on an etymological basis (Pali v. native), but the current plan is
for the vowel form to be specified in the encoding, as different schools use
different rules for choosing the vowel form - and even then writers are not
self-consistent. Even Pali (or at least, words regularly derived from Pali)
has some interesting stylistic variation in writing which will be reflected
in the encoding but would not normally be represented in a Pali
transcription - treating the variation as rendering rules would be quite
complex. There are four different ways of writing the last two syllables of
Pali _desana:_ and _sa:sana:_! Three of the ways may be seen as using the
same abbreviation technique - merging the last two aksharas. The simplest
to express in transliteration is _sa:ssna:_ for _sa:sana:_, and the others
may be thought of as *_sa:ss'na:_ (_dess'na:_ is attested) and _sa:s'na:_.
What I've written with an apostrophe is actually a repetition mark - one
could view it as duplicating the implicit vowel so that only one is killed
by the explicit vowel (a:). However, it's not so easy to argue that these
are just different rendering conventions, say of "s'n" as the reference
Another case would be the homorganic nasals in Indic scripts, at least in
for writing Sanskrit - anusvara or full consonant? Some might say it's a
case of sa-IN v. sa-GB/sa-DE. One could certainly argue that using a
font change to switch from one form to the other was not compliant with