From: Richard Wordingham (email@example.com)
Date: Tue May 22 2007 - 03:26:24 CDT
Peter Constable wrote on Monday, May 21, 2007 10:13 PM
> That would fit the sequence order you thought would make sense.
> Take note, though: you were determining that on a *functional* basis,
No, that was one of three arguments, each of which leads to the same
> And this leads to a thorny open issue: if these are canonically
> equivalent, hence should display the same, how should the Thai
> fixed-position-class marks and the "common" marks interact
> typographically? There simply are no historical conventions that establish
> an answer to this question.
I think the permission in TUS 5.0 Section 5.13, namely, 'If the text to be
displayed is known to employ a different typographical convention (either
implicitly through knowledge of the language of the text or explicitly
through rich text bindings), then an alternative position may be given to
multiple non-spacing marks instead of that specified by the default inside
out rule', is given too much weight. The example given is where the
sequencing corresponds to the canonical order, but is left to right above
rather than stacking vertically. The treatment of Hebrew hiriq and metheg
provides a precedent for resolving conflicts - use CGJ to override the
rendering effect of the canonical order.
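The Hebrew precedent can be checked mechanically. The following is a minimal
sketch, assuming Python's standard unicodedata module; the base letter LAMED
and the helper name nfd are chosen only for illustration. It shows that
normalisation reorders <metheg, hiriq> into canonical order, while an
intervening CGJ (canonical combining class 0) blocks the reordering.

```python
import unicodedata

HIRIQ = "\u05B4"  # HEBREW POINT HIRIQ, canonical combining class 14
METEG = "\u05BD"  # HEBREW POINT METEG, canonical combining class 22
CGJ = "\u034F"    # COMBINING GRAPHEME JOINER, canonical combining class 0
LAMED = "\u05DC"  # base letter, an arbitrary choice for this illustration


def nfd(s):
    """Return the canonical decomposition (NFD) of s."""
    return unicodedata.normalize("NFD", s)


def codepoints(s):
    """Format a string as a list of U+XXXX labels for inspection."""
    return [f"U+{ord(c):04X}" for c in s]


# Meteg typed before hiriq: canonical reordering puts the lower class first.
plain = LAMED + METEG + HIRIQ
# Same marks with CGJ between them: the class-0 CGJ stops the reordering.
guarded = LAMED + METEG + CGJ + HIRIQ

print(codepoints(nfd(plain)))
print(codepoints(nfd(guarded)))
```

Whether a renderer then positions the two marks differently for the guarded
sequence is up to the font and rendering engine; the sketch only demonstrates
that the distinction survives normalisation.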
An interesting example is combining asterisk below. Now, I believe that
when it modifies a consonant, as in the phonetic key attached
(phon_key.png), a below vowel should go below the asterisk, despite the
canonical order. (I can't find any examples either way.) However, if it is
used with its Greek meaning - duplicating a previous transcription of a
damaged manuscript for a no-longer legible character - it would naturally
apply to the cluster of consonant and vowel below. My feeling is that the
first use would need <U+0359, CGJ, U+0E39 SARA UU> and the latter would just
be <U+0E39, U+0359>. If <U+0E39, U+0359> is to mean vowel below asterisk,
how are we to encode asterisk below vowel?
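The encoding question can be made concrete with a short sketch, again assuming
Python's standard unicodedata module. The base consonant SO SO and the
variable names are hypothetical choices for illustration, not taken from the
dictionary. U+0E39 has canonical combining class 103 and U+0359 has class 220,
so a bare <U+0359, U+0E39> cannot survive normalisation, whereas the CGJ form
can.

```python
import unicodedata

SARA_UU = "\u0E39"         # THAI CHARACTER SARA UU, combining class 103
ASTERISK_BELOW = "\u0359"  # COMBINING ASTERISK BELOW, combining class 220
CGJ = "\u034F"             # COMBINING GRAPHEME JOINER, combining class 0
SO_SO = "\u0E0B"           # THAI CHARACTER SO SO, illustrative base only


def nfc(s):
    """Return the NFC normalisation of s."""
    return unicodedata.normalize("NFC", s)


# Asterisk applying to the whole consonant + vowel cluster.
vowel_first = SO_SO + SARA_UU + ASTERISK_BELOW
# Asterisk typed directly on the consonant, vowel afterwards.
asterisk_first = SO_SO + ASTERISK_BELOW + SARA_UU
# Same, but with CGJ protecting the typed order.
asterisk_cgj = SO_SO + ASTERISK_BELOW + CGJ + SARA_UU

for ch in (SARA_UU, ASTERISK_BELOW, CGJ):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.combining(ch))

# The unprotected order collapses into the vowel-first order...
print(nfc(asterisk_first) == vowel_first)
# ...while the CGJ sequence stays distinct from it.
print(nfc(asterisk_cgj) == asterisk_cgj)
```

So without CGJ (or some other class-0 separator) the two intended meanings
would share a single normalised encoding.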
> Btw, I'd be interested in scanned samples of publications in which the
> kinds of scenarios you're raising are attested.
The examples are all taken from the 'Modern English-Thai Dictionary'
(พจนานุกรม อังกฤษ-ไทย ฉบับแก้ไขปรับปรุงใหม่) published by Thai Watthana
Phanit in 1971 AD (2514 BE). The key is given in image
('tire' - 'tit') shows a three character tie above - cf. the
three character tie below for 'sch' that has been discussed here. Again,
this tie seems to be restricted to a single combination, in this case
<U+0E40 THAI CHARACTER SARA E, U+0E2D THAI CHARACTER O ANG, U+0E2D>. It
shows WO WAEN, THO THONG, SO SO and CHO CHANG with macron below, and in
particular the pronunciation of 'tissue' shows the combination of U+0331
COMBINING MACRON BELOW and U+0E39 THAI CHARACTER SARA UU.
shows the pronunciation of 'vision' with COMBINING ASTERISK BELOW. Note
that unlike the pocket dictionary, this dictionary puts the stress mark
before the stressed syllable.
gives another example of U+0331 and U+0E39 together.
shows U+0331 together with the mark above U+0E47 THAI
CHARACTER MAITAIKHU, which is of canonical combining class 0. This excited
my interest, because using the Thai-Latin transliteration of CLDR 1.4.1 on
<U+0E40 THAI CHARACTER SARA E, U+0E0A THAI CHARACTER CHO CHANG, U+0331,
U+0E47, U+0E14 THAI CHARACTER DO DEK> and <U+0E40, U+0E0A, U+0E47, U+0331,
U+0E14> and then applying its 'inverse' merges them as <U+0E40, U+0E0A,
U+0331, U+0E47, U+0E14>.
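A minimal sketch of why that merger matters, assuming Python's standard
unicodedata module (the variable names are illustrative): because U+0E47 has
canonical combining class 0, the two input sequences are not canonically
equivalent, so a round trip that maps both to one sequence discards a
distinction that normalisation itself preserves.

```python
import unicodedata

MACRON_BELOW = "\u0331"  # COMBINING MACRON BELOW, combining class 220
MAITAIKHU = "\u0E47"     # THAI CHARACTER MAITAIKHU, combining class 0

# <SARA E, CHO CHANG, macron below, MAITAIKHU, DO DEK>
macron_first = "\u0E40\u0E0A" + MACRON_BELOW + MAITAIKHU + "\u0E14"
# <SARA E, CHO CHANG, MAITAIKHU, macron below, DO DEK>
maitaikhu_first = "\u0E40\u0E0A" + MAITAIKHU + MACRON_BELOW + "\u0E14"

print(unicodedata.combining(MACRON_BELOW), unicodedata.combining(MAITAIKHU))

# Compare the two spellings under both canonical normalisation forms.
distinct = {
    form: unicodedata.normalize(form, macron_first)
    != unicodedata.normalize(form, maitaikhu_first)
    for form in ("NFC", "NFD")
}
print(distinct)
```

The sketch does not reproduce the CLDR transliteration itself; it only checks
the canonical (non-)equivalence of the two inputs.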
that the mark used is COMBINING MACRON BELOW and not COMBINING LOW LINE.
By contrast, Se-ed's Modern English-Thai Dictionary (Complete & Updated)
Desk Reference Edition (1998) appears to use COMBINING LOW LINE in its
similar notation. I say 'appears' - it appears to be implemented as
mark-up, for the underlining crosses SARA UU.
The pocket dictionary I referred to, Kamol's English-Thai Dictionary (2534
BE, = 1991 AD), doesn't use any form of underline below, though it does use
COMBINING ASTERISK BELOW. Instead, it uses italicised CHO CHAN, SO SO, CHO
CHANG, THO THONG and WO WAEN. I'm not sure whether these should count as
marked-up or as unencoded characters. It's a dictionary, not mathematics!
One of the Mon-Khmer languages of N.E. Thailand uses what I think of as
'combining blob below' - one could probably get away with using U+0359
COMBINING ASTERISK BELOW for it. I saw it in several words in a Genesis
translation at the Rosetta Project, but I can no longer find the example,
and I cannot remember the name of the language. I think it is non-tonal; I
was therefore struck by the spelling เจ้า for what appeared to be the
translation of 'God'. Inconveniently, there are a lot of non-tonal
Mon-Khmer languages spoken in N.E. Thailand.
Martin Hosken has already raised the issue of U+0331 COMBINING MACRON BELOW
and the vowels below in the context of a new orthography for one of
Thailand's minority languages.