Re: Order of Infrequent Combining Marks in Thai

From: Richard Wordingham (richard.wordingham@ntlworld.com)
Date: Tue May 22 2007 - 03:26:24 CDT


    Peter Constable wrote on Monday, May 21, 2007 10:13 PM

    > That would fit the sequence order you thought would make sense.
    > Take note, though: you were determining that on a *functional* basis,

    No, that was one of three arguments, each of which leads to the same
    conclusion.

    > And this leads to a thorny open issue: if these are canonically
    > equivalent, hence should display the same, how should the Thai
    > fixed-position-class marks and the "common" marks interact
    > typographically? There simply are no historical conventions that establish
    > an answer to this question.

    I think the permission in TUS 5.0 Section 5.13, namely, 'If the text to be
    displayed is known to employ a different typographical convention (either
    implicitly through knowledge of the language of the text or explicitly
    through rich text bindings), then an alternative position may be given to
    multiple non-spacing marks instead of that specified by the default inside
    out rule', is given too much weight. The example given is where the
    sequencing corresponds to the canonical order, but is left to right above
    rather than stacking vertically. The treatment of Hebrew hiriq and metheg
    provides a precedent for resolving conflicts - use CGJ to override the
    rendering effect of the canonical order.

    An interesting example is combining asterisk below. Now, I believe that
    when it modifies a consonant, as in the phonetic key attached
    (phon_key.png), a below vowel should go below the asterisk, despite the
    canonical order. (I can't find any examples either way.) However, if it is
    used with its Greek meaning - duplicating a previous transcription of a
    damaged manuscript for a no-longer legible character - it would naturally
    apply to the cluster of consonant and vowel below. My feeling is that the
    first use would need <U+0359, CGJ, U+0E39 SARA UU> and the latter would just
    be <U+0E39, U+0359>. If <U+0E39, U+0359> is to mean vowel below asterisk,
    how are we to encode asterisk below vowel?
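
    To make the canonical-order point concrete, here is a minimal sketch using
    Python's standard unicodedata module. The base consonant CHO CHANG and the
    helper name codepoints are my own choices for illustration, and the
    combining classes reported depend on the Unicode data bundled with the
    Python build. It only shows how normalization treats the two encodings; it
    says nothing about how a font or renderer would position the marks.

```python
import unicodedata

ASTERISK_BELOW = "\u0359"  # COMBINING ASTERISK BELOW
SARA_UU = "\u0e39"         # THAI CHARACTER SARA UU
CGJ = "\u034f"             # COMBINING GRAPHEME JOINER
BASE = "\u0e0a"            # THAI CHARACTER CHO CHANG (arbitrary base, an assumption)


def codepoints(s):
    """Hypothetical helper: render a string as a list of U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in s)


# Canonical combining classes decide whether two adjacent marks may be reordered.
for ch in (SARA_UU, ASTERISK_BELOW, CGJ):
    print(codepoints(ch), unicodedata.name(ch), unicodedata.combining(ch))

# Without CGJ, normalization moves the lower-class vowel in front of the asterisk.
plain = unicodedata.normalize("NFC", BASE + ASTERISK_BELOW + SARA_UU)

# With CGJ (combining class 0) in between, the marks cannot be reordered.
with_cgj = unicodedata.normalize("NFC", BASE + ASTERISK_BELOW + CGJ + SARA_UU)

print(codepoints(plain))
print(codepoints(with_cgj))
```

    So <U+0359, U+0E39> and <U+0E39, U+0359> collapse to one sequence under
    normalization, and only the CGJ form keeps the asterisk first.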

    > Btw, I'd be interested in scanned samples of publications in which the
    > kinds of scenarios you're raising are attested.

    The examples are all taken from the 'Modern English-Thai Dictionary'
    (พจนานุกรม อังกฤษ-ไทย ฉบับแก้ไขปรับปรุงใหม่) published by Thai Watthana
    Phanit in 1971 AD (2514 BE). The key is given in image
    http://homepage.ntlworld.com/richard.wordingham/thai/marks/phon_key.png.

    Image http://homepage.ntlworld.com/richard.wordingham/thai/marks/tiptop.png
    ('tire' - 'tit') shows a three character tie above - cf. the
    three character tie below for 'sch' that has been discussed here. Again,
    this tie seems to be restricted to a single combination, in this case
    <U+0E40 THAI CHARACTER SARA E, U+0E2D THAI CHARACTER O ANG, U+0E2D>. It
    shows WO WAEN, THO THONG, SO SO and CHO CHANG with macron below, and in
    particular the pronunciation of 'tissue' shows the combination of U+0331
    COMBINING MACRON BELOW and U+0E39 THAI CHARACTER SARA UU.

    Image http://homepage.ntlworld.com/richard.wordingham/thai/marks/vision.png
    shows the pronunciation of 'vision' with COMBINING ASTERISK BELOW. Note
    that unlike the pocket dictionary, this dictionary puts the stress mark
    before the stressed syllable.

    Image http://homepage.ntlworld.com/richard.wordingham/thai/marks/zoom.png
    gives another example of U+0331 and U+0E39 together.

    Image http://homepage.ntlworld.com/richard.wordingham/thai/marks/shed.png
    shows U+0331 together with the mark above U+0E47 THAI
    CHARACTER MAITAIKHU, which is of canonical combining class 0. This excited
    my interest, because using the Thai-Latin transliteration of CLDR 1.4.1 on
    <U+0E40 THAI CHARACTER SARA E, U+0E0A THAI CHARACTER CHO CHANG, U+0331,
    U+0E47, U+0E14 THAI CHARACTER DO DEK> and <U+0E40, U+0E0A, U+0E47, U+0331,
    U+0E14> and then applying its 'inverse' merges them as <U+0E40, U+0E0A,
    U+0331, U+0E47, U+0E14>.
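
    For comparison, a small sketch (again with Python's unicodedata, so the
    result depends on the Unicode data bundled with the interpreter) showing
    that the two input sequences are not canonically equivalent: MAITAIKHU has
    combining class 0, so normalization leaves each order as typed. The
    variable names are mine, and the sketch does not reproduce the CLDR
    transliteration round trip itself.

```python
import unicodedata

# <SARA E, CHO CHANG, MACRON BELOW, MAITAIKHU, DO DEK>
seq_macron_first = "\u0e40\u0e0a\u0331\u0e47\u0e14"
# <SARA E, CHO CHANG, MAITAIKHU, MACRON BELOW, DO DEK>
seq_maitaikhu_first = "\u0e40\u0e0a\u0e47\u0331\u0e14"

# Report the canonical combining class of each of the two marks.
for mark in ("\u0331", "\u0e47"):
    print(f"U+{ord(mark):04X}", unicodedata.name(mark), unicodedata.combining(mark))

nfc_a = unicodedata.normalize("NFC", seq_macron_first)
nfc_b = unicodedata.normalize("NFC", seq_maitaikhu_first)

# Neither sequence is reordered, so the two remain distinct after normalization.
print(nfc_a == seq_macron_first, nfc_b == seq_maitaikhu_first, nfc_a == nfc_b)
```

    Any process that maps both inputs to the same output is therefore merging
    sequences that normalization keeps apart.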

    http://homepage.ntlworld.com/richard.wordingham/thai/marks/eaves.png shows
    that the mark used is COMBINING MACRON BELOW and not COMBINING LOW LINE.
    By contrast, Se-ed's Modern English-Thai Dictionary (Complete & Updated)
    Desk Reference Edition (1998) appears to use COMBINING LOW LINE in its
    similar notation. I say 'appears' - it appears to be implemented as
    mark-up, for the underlining crosses SARA UU.

    The pocket dictionary I referred to, Kamol's English-Thai Dictionary (2534
    BE, = 1991 AD), doesn't use any form of underline below, though it does use
    COMBINING ASTERISK BELOW. Instead, it uses italicised CHO CHAN, SO SO, CHO
    CHANG, THO THONG and WO WAEN. I'm not sure whether these should count as
    marked-up or as unencoded characters. It's a dictionary, not mathematics!

    One of the Mon-Khmer languages of N.E. Thailand uses what I think of as
    'combining blob below' - one could probably get away with using U+0359
    COMBINING ASTERISK BELOW for it. I saw it in several words in a Genesis
    translation at the Rosetta Project, but I can no longer find the example,
    and I cannot remember the name of the language. I think it is non-tonal; I
    was therefore struck by the spelling เจ้า for what appeared to be the
    translation of 'God'. Inconveniently, there are a lot of non-tonal
    Mon-Khmer languages spoken in N.E. Thailand.

    Martin Hosken has already raised the issue of U+0331 COMBINING MACRON BELOW
    and the vowels below in the context of a new orthography for one of
    Thailand's minority languages.

    Richard.


