Re: Query for Validity of Thai Sequence

From: Richard Wordingham (richard.wordingham@ntlworld.com)
Date: Thu Feb 15 2007 - 17:31:33 CST

    Philippe Verdy wrote on Thursday, February 15, 2007 10:28 PM

    > Regarding the question of the validity of Thai sequences, the following
    > specification of the Thai support in OpenType (here the HTML version
    > available on Microsoft Typography website) is worth noting:

    > http://www.microsoft.com/typography/otfntdev/thaiot/shaping.aspx

    I don't know if those rules have been re-instated, but a tighter set was in
    use for Windows XP. They were so tight that you couldn't type Pali and
    Sanskrit, which was a nuisance for etymologies in on-line Thai dictionaries.
    (The Thai Royal Institute's dictionary on-line resorted to using U+0E36 THAI
    CHARACTER SARA UE for <U+0E34 THAI CHARACTER SARA I, U+0E4D THAI CHARACTER
    NIKHAHIT>, which actually isn't quite as bad as it seems. Outside
    computing, it is legitimate to regard the former as being composed of the
    latter pair.)
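
    The substitution described above can be sketched in a few lines of
    Python. This is only an illustration of the workaround, not the
    dictionary's actual code; the function names are hypothetical.

```python
import unicodedata

SARA_I = "\u0E34"    # THAI CHARACTER SARA I
NIKHAHIT = "\u0E4D"  # THAI CHARACTER NIKHAHIT
SARA_UE = "\u0E36"   # THAI CHARACTER SARA UE


def substitute_sara_ue(text: str) -> str:
    """Replace <SARA I, NIKHAHIT> with SARA UE (hypothetical helper)."""
    return text.replace(SARA_I + NIKHAHIT, SARA_UE)


def restore_pair(text: str) -> str:
    """Expand every SARA UE back to <SARA I, NIKHAHIT> (hypothetical helper).

    This also expands any SARA UE that was genuinely meant as SARA UE,
    so the round trip is lossy on ordinary Thai text.
    """
    return text.replace(SARA_UE, SARA_I + NIKHAHIT)


if __name__ == "__main__":
    # Print the code points and their Unicode names for reference.
    for ch in (SARA_I, NIKHAHIT, SARA_UE):
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

    The reverse mapping cannot tell a substituted SARA UE from a genuine
    one, which is the main cost of the workaround.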

    I'm pretty sure the corresponding Lao set at
    http://www.microsoft.com/typography/OpenType%20Dev/lao/shaping.mspx has been
    relaxed. The published rules prohibit multiple tone marks on the same
    consonant, but Tai Dam uses the tone mark U+0ECB LAO TONE MAI CATAWA on
    mid consonants to shift from one set of tones to another. There's a list
    of problem sequences at
    http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&item_id=ThaiLaoSeq
    . Peter Constable of Microsoft is well aware of these problems and was
    endeavouring to ensure there would be no such problems in Windows Vista.
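
    To see which strings a "one tone mark per consonant" rule would
    reject, something like the following sketch could be used. It is not
    Microsoft's shaping logic: the consonant range is a rough assumption
    (U+0E81..U+0EAE, including any unassigned slots), and the function
    name is hypothetical.

```python
# Lao tone marks: U+0EC8 MAI EK, U+0EC9 MAI THO, U+0ECA MAI TI, U+0ECB MAI CATAWA.
LAO_TONE_MARKS = {chr(cp) for cp in range(0x0EC8, 0x0ECC)}


def is_lao_consonant(ch: str) -> bool:
    # Assumption: treat the whole span U+0E81..U+0EAE as consonant slots.
    return 0x0E81 <= ord(ch) <= 0x0EAE


def consonants_with_multiple_tones(text: str):
    """Return (index, consonant, tone_count) for each consonant that is
    followed by more than one tone mark before the next consonant."""
    flagged = []
    base_index = None
    tones = 0
    for i, ch in enumerate(text):
        if is_lao_consonant(ch):
            # Close off the previous consonant before starting a new one.
            if base_index is not None and tones > 1:
                flagged.append((base_index, text[base_index], tones))
            base_index, tones = i, 0
        elif ch in LAO_TONE_MARKS and base_index is not None:
            tones += 1
    if base_index is not None and tones > 1:
        flagged.append((base_index, text[base_index], tones))
    return flagged
```

    A sequence such as <U+0E81, U+0EC8, U+0ECB> is flagged by this check,
    which is exactly the kind of sequence a rule like this would break.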

    I don't know whether Microsoft have dealt with the problem the old rules
    imposed for Pali and Sanskrit in Lao. Of course, Pali and Sanskrit need the
    missing consonants to be restored for Lao. I don't know how standard the
    improper use of the unassigned code points in the Lao block is - I have had
    some surprises looking at the Lao fonts that provide the unencoded
    consonants. I had expected the encoding to be basically Thai + 0x80, though
    that can't work for Indic NYA and YA.
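
    The "Thai + 0x80" idea can be tried out with a short sketch. The
    helper names are hypothetical, and what is reported as unassigned
    depends on the Unicode version bundled with the Python runtime, so
    results may differ from the situation described above.

```python
import unicodedata

OFFSET = 0x80  # distance between the Thai block (U+0E00) and the Lao block (U+0E80)


def thai_to_lao_by_offset(ch: str) -> str:
    """Map a Thai consonant to the Lao-block slot at the same offset."""
    cp = ord(ch)
    if not 0x0E01 <= cp <= 0x0E2E:
        raise ValueError(f"U+{cp:04X} is not a Thai consonant")
    return chr(cp + OFFSET)


def describe(ch: str) -> str:
    """Show the Thai letter and whatever the offset slot holds in this runtime."""
    target = thai_to_lao_by_offset(ch)
    target_name = unicodedata.name(target, "<unassigned>")
    return (f"U+{ord(ch):04X} {unicodedata.name(ch)} -> "
            f"U+{ord(target):04X} {target_name}")


if __name__ == "__main__":
    # Walk every Thai consonant and print what sits in the matching Lao slot.
    for cp in range(0x0E01, 0x0E2F):
        print(describe(chr(cp)))
```

    For Thai YO YING (U+0E0D) and YO YAK (U+0E22) the offset slots are
    already taken by LAO LETTER NYO and LAO LETTER YO, so a plain offset
    leaves no free slot there for an additional consonant.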

    Richard.


