Re: Query for Validity of Thai Sequence

From: Richard Wordingham (richard.wordingham@ntlworld.com)
Date: Fri Feb 16 2007 - 17:08:24 CST

    Lokesh Joshi wrote on Friday, February 16, 2007 7:28 PM
    Subject: Re: Query for Validity of Thai Sequence

    > Therefore, I'm inclined to think that more relaxed checking is in order.

    A lot depends on the purpose of the checking. The most valid purpose is to
    ensure that each character layout has a unique representation, as canonical
    combining classes don't quite do a thorough enough job of sorting out what
    is entered. To that end, you need to check that marks below occur before
    marks above, i.e. prevent marks below following marks above. All other
    checks are luxuries and potentially dangerous if you do not know what
    language you are checking, though I can sympathise with the view that
    preposed and postposed vowels should not have superscript or subscript marks
    attached to them. I can even see a specialist use for contrasting U+0E30
    THAI CHARACTER SARA A and U+0E45 THAI CHARACTER LAKKHANGYAO, even though
    everything seems to indicate that in the Thai tradition they are just
    contextual variants of one another.
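
    The below-before-above check described above can be sketched roughly as
    follows. This is only an illustration under stated assumptions: the
    function name below_before_above and the two character sets are
    hypothetical, the sets cover only marks from the Thai block, and any
    other character is treated as starting a new stack. The final line shows
    why canonical ordering alone is not enough: U+0E4D NIKHAHIT has canonical
    combining class 0, so normalization leaves it in front of SARA U.

```python
import unicodedata

# Assumed classification of Thai-block marks by position (illustrative only).
BELOW = {"\u0E38", "\u0E39", "\u0E3A"}            # SARA U, SARA UU, PHINTHU
ABOVE = (
    {"\u0E31"}                                     # MAI HAN-AKAT
    | {chr(c) for c in range(0x0E34, 0x0E38)}      # SARA I .. SARA UEE
    | {chr(c) for c in range(0x0E47, 0x0E4F)}      # MAITAIKHU .. YAMAKKAN
)


def below_before_above(text):
    """Return False if a mark below follows a mark above in the same stack."""
    seen_above = False
    for ch in text:
        if ch in ABOVE:
            seen_above = True
        elif ch in BELOW:
            if seen_above:
                return False
        else:
            # Any other character is assumed to start a new stack.
            seen_above = False
    return True


good = "\u0E01\u0E38\u0E4D"   # KO KAI, SARA U, NIKHAHIT
bad = "\u0E01\u0E4D\u0E38"    # KO KAI, NIKHAHIT, SARA U (same layout)

print(below_before_above(good), below_before_above(bad))
# Normalization does not repair the second spelling:
print(unicodedata.normalize("NFC", bad) == bad)
```

    A real checker would also have to decide what to do with non-Thai marks
    below, which is the subject of the next paragraph.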

    Remember that the Thai script is not just used for Thai, Pali and Sanskrit.
    Just to cope with English-Thai dictionaries you need to allow the
    application of U+0359 COMBINING ASTERISK BELOW and U+0331 COMBINING MACRON
    BELOW (canonical combining class 220) to Thai consonants. These function as
    consonant modifiers, so you need to consider how they will interact with
    U+0E38 THAI CHARACTER SARA U and U+0E39 THAI CHARACTER SARA UU (canonical
    combining class 103). It's a tricky question - I would say U+034F COMBINING
    GRAPHEME JOINER comes into it, but apparently you have to find out what the
    Thai typographical tradition is (TUS, Combining Characters, Multiple
    Combining Characters). At least one of the new orthographies in Thailand
    uses U+0331, but little help was forthcoming here when the question was
    raised before.
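
    One way to see the interaction between class 220 and class 103 is to run
    the sequences through normalization. The sketch below uses Python's
    standard unicodedata module; the variable names are made up for
    illustration, and it shows only what canonical ordering does, not what
    the Thai typographical tradition prefers. It is one reading of why the
    grapheme joiner comes into it: CGJ has class 0 and so stops the consonant
    modifier from being reordered after the vowel.

```python
import unicodedata

KO_KAI = "\u0E01"        # THAI CHARACTER KO KAI (base consonant)
MACRON_BELOW = "\u0331"  # COMBINING MACRON BELOW
SARA_U = "\u0E38"        # THAI CHARACTER SARA U
CGJ = "\u034F"           # COMBINING GRAPHEME JOINER


def classes(text):
    """List the canonical combining class of each character."""
    return [unicodedata.combining(ch) for ch in text]


# Consonant modifier typed before the vowel.
modifier_first = KO_KAI + MACRON_BELOW + SARA_U
normalized = unicodedata.normalize("NFC", modifier_first)

# Canonical ordering sorts class 103 ahead of class 220,
# so the vowel ends up before the modifier.
print(classes(modifier_first), classes(normalized))

# With CGJ (class 0) in between, the two marks are not reordered.
with_cgj = KO_KAI + MACRON_BELOW + CGJ + SARA_U
print(unicodedata.normalize("NFC", with_cgj) == with_cgj)
```

    A checker that insists on modifier-before-vowel order would therefore
    reject normalized text unless the joiner is present.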

    Richard.


