From: Richard Wordingham (richard.wordingham@ntlworld.com)
Date: Fri Feb 16 2007 - 17:08:24 CST
Subject: Re: Query for Validity of Thai Sequence
Lokesh Joshi wrote on Friday, February 16, 2007 7:28 PM:
> Therefore, I'm inclined to think that more relaxed checking is in order.
A lot depends on the purpose of the checking. The most valid purpose is to
ensure that each character layout has a unique representation, as canonical
combining classes don't quite do a thorough enough job of sorting out what
is entered. To that end, you need to check that marks below occur before
marks above, i.e. prevent marks below following marks above. All other
checks are luxuries and potentially dangerous if you do not know what
language you are checking, though I can sympathise with the view that
preposed and postposed vowels should not have superscript or subscript marks
attached to them. I can even see a specialist use for contrasting U+0E32
THAI CHARACTER SARA AA and U+0E45 THAI CHARACTER LAKKHANGYAO, even though
everything seems to indicate that in the Thai tradition they are just
contextual variants of one another.
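
As a rough illustration of the one essential check (marks below must not
follow marks above), here is a minimal Python sketch. The two character sets
and the function name below_after_above are assumptions made for this
example, not an agreed classification; adjust them for the language being
checked.

```python
import unicodedata

# Assumed classification of Thai combining marks by position.
MARKS_BELOW = frozenset("\u0e38\u0e39\u0e3a")  # SARA U, SARA UU, PHINTHU
MARKS_ABOVE = frozenset(
    "\u0e31"                                   # MAI HAN-AKAT
    "\u0e34\u0e35\u0e36\u0e37"                 # SARA I, SARA II, SARA UE, SARA UEE
    "\u0e47\u0e48\u0e49\u0e4a\u0e4b\u0e4c\u0e4d\u0e4e"  # MAITAIKHU .. YAMAKKAN
)


def below_after_above(text):
    """Return the indices of marks below that follow a mark above
    within the same run of combining marks."""
    offenders = []
    seen_above = False
    for i, ch in enumerate(text):
        if ch in MARKS_ABOVE:
            seen_above = True
        elif ch in MARKS_BELOW:
            if seen_above:
                offenders.append(i)
        elif unicodedata.category(ch) != "Mn":
            # Any character that is not a nonspacing mark starts a new run.
            seen_above = False
    return offenders
```

For example, THO THONG + SARA U + THANTHAKHAT passes, while THO THONG +
THANTHAKHAT + SARA U is flagged. THANTHAKHAT has canonical combining class
0, so normalisation alone will not bring those two spellings together.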
Remember that the Thai script is not just used for Thai, Pali and Sanskrit.
Just to cope with English-Thai dictionaries you need to allow the
application of U+0359 COMBINING ASTERISK BELOW and U+0331 COMBINING MACRON
BELOW (canonical combining class 220) to Thai consonants. These function as
consonant modifiers, so you need to consider how they will interact with
U+0E38 THAI CHARACTER SARA U and U+0E39 THAI CHARACTER SARA UU (canonical
combining class 103). It's a tricky question - I would say U+034F COMBINING
GRAPHEME JOINER comes into it, but apparently you have to find out what the
Thai typographical tradition is (TUS, Combining Characters, Multiple
Combining Characters). At least one of the new orthographies in Thailand
uses U+0331, but little help was forthcoming here when the question was
raised before.
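
To make the interaction concrete, here is a small Python sketch using the
standard unicodedata module. It only shows what canonical ordering does to
the two sequences; it does not settle which order is right for Thai
typography. The variable names and the helper ccc_list are invented for
this example.

```python
import unicodedata

KO_KAI = "\u0e01"        # THAI CHARACTER KO KAI, used here as a sample base
MACRON_BELOW = "\u0331"  # COMBINING MACRON BELOW, class 220
SARA_U = "\u0e38"        # THAI CHARACTER SARA U, class 103
CGJ = "\u034f"           # COMBINING GRAPHEME JOINER, class 0


def ccc_list(text):
    """Canonical combining class of each character in text."""
    return [unicodedata.combining(ch) for ch in text]


# Consonant modifier typed first, then the vowel.
typed = KO_KAI + MACRON_BELOW + SARA_U
# Canonical ordering sorts class 103 ahead of class 220,
# so the vowel ends up before the modifier.
normalized = unicodedata.normalize("NFC", typed)

# CGJ has class 0, so it blocks the reordering.
guarded = KO_KAI + MACRON_BELOW + CGJ + SARA_U
guarded_nfc = unicodedata.normalize("NFC", guarded)
```

Here normalized comes out as KO KAI, SARA U, MACRON BELOW, while guarded_nfc
is unchanged from guarded.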
Richard.