Martin, thank you for your thoughtful reply.
> I have recently come up with a hypothesis that could explain some
> of the basic working in Thai that favors the "glyph-based" encoding
> now in Unicode/ISO10646: The fact that Thai is an isolating language,
> not having declensions/conjugations, could mean that as long as the
> syllable has a well-defined encoding,
A later message from you completed this sentence, although I am not
sure I follow it. Possibly you are pointing out that the number of
glyph combinations that form syllables is small enough that everything
could be handled by a lookup table, without concern for the phonetic
ordering of Thai.
> > I wish I could calculate the theoretical limits to settle that question.
> > All I know is the difficulty which I have experienced in creating a
> > sorting algorithm for the language. There are five levels of dependencies
> > with up to 35 members in each dependency. Of course the real language does
> > not have all combinations but the variations are enough that a simple
> > dictionary lookup does not seem practical.
> Do those things you call "levels" work similar to the following things
> in sorting Latin:
> - Base letters
> - Accents
> - Case
> I.e. you only start to consider accents if two words are completely
> equal with respect to base letters, or you only start to check out
> subjoined consonants in comparing two words if the two words are
> identical with respect to plain consonants?
Sorting in Khmer is largely based on syllables, at least for the first four
levels (base consonant, first subscript, second subscript, vowel). For a
given base consonant the order runs as follows:
(1) First the base consonant by itself,
(2) Then the base consonant plus a sign (a rather rare occurrence in the
past, but with new rules about yukaleapintu and anusvara this will be
(3) Then the base consonant plus a second base consonant,
(4) Then a base consonant plus the above mentioned second base consonant
and a sign (the fifth level). Normally, however, the sign is on the second
consonant (with the two-syllable word carrying it coming after an
identical word without the sign). In this regard the sign level seems to
be different from the other levels (even though it affects the
pronunciation of the vowel on the first base consonant of the first
(5) This can go rippling off to the right with vowels, subscripts, and
signs on the second or n-th consonants in one word,
(6) Then the base consonant plus a different base consonant (cycling
through all possible second consonants),
(7) Then the base consonant plus a vowel
(8) Then the base consonant plus a first subscript
(9) Then the base consonant plus the first subscript plus a second subscript
(10) Then the base consonant plus the first subscript plus the second
subscript plus a vowel....
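
As a rough sketch of the ordering listed above (not a working Khmer
collation), each syllable can be reduced to a tuple of ranks and whole words
compared tuple by tuple. Everything named here is hypothetical: the
`Syllable` type, the `sort_key` function, and the small integer ranks that
stand in for real consonants, subscripts, vowels, and signs. Treating the
sign as a tie-breaker that is consulted only after every syllable has been
compared is my assumption, based on the sign level behaving differently from
the other four.

```python
from typing import NamedTuple, Sequence


class Syllable(NamedTuple):
    """One syllable as ranks; 0 means the element is absent."""
    base: int        # rank of the base consonant
    sub1: int = 0    # rank of the first subscript
    sub2: int = 0    # rank of the second subscript
    vowel: int = 0   # rank of the vowel
    sign: int = 0    # rank of the sign


def sort_key(word: Sequence[Syllable]):
    # Primary key: the first four levels of every syllable, left to right.
    primary = tuple((s.base, s.sub1, s.sub2, s.vowel) for s in word)
    # Secondary key (assumption): signs only break ties between words
    # whose primary keys are identical.
    secondary = tuple(s.sign for s in word)
    return (primary, secondary)


# Hypothetical words, deliberately listed out of order.
words = {
    "base+sub1+sub2+vowel": [Syllable(1, 1, 1, 1)],
    "base base3": [Syllable(1), Syllable(3)],
    "base+vowel": [Syllable(1, vowel=1)],
    "base base2+sign": [Syllable(1), Syllable(2, sign=1)],
    "base": [Syllable(1)],
    "base+sub1": [Syllable(1, sub1=1)],
    "base base2": [Syllable(1), Syllable(2)],
    "base+sign": [Syllable(1, sign=1)],
    "base+sub1+sub2": [Syllable(1, sub1=1, sub2=1)],
}

ordered = sorted(words, key=lambda name: sort_key(words[name]))
for name in ordered:
    print(name)
```

The hard part, which this sketch skips entirely, is splitting a stored
character string into syllables and assigning the ranks in the first place.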
Fairly recently a committee of the country's leading linguists decided
that the anusvara and yukaleapintu are only signs and not vowels. This has
greatly reduced the number of glyph combinations that count as vowels,
and with it the number of vowels. This decision is not yet reflected in
the dictionaries or school textbooks.
> As one of the authors of RFC 2070, I would be very happy to offer a neat
> solution. But it's a chicken-and-egg problem. You cannot discuss encoding
> of a script and already assume an encoding. So please use inline bitmaps,
> aka GIFs. This is actually suggested in RFC 2070, at the end of section
> 2.2 :-).
I'm dreaming of a day....
> > > None of this sounds like "root" in the sense in which Tibetan uses the term.
> > Please post a URL to a document which describes what 'root' does mean when
> > referring to Tibetan.
> I'm not an expert in Tibetan, but to give you a very rough idea,
> take English words like "know", "knife", "psyche",.... Here,
> "n" or "s" would be the root, not "k" or "p". In Tibetan, consonants
> before the root can change how the root is pronounced, or may
> themselves be pronounced in some dialects or in older times.
> There are grammatical rules to find out which letter is the
> root, but they are quite complex.
In Khmer the pronunciation of the consonants is not affected, but the
pronunciation of a vowel is affected by the combinations of consonants
or signs in its vicinity.