From: Kent Karlsson (firstname.lastname@example.org)
Date: Tue Sep 06 2005 - 05:55:16 CDT
> 1) TUS4 says they consist of one or more dead consonants followed by a
> live consonant. That implies Sanskrit _u:rk_ ऊर्क् U+090A, U+0930,
> U+094D, U+0915, U+094D (stem _u:rj_ ऊर्ज् is listed in Monier-Williams at
> http://www.ibiblio.org/sripedia/ebooks/mw/0200/mw__0254.html ) is not
> written with conjunct consonants!
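Here is a minimal Python sketch of that definition read literally: one or more dead consonants (consonant + U+094D VIRAMA) followed by a live consonant. The function names and the restriction to the plain consonants U+0915..U+0939 (nukta forms ignored) are my own simplifying assumptions, not anything taken from TUS.

```python
import re

VIRAMA = "\u094D"
# Plain Devanagari consonants KA..HA; nukta forms are deliberately ignored here.
CONSONANT = "[\u0915-\u0939]"
# Literal reading of the definition: (dead consonant)+ then one live consonant.
TUS4_CONJUNCT = re.compile(f"(?:{CONSONANT}{VIRAMA})+{CONSONANT}")

def is_tus4_conjunct(cluster: str) -> bool:
    """True if the whole cluster matches the literal TUS4 definition."""
    return TUS4_CONJUNCT.fullmatch(cluster) is not None

def code_points(text: str) -> list[str]:
    """Format each character as U+XXXX."""
    return [f"U+{ord(ch):04X}" for ch in text]

urk = "\u090A\u0930\u094D\u0915\u094D"  # u:rk, ending in a virama
print(code_points(urk))
print(is_tus4_conjunct(urk[1:]))                # RA VIRAMA KA VIRAMA
print(is_tus4_conjunct("\u0930\u094D\u0915"))   # RA VIRAMA KA
```

Under this literal reading the RA+KA cluster of ऊर्क् fails only because its final KA also carries a virama, so no live consonant remains.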
Maybe that definition needs to be amended. I did not use the
term 'consonant conjunct'.
> Our difference here largely results from fundamentally different
> is to insert a conjoining code between the codepoints for the
> consonant of this cluster.
Whatever one thinks of the 'virama model', that is the model that has been
standardised. I don't see any way of changing that model now.
It is much, much too late for that.
> Now in two scripts where the virama is a marginal element of the script,
> Khmer and Tibetan, we mark this conjoining using a special codepoint (coeng
> in Khmer) or by modifying the codepoint value for the
Those are encoded differently, though the coeng model is
quite virama-ish. Thai and Lao are also encoded differently:
prevowels are stored in visual order, so there is no reordering
problem for display, but there is one for collation instead. That
collation reordering is even ambiguous. It has been solved by a
simplified, rather than semantically correct, reordering to logical
order (now via collation clusters).
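As an illustration of what a "simplified" reordering means, here is a small Python sketch that swaps each Thai or Lao prevowel (the characters with the Logical_Order_Exception property, U+0E40..U+0E44 and U+0EC0..U+0EC4) with the single character that follows it, and uses the result as a sort key. This is a toy under my own assumptions, not the UCA or ISO 14651 procedure; the name `collation_key` is hypothetical. It always swaps with exactly one following character and ignores where the syllable really ends, which is what makes it simple rather than semantically correct.

```python
# Thai and Lao prevowels (Logical_Order_Exception), stored in visual order.
PREVOWELS = set(range(0x0E40, 0x0E45)) | set(range(0x0EC0, 0x0EC5))

def collation_key(text: str) -> str:
    """Swap each prevowel with the one character after it (hypothetical helper)."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ord(ch) in PREVOWELS and i + 1 < len(text):
            out.append(text[i + 1])  # consonant first, as in logical order
            out.append(ch)
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)

words = ["\u0E02", "\u0E40\u0E01"]       # KHO KHAI; SARA E + KO KAI
print(sorted(words))                      # raw code point order
print(sorted(words, key=collation_key))   # ordered by first consonant
```

Plain code point order puts ข before เก (U+0E02 < U+0E40); the swapped key compares ก against ข instead, so เก sorts first.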
> > You really need a character based criterion, which is font
> Therefore you encode the form that is desired in an ideal world, and
> ignore the effects of the font. The visible viramas are the ones that
> are visible in the desired form - as simple as that!
Hmm. Would this "desired ideal" be language independent
(though still script dependent)? I'd really like to avoid language
dependence; very little in the UCD currently is language dependent.
Only a small part of SpecialCasing.txt exhibits language dependence,
and that is for case mapping, not for regular display.
(My aim is to find suitable properties for this, which could be
included in the UCD some day.)
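The language dependence in SpecialCasing.txt shows up in its condition field, where language IDs such as `tr` or `lt` appear alongside context conditions such as `Final_Sigma`. A minimal Python sketch that picks out the language-conditioned entries, assuming the abridged sample lines below match the file's format (check them against the current UCD file before relying on them); the helper names are hypothetical.

```python
# Abridged lines in the SpecialCasing.txt format:
# <code>; <lower>; <title>; <upper>; (<condition_list>;)? # <comment>
SAMPLE = """\
0130; 0069 0307; 0130; 0130; # LATIN CAPITAL LETTER I WITH DOT ABOVE
03A3; 03C2; 03A3; 03A3; Final_Sigma; # GREEK CAPITAL LETTER SIGMA
0049; 0069 0307; 0049; 0049; lt More_Above; # LATIN CAPITAL LETTER I
0049; 0131; 0049; 0049; tr; # LATIN CAPITAL LETTER I
0069; 0069; 0130; 0130; tr; # LATIN SMALL LETTER I
"""

def parse(text: str) -> list[tuple[str, list[str]]]:
    """Return (code point, condition list) for every data line."""
    entries = []
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()
        if not data:
            continue  # blank or comment-only line
        fields = [f.strip() for f in data.split(";")]
        conds = fields[4].split() if len(fields) > 4 else []
        entries.append((fields[0], conds))
    return entries

def is_language_id(token: str) -> bool:
    # Language IDs ("lt", "tr", "az") are lowercase letters only;
    # context conditions ("Final_Sigma") are capitalised with underscores.
    return token.isalpha() and token.islower()

def language_dependent(entries):
    return [(cp, c) for cp, conds in entries for c in conds if is_language_id(c)]

entries = parse(SAMPLE)
print(language_dependent(entries))
```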
> >> > These must be *reliably* be distinguished in the underlying text.
> >> > It must NOT be font dependent (for properly constructed fonts).
> >> This would be unreasonable if you are referring to (2) v.
> >> (3). You would be
> >> requiring that for each *language* all Devanagari fonts
> have the same
> >> language-dependent repertoire of conjuncts.
> > Eh, no. I don't think I have said anything requiring that.
> See above.
> If by 'underlying text' you mean stored encoding, the statement seems
> vacuous unless you mean it should dictate whether form (3) is used or not.
No. Form (1) is "dictated" by the use of ZWNJ. Form (3) is requested
by the use of ZWJ. However, if form (3) is not available in the font
(not commonly used enough in the eyes of the font maker, or indeed
never existed as a conjunct form), form (1) is used as the fallback.
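The selection rule just described can be sketched in Python. This models only what is stated here: ZWNJ dictates form (1), and ZWJ requests form (3) with form (1) as the fallback. The function name is hypothetical, the font's repertoire is modelled as a hypothetical set of (C1, C2) pairs for which form (3) exists, and the no-joiner case, which this message does not address, is simply reported as "unspecified".

```python
VIRAMA = "\u094D"
ZWNJ = "\u200C"
ZWJ = "\u200D"

def rendered_form(seq: str, font_form3: set) -> str:
    """seq is C1 VIRAMA [ZWNJ|ZWJ] C2; font_form3 is a hypothetical repertoire."""
    if len(seq) not in (3, 4) or seq[1] != VIRAMA:
        raise ValueError("expected C1 VIRAMA [ZWNJ|ZWJ] C2")
    c1, c2 = seq[0], seq[-1]
    joiner = seq[2] if len(seq) == 4 else None
    if joiner == ZWNJ:
        return "form (1)"  # dictated, independent of the font
    if joiner == ZWJ:
        # Requested; fall back to form (1) when the font lacks form (3).
        return "form (3)" if (c1, c2) in font_form3 else "form (1)"
    if joiner is None:
        return "unspecified"  # not covered by the rule above
    raise ValueError("unexpected character after the virama")

ka_ssa_zwj = "\u0915\u094D\u200D\u0937"  # KA VIRAMA ZWJ SSA
print(rendered_form(ka_ssa_zwj, {("\u0915", "\u0937")}))  # font has form (3)
print(rendered_form(ka_ssa_zwj, set()))                    # fallback
```

The point of the sketch is that the stored sequence fixes the request; only the availability check consults the font.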
> > I'm not happy to leave this to be entirely platform/font dependent.
> Uniscribe interprets the code sequences as I would expect them to be
> interpreted. I see no font dependency in these sequences.
That is what one "platform", in one particular version, does. (None of
the versions I have do so...) I'm not sure it is THE one behaviour to be
standardised. And Peter mentioned font dependence (for a future version),
which I think is inappropriate here.