From: Philippe Verdy (firstname.lastname@example.org)
Date: Wed Jun 25 2003 - 13:51:26 EDT
> From: "Michael (michka) Kaplan" <email@example.com>
> > From: "Michael (michka) Kaplan" <firstname.lastname@example.org>
> > > From: "Andrew C. West" <email@example.com>
> > > > What I'm suggesting is that although "cui" <0F45, 0F74, 0F72>
> > > > and "ciu" <0F45, 0F72, 0F74> should be rendered identically,
> > > > the logical ordering of the codepoints representing the vowels
> > > > may represent lexical differences that would
> > > > be lost during the process of normalisation.
> > >
> > > Do you (or does anyone) have an actual example where this is the
> > > case? It may well be true but until someone has a proof there is
> > > not really an indication of a specific problem for the UTC to
> > > address.
> > Let me add that this was the case recently for Hebrew (to mention one
> > example). So it is certainly not impossible.
> > But we have enough real work to do that we should do our best to
> > veer from the theoretical. :-)
Another option for encoding contractions would be to encode an invisible letter (with combining class 0) that prevents the reordering of combining characters. To be valid with the usage of Tibetan vowels, this character should be treated as a base consonant; it would then explicitly form a ligature with the preceding encoded cluster to create the actual grapheme cluster.
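To make the reordering issue concrete, here is a minimal Python sketch using the standard unicodedata module. It shows that the two vowel orders from Andrew's example collapse to the same sequence under normalization, and that a class-0 character placed between the vowels keeps their order. No such invisible letter is named above, so U+034F COMBINING GRAPHEME JOINER is used purely as a stand-in (my assumption); the names CUI, CIU, BLOCKER, nfc and codepoints are illustrative only.

```python
import unicodedata

# The two sequences from the quoted example.
CUI = "\u0F45\u0F74\u0F72"  # CA + VOWEL SIGN U + VOWEL SIGN I
CIU = "\u0F45\u0F72\u0F74"  # CA + VOWEL SIGN I + VOWEL SIGN U

# Assumption: U+034F stands in for the hypothetical invisible
# class-0 character; the proposal does not name a code point.
BLOCKER = "\u034F"


def nfc(text):
    """Return the NFC form of text."""
    return unicodedata.normalize("NFC", text)


def codepoints(text):
    """List the code points of text as hex strings, for inspection."""
    return [f"{ord(ch):04X}" for ch in text]


# Canonical ordering sorts adjacent non-zero combining classes,
# so both spellings end up as the same sequence.
same_after_nfc = nfc(CUI) == nfc(CIU)

# With a class-0 character between the vowels, nothing is reordered.
blocked = CUI[:2] + BLOCKER + CUI[2:]
blocked_preserved = nfc(blocked) == blocked

if __name__ == "__main__":
    print(codepoints(nfc(CUI)), codepoints(nfc(CIU)), same_after_nfc)
    print(codepoints(nfc(blocked)), blocked_preserved)
```

This only illustrates the normalization side; whether a renderer would display such a sequence acceptably is a separate question that the sketch does not address.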
In that case, why not use a halant (virama) character to encode these contractions? The contraction would be implicitly obvious to a native Tibetan reader of rendered or printed text, but explicit for a computer program such as a generic indexing engine.
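Whether a given character can play this role depends on its canonical combining class, which can be checked directly. The sketch below is only a quick inspection, assuming the halant in question is U+0F84 TIBETAN MARK HALANTA (my assumption; no code point is given above); CANDIDATES, combining_classes and survives_normalization are illustrative names.

```python
import unicodedata

# Characters to inspect. Treating U+0F84 as the halant meant here
# is an assumption, as is using U+034F as a class-0 comparison.
CANDIDATES = {
    "TIBETAN MARK HALANTA": "\u0F84",
    "COMBINING GRAPHEME JOINER": "\u034F",
    "TIBETAN VOWEL SIGN I": "\u0F72",
    "TIBETAN VOWEL SIGN U": "\u0F74",
}


def combining_classes(chars):
    """Map each character name to its canonical combining class."""
    return {name: unicodedata.combining(ch) for name, ch in chars.items()}


def survives_normalization(sequence):
    """True if NFD leaves the sequence exactly as written."""
    return unicodedata.normalize("NFD", sequence) == sequence


classes = combining_classes(CANDIDATES)

# CA + VOWEL SIGN U + halant + VOWEL SIGN I
with_halant = "\u0F45\u0F74\u0F84\u0F72"

if __name__ == "__main__":
    for name, ccc in classes.items():
        print(f"{name}: {ccc}")
    print("halant sequence preserved:", survives_normalization(with_halant))
```

In current Unicode data the Tibetan halanta has a non-zero combining class, so it takes part in canonical reordering itself rather than blocking it; a halant-based encoding would need to take that into account.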