From: Kenneth Whistler (kenw@sybase.com)
Date: Fri Jun 27 2003 - 22:47:44 EDT
Peter responded:
> Kenneth Whistler wrote on 06/26/2003 05:36:34 PM:
>
> > Why is making use of the existing behavior of existing characters
> > a "groanable kludge", if it has the desired effect and makes
> > the required distinctions in text?
>
> Why is it a kludge to insert some cc=0 control character into the text for
> the sole purpose of preventing reordering during canonical ordering of two
> combining marks that do interact typographically and so should but
> nevertheless do not have the same combining class; and, moreover, to do so
> using a control character that was not created for that purpose?
>
> The answer seems so obvious, I wouldn't know how to begin responding.
And others apparently had the same feeling. But I contend that
this seems odd only because of the way you present
it to yourself and others.
It isn't a matter of "my text is o.k. the way I entered it, but
now I have to insert some invisible control character into the
text for the sole purpose of preventing reordering -- which wasn't
something I wanted to have happen in the first place."
Instead, it is that for Biblical Hebrew, the following textual
conventions are adopted:
A sequence of patah followed by hiriq is represented by
    <patah, CGJ, hiriq>
A sequence of hiriq followed by patah is represented by
    <hiriq, CGJ, patah>
Then you build keyboards (or other abstractions) that obey
those textual conventions.
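To make the conventions above concrete, here is a minimal Python sketch using the standard unicodedata module. It shows canonical ordering swapping patah and hiriq, and CGJ (U+034F, combining class 0) preventing the swap. The lamed base letter and the apply_convention helper are assumptions invented for illustration; a real keyboard or input layer would be built differently.

```python
import unicodedata

LAMED = "\u05DC"  # HEBREW LETTER LAMED (base letter chosen for the example)
PATAH = "\u05B7"  # HEBREW POINT PATAH, canonical combining class 17
HIRIQ = "\u05B4"  # HEBREW POINT HIRIQ, canonical combining class 14
CGJ = "\u034F"    # COMBINING GRAPHEME JOINER, canonical combining class 0

POINTS = {PATAH, HIRIQ}


def apply_convention(text: str) -> str:
    """Hypothetical input-layer helper: insert CGJ between an adjacent
    patah/hiriq pair, in either order, as the convention describes."""
    out = []
    for ch in text:
        if out and out[-1] in POINTS and ch in POINTS and ch != out[-1]:
            out.append(CGJ)
        out.append(ch)
    return "".join(out)


def code_points(text: str) -> str:
    """Render a string as a list of U+XXXX code points for inspection."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


typed = LAMED + PATAH + HIRIQ          # what the user keys in
stored = apply_convention(typed)       # what the convention stores

# Without CGJ, normalization reorders the two points by combining class.
print(code_points(unicodedata.normalize("NFC", typed)))
# With CGJ, the class-0 character blocks the reordering.
print(code_points(unicodedata.normalize("NFC", stored)))
```

The helper only handles the one pair discussed here. Which pairs of points actually need this treatment is a policy question the sketch does not try to answer.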
You stop telling the Biblical Scholars that their text is
screwed up because of Unicode and they have to "fix" it by
inserting crazy control codes they don't know about, and
chances are they will stop believing that their text is
screwed up. :-)
This isn't really any stranger than telling someone that for Twi, the
following textual convention is adopted:
An open o with an acute tone mark is represented by
    <open-o, combining acute>
As long as the pieces stay firmly attached for entry, display,
and searching, everybody is happy and nobody needs to be
the wiser about what gimmicks the programmers are
using under the covers.
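For comparison, the Twi case can be sketched the same way. The snippet assumes open-o is U+0254 and the tone mark is U+0301; the is_stable helper is a made-up name. Since there is no precomposed open o with acute, the two-code-point sequence is the only representation and normalization leaves it alone, whereas a plain "o" plus acute composes to a single character under NFC.

```python
import unicodedata

OPEN_O = "\u0254"           # LATIN SMALL LETTER OPEN O
COMBINING_ACUTE = "\u0301"  # COMBINING ACUTE ACCENT

open_o_acute = OPEN_O + COMBINING_ACUTE
plain_o_acute = "o" + COMBINING_ACUTE


def is_stable(text: str) -> bool:
    """True if neither NFC nor NFD changes the string."""
    return all(unicodedata.normalize(form, text) == text
               for form in ("NFC", "NFD"))


# The open-o sequence stays as two code points under both forms.
print(is_stable(open_o_acute), len(unicodedata.normalize("NFC", open_o_acute)))
# Plain o + acute has a precomposed form, so NFC shortens it.
print(is_stable(plain_o_acute), len(unicodedata.normalize("NFC", plain_o_acute)))
```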
And why should it be any stranger that maintenance of vowel
point order in Biblical Hebrew cases with multiple points
requires judicious use of an invisible combining mark like CGJ,
when maintenance of visible directional layout distinctions
for any Hebrew requires a boatload of invisible format controls?
> If we want to insert a control character to prevent reordering under
> canonical ordering, I think it would be preferable to create a new control
> character for just that purpose:
How would that be less of a kludge? I contend that inventing
another invisible character *just* to do this is even more of
a kludge than what I have suggested, when use of an existing
character already has the desired effect.
The end effect of the impulse you are describing here would
be an attempt to create atomistic controls for each conceivable
text effect, and I think the UTC has already given up on
heading in that direction. It is already bad enough trying
to keep straight all the possible interactions for the ones
already created, as demonstrated by the discoveries we just
made when trying to consider what happens if a ZWJ gets
plunked down *between* two combining marks.
> that would give a character that could be
> used elsewhere for the very same purpose without needing to worry about
> what unanticipated and undesirable effects might result by hijacking a
> control created for some completely unrelated purpose.
That criticism applied better to the suggestions of RLM,
ZWJ, or WJ, since their very status as format controls rather
than combining marks had undesirable effects on the combining
character sequences in question. I don't think the criticism applies
to CGJ, however, since that character doesn't have any
defined behavior other than what is needed here. And, as I
indicated in a separate response, I do not think using CGJ
for the purpose described in Biblical Hebrew is unrelated to
its intent. It is just that nobody had yet thought through a
scenario where it would prove useful between combining marks.
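The distinction drawn above between format controls and combining marks can be checked against the Unicode Character Database. This is only a sketch using Python's unicodedata module (whose data reflects whatever Unicode version the interpreter ships with); the CANDIDATES table and describe helper are names invented here. All four characters have combining class 0, but only CGJ is a nonspacing mark (Mn); the others are format controls (Cf).

```python
import unicodedata

# The characters that were suggested for blocking reordering.
CANDIDATES = {
    "CGJ": "\u034F",  # COMBINING GRAPHEME JOINER
    "ZWJ": "\u200D",  # ZERO WIDTH JOINER
    "RLM": "\u200F",  # RIGHT-TO-LEFT MARK
    "WJ": "\u2060",   # WORD JOINER
}


def describe(ch: str) -> tuple:
    """Return (general category, canonical combining class) for a character."""
    return unicodedata.category(ch), unicodedata.combining(ch)


for name, ch in CANDIDATES.items():
    category, ccc = describe(ch)
    print(f"{name}: U+{ord(ch):04X} gc={category} ccc={ccc}")
```

Any of the four would stop canonical reordering, since each has class 0. What differs is the general category, and that is what affects how the surrounding combining character sequence is treated.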
--Ken