From: Kenneth Whistler (kenw@sybase.com)
Date: Wed Jul 14 2004 - 17:10:20 CDT
Peter Kirk continued:
> I did read it, but it didn't deal with the issue I was concerned about,
> of multiple combining marks. And I was concerned about that issue
> because that was the major concern expressed in the earlier discussion
> on variation selectors, and presented as the decisive reason why
> variation selectors cannot be used with combining marks.
And I agree that that is the (or at least "a") decisive reason
why variation selectors cannot be used with combining marks.
In other words, once you try to define <CM, VAR1> as being a
variant form of the combining mark in question, you start
getting into trouble whenever you try to add another combining
mark after it in sequence.
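
One way to see the trouble is a minimal Python sketch using the standard unicodedata module. The <COMBINING DIAERESIS, VS1> sequence here is purely hypothetical (no such sequence is standardized), and the choice of "a" and COMBINING DOT BELOW is only for illustration. Variation selectors have canonical combining class 0, so a selector placed after one mark stops a later mark from being reordered across it:

```python
import unicodedata

def nfc(s):
    """Return the NFC normalization of s."""
    return unicodedata.normalize("NFC", s)

A = "a"
DIAERESIS = "\u0308"  # COMBINING DIAERESIS, combining class 230
DOT_BELOW = "\u0323"  # COMBINING DOT BELOW, combining class 220
VS1 = "\uFE00"        # VARIATION SELECTOR-1, combining class 0

# Without a selector, the two mark orders are canonically equivalent.
plain_equal = nfc(A + DIAERESIS + DOT_BELOW) == nfc(A + DOT_BELOW + DIAERESIS)

# Hypothetical <CM, VS1>: the selector blocks canonical reordering,
# so the two spellings no longer normalize to the same string.
vs_equal = (nfc(A + DIAERESIS + VS1 + DOT_BELOW)
            == nfc(A + DOT_BELOW + DIAERESIS + VS1))

# NFC also composes a + diaeresis, leaving the selector after a
# precomposed letter rather than after the mark it was meant for.
composed = nfc(A + DIAERESIS + VS1)

print(plain_equal, vs_equal, [hex(ord(c)) for c in composed])
```

In this sketch the selector also ends up following U+00E4 after normalization, so a process can no longer tell that it was meant to modify the mark.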
>
> If CGJ can be used with combining marks in situations where (as far as
> we know) there is in fact no problem with multiple combining marks, what
> is to stop variation selectors being used in the same situations?
Because the situations are different.
Apparently you are not grokking this.
The umlaut/tréma case is one of distinguishing the *collation* order
of letters with umlaut versus letters with tréma, *not* their
appearance. The fact that some minority decides to then also
display a tréma with a slightly different form than umlaut is
beside the point, and does not reflect majority practice even
in the German bibliographic data.
Furthermore, the recommended sequences here are, I reiterate:
<BASE, COMBINING DIAERESIS> (for umlaut)
<BASE, CGJ, COMBINING DIAERESIS> (for tréma)
The CGJ is *not* applied to the diaeresis character -- it is first
in the sequence, right after the base letter.
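
As a rough illustration of how the two sequences stay distinct as data, here is a small Python sketch using the standard unicodedata module. The base letter "u" is just an example choice. It shows only the normalization behavior; the collation difference itself would need tailoring in a collation library, which is not shown here:

```python
import unicodedata

BASE = "u"
CGJ = "\u034F"        # COMBINING GRAPHEME JOINER
DIAERESIS = "\u0308"  # COMBINING DIAERESIS

umlaut = BASE + DIAERESIS        # <BASE, COMBINING DIAERESIS>
trema = BASE + CGJ + DIAERESIS   # <BASE, CGJ, COMBINING DIAERESIS>

umlaut_nfc = unicodedata.normalize("NFC", umlaut)
trema_nfc = unicodedata.normalize("NFC", trema)

# The umlaut sequence composes to the precomposed letter U+00FC.
print([hex(ord(c)) for c in umlaut_nfc])

# CGJ has combining class 0, so it blocks composition and the
# trema sequence survives normalization as three code points.
print([hex(ord(c)) for c in trema_nfc])
```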
The following sequence *is* an allowed one for a variation selector:
<BASE, VAR1, combining-mark>
as long as the sequence <BASE, VAR1> has *explicitly* been standardized
as representing a distinct, graphical variant of <BASE>. If I then
apply one or more combining marks to that sequence, there is not a
problem.
However, such usage defines a variant of the base, not a variant of
the combining mark itself.
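
A short Python sketch of why marks after <BASE, VAR1> are unproblematic, again with the standard unicodedata module. Note that <a, VS1> is not an actual standardized variation sequence; it stands in here for one that has been explicitly standardized. Because the selector sits directly after the base, the marks that follow it can still be reordered freely among themselves:

```python
import unicodedata

def nfc(s):
    """Return the NFC normalization of s."""
    return unicodedata.normalize("NFC", s)

BASE = "a"
VS1 = "\uFE00"        # VARIATION SELECTOR-1, combining class 0
DIAERESIS = "\u0308"  # combining class 230
DOT_BELOW = "\u0323"  # combining class 220

# Stand-in for a standardized <BASE, VAR1> variant of the base.
variant = BASE + VS1

# Both mark orders after the variant normalize to the same string.
order_1 = nfc(variant + DIAERESIS + DOT_BELOW)
order_2 = nfc(variant + DOT_BELOW + DIAERESIS)

print(order_1 == order_2)
# The selector is still immediately after the base in the result.
print(order_1.startswith(variant))
```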
> One
> such situation is Holam Male which never takes an additional combining
> mark*. So why can't we represent it as <VAV, HOLAM, variation selector>?
Because the UTC has ruled out <CM, VAR> as interpretable sequences.
> After all in practice there is no normalisation problem with this. (By
> the way, I am proposing as one option <VAV, variation selector, HOLAM>,
> but that has been opposed on the debatable grounds that what changes is
> not the VAV but the HOLAM - the best description is that the whole
> grapheme cluster changes.)
I don't have a quarrel with describing things that way -- but you
just can't get from here to there with variation selectors.
--Ken