Re: Umlaut and Tréma, was: Variation selectors and vowel marks

From: Kenneth Whistler (kenw@sybase.com)
Date: Wed Jul 14 2004 - 17:10:20 CDT


    Peter Kirk continued:

    > I did read it, but it didn't deal with the issue I was concerned about,
    > of multiple combining marks. And I was concerned about that issue
    > because that was the major concern expressed in the earlier discussion
    > on variation selectors, and presented as the decisive reason why
    > variation selectors cannot be used with combining marks.

    And I agree that that is the (or at least "a") decisive reason
    why variation selectors cannot be used with combining marks.

    In other words, once you try to define <CM, VAR1> as being a
    variant form of the combining mark in question, you start
    getting into trouble whenever you try to add another combining
    mark after it in sequence.
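One way this trouble can show up is sketched below in Python, using the standard `unicodedata` module. This is an illustration, not part of the original message: the choice of base `a`, COMBINING ACUTE ACCENT, COMBINING DOT BELOW, and VARIATION SELECTOR-1 is an assumption made only for the demo, and no such variation sequence is actually standardized. Because the variation selector has combining class 0, placing it after a mark stops canonical reordering across it, so two spellings that would otherwise be canonically equivalent no longer are.

```python
import unicodedata

A = "a"
ACUTE = "\u0301"      # COMBINING ACUTE ACCENT, combining class 230
DOT_BELOW = "\u0323"  # COMBINING DOT BELOW, combining class 220
VS1 = "\ufe00"        # VARIATION SELECTOR-1, combining class 0


def nfd(s: str) -> str:
    """Return the canonical decomposition (NFD) of s."""
    return unicodedata.normalize("NFD", s)


# Without a variation selector, the two mark orders are canonically equivalent.
plain_equivalent = nfd(A + ACUTE + DOT_BELOW) == nfd(A + DOT_BELOW + ACUTE)

# Hypothetical <CM, VAR1>: VS1 is meant to modify the acute accent.
# Adding another mark after it gives a sequence that reordering cannot touch,
# so it no longer matches the spelling with the marks in the other order.
vs_then_mark = A + ACUTE + VS1 + DOT_BELOW
mark_then_vs = A + DOT_BELOW + ACUTE + VS1
with_vs_equivalent = nfd(vs_then_mark) == nfd(mark_then_vs)

print(plain_equivalent, with_vs_equivalent)
```

The combining classes can be checked with `unicodedata.combining()`; the variation selector's class of 0 is what interrupts the reordering.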

    >
    > If CGJ can be used with combining marks in situations where (as far as
    > we know) there is in fact no problem with multiple combining marks, what
    > is to stop variation selectors being used in the same situations?

    Because the situations are different.

    Apparently you are not grokking this.

    The umlaut/tréma case is one of distinguishing the *collation* order
    of letters with umlaut versus letters with tréma, *not* their
    appearance. The fact that some minority decides to then also
    display a tréma with a slightly different form than umlaut is
    beside the point, and does not reflect majority practice even
    in the German bibliographic data.

    Furthermore, the recommended sequences here are, I reiterate:

    <BASE, COMBINING DIAERESIS> (for umlaut)
    <BASE, CGJ, COMBINING DIAERESIS> (for tréma)

    The CGJ is *not* applied to the diaeresis character -- it is first
    in the sequence, right after the base letter.
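The two recommended sequences can be written out in Python as a minimal sketch. The base letter `u` and the variable names are assumptions chosen for illustration. The sketch shows only that the two spellings remain distinct under NFC normalization, which is what lets a tailored collation treat them differently; it does not implement any collation tailoring.

```python
import unicodedata

CGJ = "\u034f"        # COMBINING GRAPHEME JOINER
DIAERESIS = "\u0308"  # COMBINING DIAERESIS

umlaut = "u" + DIAERESIS        # <BASE, COMBINING DIAERESIS>
trema = "u" + CGJ + DIAERESIS   # <BASE, CGJ, COMBINING DIAERESIS>

nfc_umlaut = unicodedata.normalize("NFC", umlaut)
nfc_trema = unicodedata.normalize("NFC", trema)

# The umlaut spelling composes to the precomposed letter U+00FC;
# the tréma spelling is left as three code points because the CGJ
# sits between the base letter and the diaeresis.
print([f"U+{ord(c):04X}" for c in nfc_umlaut])
print([f"U+{ord(c):04X}" for c in nfc_trema])

# A process that does not care about the distinction can drop the CGJ
# and recover the plain umlaut spelling.
folded = trema.replace(CGJ, "")
```

Since the CGJ has no visible glyph of its own, the distinction lives in the stored text and in the collation behavior, not in the default rendering.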

    The following sequence *is* an allowed one for a variation selector:

    <BASE, VAR1, combining-mark>

    as long as the sequence <BASE, VAR1> has *explicitly* been standardized
    as representing a distinct, graphical variant of <BASE>. If I then
    apply one or more combining marks to that sequence, there is not a
    problem.

    However, such usage defines a variant of the base, not a variant of
    the combining mark itself.
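A short Python sketch of why the <BASE, VAR1, combining-mark> order causes no trouble under normalization. The sequence <a, VARIATION SELECTOR-1> used here is hypothetical and is not a standardized variation sequence; it stands in for one purely to show the mechanics. The selector stays directly after the base, and the marks that follow it are still free to reorder canonically among themselves.

```python
import unicodedata

A = "a"
VS1 = "\ufe00"        # VARIATION SELECTOR-1 (hypothetical use with "a")
ACUTE = "\u0301"      # COMBINING ACUTE ACCENT, combining class 230
DOT_BELOW = "\u0323"  # COMBINING DOT BELOW, combining class 220

order_1 = A + VS1 + ACUTE + DOT_BELOW
order_2 = A + VS1 + DOT_BELOW + ACUTE

nfd_1 = unicodedata.normalize("NFD", order_1)
nfd_2 = unicodedata.normalize("NFD", order_2)

# Both spellings normalize to the same string, and the selector
# remains immediately after the base letter in the result.
same_after_nfd = nfd_1 == nfd_2
vs_still_on_base = nfd_1.startswith(A + VS1)

print(same_after_nfd, vs_still_on_base)
```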

    > One
    > such situation is Holam Male which never takes an additional combining
    > mark*. So why can't we represent it as <VAV, HOLAM, variation selector>?

    Because the UTC has ruled out <CM, VAR> as interpretable sequences.

    > After all in practice there is no normalisation problem with this. (By
    > the way, I am proposing as one option <VAV, variation selector, HOLAM>,
    > but that has been opposed on the debatable grounds that what changes is
    > not the VAV but the HOLAM - the best description is that the whole
    > grapheme cluster changes.)

    I don't have a quarrel with describing things that way -- but you
    just can't get from here to there with variation selectors.

    --Ken


