Re: Umlaut and Tréma, was: Variation selectors and vowel marks

From: Peter Kirk
Date: Tue Jul 13 2004 - 17:15:42 CDT

    On 13/07/2004 20:02, Asmus Freytag wrote:

    > At 11:02 AM 7/13/2004, Peter Kirk wrote:
    >> I was surprised to see that WG2 has accepted a proposal made by the
    >> US National Body to use CGJ to distinguish between Umlaut and Tréma
    >> in German bibliographic data.
    > You raise some interesting questions. However, note that the purpose
    > of CGJ is intended for sorting related distinctions, which are at
    > issue here. This is different from variation selectors which are
    > intended to be used for displayed variations.

    OK. But this is not a unique case. For example, in Hebrew, Silluq and
    Meteg, and likewise Dagesh and Shuruq, are pairs of different marks
    which share a glyph, and so a Unicode character, but may need to be
    distinguished for certain processes. Should similar encodings with CGJ
    be proposed to make these distinctions? For that matter, what if in a
    certain (hypothetical) language consonant Y and vowel Y should be
    collated differently? Would that justify an encoding of one of them
    with CGJ? But then these are not combining characters in the first
    place. So I must agree with Doug that
    "CGJ + COMBINING DIAERESIS is a hack".

    On 13/07/2004 19:35, Doug Ewell wrote:

    > ...
    >The alternative proposed by DIN, creating a new COMBINING UMLAUT
    >character, would have caused *unprecedented and catastrophic*
    >equivalence and normalization problems.
    Understood. But I can argue in the same way that creating a new RIGHT
    HOLAM character for Holam Male would cause *catastrophic* equivalence
    and normalisation problems, although no longer unprecedented because we
    have the umlaut/tréma precedent. The situation is really very similar:
    two combining marks which are not distinguished in most modern
    typography, but which are distinguished graphically in some typefaces
    (if I remember correctly, in Fraktur as well as in the typefaces
    mentioned in Victor Gaultney's paper); and which have distinct
    interpretations and are distinguished in some existing data in which the
    distinction is important; but which should not be split into separate
    characters because this would seriously destabilise the majority of
    existing data in the script which does not make the distinction.

    What many people are telling me to do with Holam Male (e.g. Less
    Preferred Option 4 in
    is equivalent to the following solution to the umlaut/tréma problem:
    define a new tréma character, or perhaps new umlaut and tréma
    characters, to be used only in the German bibliographic data, and ignore
    the problem that this makes the bibliographic data incompatible with all
    other German text, and unable to be displayed by existing fonts until
    they get round to adding the new characters - as well as ignoring the
    problem that the precomposed characters have the wrong decomposition.
    (The Hebrew equivalent to this is that U+FB4B should decompose to Holam
    Male not Vav Haluma.) If that solution was not acceptable for German,
    why should it be acceptable for Hebrew?
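
    To make the decomposition point concrete, here is a minimal sketch
    using Python's standard unicodedata module. It only reports what the
    Unicode data shipped with the interpreter says; the helper name
    decomposition() is my own, for illustration.

```python
import unicodedata

def decomposition(ch):
    """Names of the code points in the canonical decomposition (NFD) of ch."""
    return [unicodedata.name(c) for c in unicodedata.normalize("NFD", ch)]

# U+FB4B HEBREW LETTER VAV WITH HOLAM: a single canonical decomposition,
# with no way to say which of the two readings was meant.
vav_holam = decomposition("\uFB4B")

# U+00FC LATIN SMALL LETTER U WITH DIAERESIS: likewise one decomposition,
# whether the mark is read as an umlaut or as a tréma.
u_diaeresis = decomposition("\u00FC")

print(vav_holam)
print(u_diaeresis)
```

    In both scripts the precomposed character maps to one fixed sequence,
    so any newly encoded mark would be out of step with it.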

    >>It seems to me that the UTC should bite the bullet and accept that
    >>there is a need for variation sequences for combining marks, and
    >>either adjust the definitions of existing variation selectors or
    >>encode new specialised variation selectors for them. The adjusted or
    >>new variation selectors can then be used for Hebrew as well as for
    >>German - see my posting on this subject to the Hebrew list.
    >"When 256 variation selectors just won't do, invent another."
    >(with apologies to Ken Whistler)

    256 variation selectors won't do if they have all been defined
    unchangeably with the wrong properties, e.g. combining class. On the other
    hand, if the UTC is prepared to ignore the combining class and
    normalisation problems involved in using one combining class zero
    character, CGJ, to modify a combining mark, it may as well ignore the
    identical problems involved in using variation selectors, also combining
    class zero, with combining marks.
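
    For anyone who wants to check the combining class behaviour, the
    following is a rough sketch with Python's unicodedata module. The
    variable names are mine, VS1 (U+FE00) stands in for the variation
    selectors generally, and the dot-below example is only an
    illustration, not something taken from the proposals under discussion.

```python
import unicodedata

CGJ = "\u034F"        # COMBINING GRAPHEME JOINER
VS1 = "\uFE00"        # VARIATION SELECTOR-1
DIAERESIS = "\u0308"  # COMBINING DIAERESIS
DOT_BELOW = "\u0323"  # COMBINING DOT BELOW

# Canonical combining classes: CGJ and VS1 are both class zero.
classes = {
    "CGJ": unicodedata.combining(CGJ),
    "VS1": unicodedata.combining(VS1),
    "DIAERESIS": unicodedata.combining(DIAERESIS),
}

# Without CGJ, NFC composes u + diaeresis into the precomposed U+00FC.
plain = unicodedata.normalize("NFC", "u" + DIAERESIS)

# With CGJ in between, the class-zero character blocks composition.
with_cgj = unicodedata.normalize("NFC", "u" + CGJ + DIAERESIS)

# Canonical reordering: diaeresis (230) and dot below (220) swap under NFD.
reordered = unicodedata.normalize("NFD", "a" + DIAERESIS + DOT_BELOW)

# A class-zero character between the marks stops that reordering.
kept = unicodedata.normalize("NFD", "a" + DIAERESIS + CGJ + DOT_BELOW)

print(classes)
print([hex(ord(c)) for c in with_cgj])
```

    So a class-zero character placed next to a mark changes how the
    sequence normalises, whichever class-zero character it is.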

    Peter Kirk (personal) (work)
