Re: Umlaut and Tréma, was: Variation selectors and vowel marks

From: Doug Ewell (
Date: Wed Jul 14 2004 - 15:52:37 CDT

    Peter Kirk <peterkirk at qaya dot org> wrote:

    > It seems to me that this solution will also "result in massive data
    > representation ambiguities for German data" (quote from N2819).

    It's not German data (with umlauts) that will be affected by this
    solution, but non-German data (with diaereses) in German bibliographic
    systems. That makes it a much smaller problem.

    > N2819 does not deal with the issue of how to encode a base character
    > (X) plus tréma and another combining mark (M). Should this be <X, M,
    > CGJ, COMBINING DIAERESIS, M>? How is this issue affected by whether
    > the combining class of M is less than, equal to or greater than that
    > of COMBINING DIAERESIS? How do these sequences behave when normalised?
    > The distinction is not necessarily theoretical because in some
    > languages (certainly in Greek although I guess there is no ambiguity
    > with umlaut there) a diaeresis indicating separation can co-occur with
    > other accents. The German bibliographers need guidance on how to
    > convert such combinations to Unicode while preserving the distinction
    > from umlaut.

    The DIN request and the USNB solution didn't address this, because the
    problem to be solved was disambiguating {a, o, u}-with-tréma from {a, o,
    u}-with-umlaut. If there are combinations of (for example)
    a-with-tréma-and-something-else AND ALSO
    a-with-umlaut-and-something-else, then those two will need to be
    disambiguated somehow. But I strongly doubt that the latter case exists
    in German bibliographic data, though of course one never knows.
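    For anyone who wants to check the normalisation questions raised above,
    here is a minimal Python sketch. It assumes, as the quoted sequence
    suggests, that a tréma is marked by putting CGJ (U+034F) directly before
    COMBINING DIAERESIS (U+0308). The choice of COMBINING DOT BELOW as the
    other mark M, and the helper names nfc and nfd, are my own for
    illustration and appear nowhere in N2819. It relies only on the standard
    unicodedata module.

```python
import unicodedata

CGJ = "\u034F"        # COMBINING GRAPHEME JOINER, combining class 0
DIAERESIS = "\u0308"  # COMBINING DIAERESIS, combining class 230
DOT_BELOW = "\u0323"  # COMBINING DOT BELOW, combining class 220 (illustrative M)


def nfc(s):
    """Return s in Normalization Form C."""
    return unicodedata.normalize("NFC", s)


def nfd(s):
    """Return s in Normalization Form D."""
    return unicodedata.normalize("NFD", s)


# Assumed convention: plain diaeresis = umlaut, CGJ + diaeresis = tréma.
umlaut = "u" + DIAERESIS
trema = "u" + CGJ + DIAERESIS

# The umlaut sequence composes to precomposed U+00FC under NFC,
# while the CGJ (a starter) keeps the tréma sequence intact.
umlaut_nfc = nfc(umlaut)
trema_nfc = nfc(trema)

# M with a lower combining class, placed after the diaeresis:
# canonical reordering moves M in front of the diaeresis, so the
# CGJ is no longer adjacent to the mark it was meant to qualify.
trema_then_m = nfd("u" + CGJ + DIAERESIS + DOT_BELOW)

# M placed before the CGJ: nothing is reordered across the CGJ.
m_then_trema = nfd("u" + DOT_BELOW + CGJ + DIAERESIS)

for label, s in [
    ("umlaut NFC", umlaut_nfc),
    ("trema NFC", trema_nfc),
    ("trema + M NFD", trema_then_m),
    ("M + trema NFD", m_then_trema),
]:
    print(label, [f"U+{ord(c):04X}" for c in s])
```

    The third case is the one that matters for the question about combining
    classes: a mark with a lower class than the diaeresis gets reordered
    between CGJ and the diaeresis, whereas placing that mark before the CGJ
    leaves the sequence stable.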

    > No, I have attempted to deal with these issues, in the old thread on
    > "Variation selectors and vowel marks", and have described in some
    > detail what might be done in situations where the modified combining
    > mark and another mark are on the same base character. I accept that I
    > did not find a fully satisfactory solution, but I certainly did not
    > ignore the problem. But the umlaut/tréma proposal fails to discuss
    > this problem at all and so can reasonably be accused of ignoring it.

    It was out of scope.

    -Doug Ewell
     Fullerton, California
