Re: Umlaut and Tréma, was: Variation selectors and vowel marks

From: Asmus Freytag
Date: Wed Jul 14 2004 - 23:00:17 CDT

    At 01:52 PM 7/14/2004, Doug Ewell wrote:
    >It's not German data (with umlauts) that will be affected by this
    >solution, but non-German data (with diaereses) in German bibliographic
    >systems. That makes it a much smaller problem.

    The use of the diaeresis is perfectly valid for words in fields that have
    the language ID 'German'.

    >The DIN request and the USNB solution didn't address this, because the
    >problem to be solved was disambiguating {a, o, u}-with-tréma from {a, o,
    >u}-with-umlaut. If there are combinations of (for example)
    >a-with-tréma-and-something-else AND ALSO
    >a-with-umlaut-and-something-else, then those two will need to be
    >disambiguated somehow. But I strongly doubt that the latter case exists
    >in German bibliographic data, though of course one never knows.

    First off, there have to be corresponding entries in the sorting tables
    used for such data for that distinction to have the correct effect. Since
    the sorting tables would not support anything other than <BASE, CGJ,
    DIAERESIS>, there's no reason to introduce other sequences into the data.
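
    As a rough illustration of the two encodings under discussion, the
    Python sketch below tells <BASE, CGJ, DIAERESIS> (U+034F followed by
    U+0308) apart from a plain umlaut (<BASE, U+0308>). The function name
    classify_marks and the labels 'umlaut' and 'trema' are made up for this
    example; it relies only on the standard unicodedata module and
    deliberately ignores any other sequences, in line with the point above.

```python
import unicodedata

CGJ = "\u034f"        # COMBINING GRAPHEME JOINER
DIAERESIS = "\u0308"  # COMBINING DIAERESIS

def classify_marks(text):
    """Return (base, kind) pairs, where kind is 'umlaut', 'trema' or None.

    Hypothetical helper: only <base, U+0308> and <base, CGJ, U+0308>
    are recognized; everything else is passed through unmarked.
    """
    s = unicodedata.normalize("NFD", text)  # split precomposed letters
    out = []
    i = 0
    while i < len(s):
        ch = s[i]
        if s.startswith(CGJ + DIAERESIS, i + 1):
            out.append((ch, "trema"))
            i += 3
        elif s.startswith(DIAERESIS, i + 1):
            out.append((ch, "umlaut"))
            i += 2
        else:
            out.append((ch, None))
            i += 1
    return out
```

    Because CGJ has combining class 0, normalization does not fuse the
    tréma sequence into the precomposed letter, so the distinction made on
    input survives NFC and NFD.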

    Secondly, the diaeresis is used to indicate that two vowels are pronounced
    separately. I haven't seen a case where the vowels would already be accented.

    Finally, one additional reason that the phonetic sorting is relevant
    in this instance, other than that the pronunciations are in fact
    different, is that the use of the diaeresis is not mandatory to the same
    degree as that of the umlaut. You can find Kapernaum spelled with and
    without the diaeresis, but if you spell Hauser with an umlaut (Häuser),
    it's the plural of Haus; without it, it's a name. Personal names,
    however, are sometimes spelled with vowel + e (Moeller).

    By sorting the diaeresis as a secondary difference, related terms sort
    together and names sort near their variant spellings. The suggested
    approach solves the problem at hand for data where somebody took the
    trouble to decide (on input) which was which, so that huge catalogs of
    subject keywords or authors come out correctly.
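
    To make the effect concrete, here is a minimal sketch of a two-level
    sort key in Python. It assumes, for illustration only, that an umlaut
    sorts as vowel + e at the primary level and that <BASE, CGJ, DIAERESIS>
    sorts as the bare vowel with a secondary difference. The name sort_key
    and the numeric weights are invented for this example and are not taken
    from any actual sorting table.

```python
import unicodedata

CGJ = "\u034f"        # COMBINING GRAPHEME JOINER
DIAERESIS = "\u0308"  # COMBINING DIAERESIS

def sort_key(word):
    """Hypothetical two-level key: (primary string, secondary weights)."""
    s = unicodedata.normalize("NFD", word).lower()
    primary = []
    secondary = []
    i = 0
    while i < len(s):
        ch = s[i]
        if s.startswith(CGJ + DIAERESIS, i + 1):
            # Trema: same primary weight as the bare vowel,
            # distinguished only at the secondary level.
            primary.append(ch)
            secondary.append(2)
            i += 3
        elif s.startswith(DIAERESIS, i + 1):
            # Umlaut: assumed to sort as vowel + e, so that
            # "Möller" lands next to "Moeller".
            primary.append(ch + "e")
            secondary.extend([1, 0])
            i += 2
        else:
            primary.append(ch)
            secondary.append(0)
            i += 1
    return ("".join(primary), tuple(secondary))

words = ["Kapernau\u034f\u0308m", "Mo\u0308ller", "Hauser",
         "Kapernaum", "H\u00e4user", "Moeller"]
ordered = sorted(words, key=sort_key)
```

    With this key, both spellings of Kapernaum end up adjacent, the umlauted
    Möller sorts next to Moeller, and Häuser and Hauser receive different
    primary keys.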

    Note that the bulk of all possible data in German won't make that
    distinction and won't be used on systems that support the special
    sorting method.

