From: Doug Ewell (dewell@adelphia.net)
Date: Wed Jul 14 2004 - 15:52:37 CDT
Peter Kirk <peterkirk at qaya dot org> wrote:
> It seems to me that this solution will also "result in massive data
> representation ambiguities for German data" (quote from N2819).
It's not German data (with umlauts) that will be affected by this
solution, but non-German data (with diaereses) in German bibliographic
systems. That makes it a much smaller problem.
> N2819 does not deal with the issue of how to encode a base character
> (X) plus tréma and another combining mark (M). Should this be <X, M,
> CGJ, COMBINING DIAERESIS>, or <X, CGJ, M, COMBINING DIAERESIS>, or <X,
> CGJ, COMBINING DIAERESIS, M>? How is this issue affected by whether
> the combining class of M is less than, equal to or greater than that
> of COMBINING DIAERESIS? How do these sequences behave when normalised?
> The distinction is not necessarily theoretical because in some
> languages (certainly in Greek although I guess there is no ambiguity
> with umlaut there) a diaeresis indicating separation can co-occur with
> other accents. The German bibliographers need guidance on how to
> convert such combinations to Unicode while preserving the distinction
> from umlaut.
The DIN request and the USNB solution didn't address this, because the
problem to be solved was disambiguating {a, o, u}-with-tréma from {a, o,
u}-with-umlaut. If there are combinations of (for example)
a-with-tréma-and-something-else AND ALSO
a-with-umlaut-and-something-else, then those two will need to be
disambiguated somehow. But I strongly doubt that the latter case exists
in German bibliographic data, though of course one never knows.
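For what it is worth, the normalisation part of the question can be checked mechanically. The following is a minimal sketch in Python using the standard unicodedata module. It assumes the tréma is written as CGJ followed by COMBINING DIAERESIS, as in the sequences quoted above, and it picks COMBINING DOT BELOW and COMBINING ACUTE ACCENT as arbitrary examples of M with a lower and an equal combining class. The helper names (candidates, code_points, report) are made up for this illustration. It shows only how existing normalisation treats these sequences, not which one anybody recommends.

```python
import unicodedata

CGJ = "\u034f"        # COMBINING GRAPHEME JOINER, combining class 0
DIAERESIS = "\u0308"  # COMBINING DIAERESIS, combining class 230
DOT_BELOW = "\u0323"  # COMBINING DOT BELOW, combining class 220 (example M)
ACUTE = "\u0301"      # COMBINING ACUTE ACCENT, combining class 230 (example M)


def candidates(base, mark):
    """Return the three candidate sequences from the question, keyed by shape.

    D stands for COMBINING DIAERESIS, M for the other combining mark.
    """
    return {
        "X M CGJ D": base + mark + CGJ + DIAERESIS,
        "X CGJ M D": base + CGJ + mark + DIAERESIS,
        "X CGJ D M": base + CGJ + DIAERESIS + mark,
    }


def code_points(text):
    """Format a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


def report(base, mark):
    """Print the NFD and NFC forms of each candidate sequence."""
    print(f"M = {unicodedata.name(mark)} (ccc {unicodedata.combining(mark)})")
    for shape, seq in candidates(base, mark).items():
        nfd = unicodedata.normalize("NFD", seq)
        nfc = unicodedata.normalize("NFC", seq)
        print(f"  {shape}: NFD {code_points(nfd)} | NFC {code_points(nfc)}")


if __name__ == "__main__":
    report("a", DOT_BELOW)  # M has a lower combining class than the diaeresis
    report("a", ACUTE)      # M has the same combining class as the diaeresis
```

Because CGJ has combining class 0, it blocks canonical reordering across it, and NFC will not compose the base letter with a diaeresis that follows it. Marks that come after the CGJ are still reordered among themselves, so when M has a lower combining class than the diaeresis, <X, CGJ, COMBINING DIAERESIS, M> and <X, CGJ, M, COMBINING DIAERESIS> normalise to the same sequence. When the classes are equal, as with the acute accent, all three candidates stay distinct. In the <X, M, CGJ, COMBINING DIAERESIS> shape, NFC may compose X with M (a plus dot below becomes U+1EA1) while the diaeresis stays separate after the CGJ.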
> No, I have attempted to deal with these issues, in the old thread on
> "Variation selectors and vowel marks", and have described in some
> detail what might be done in situations where the modified combining
> mark and another mark are on the same base character. I accept that I
> did not find a fully satisfactory solution, but I certainly did not
> ignore the problem. But the umlaut/tréma proposal fails to discuss
> this problem at all and so can reasonably be accused of ignoring it.
It was out of scope.
-Doug Ewell
Fullerton, California
http://users.adelphia.net/~dewell/