From: Peter Kirk (peterkirk@qaya.org)
Date: Thu Jul 15 2004 - 05:42:09 CDT
On 15/07/2004 10:32, Asmus Freytag wrote:
> Nobody doubts that some text exists with multiple accents on vowels.
> Where the vowels are not Latin a,o,u, there is no issue at all, in
> this case, since there are no differences in German sorting for them. ...
Well, yes, but http://std.dkuug.dk/jtc1/sc2/wg2/docs/n2819.pdf does not
make it clear that the <CGJ, DIAERESIS> sequence is to be used only with
Latin a, o and u; rather it states "<CGJ, [DIAERESIS]> → tréma". Perhaps
the proposal needs modification to make this point clear, if that is the
intention.
> ... Where the vowels are a, o, u, as for the Livonian example you
> cited, it's a matter of the design of the collation table to get the
> correct sorting behavior.
>
> If there is anything in UCA that would make it impossible to design
> correct collation tables for German university libraries, when CGJ is
> used with Trema, but not for umlaut, then you have an issue. At the
> moment, I see lots of speculation, and red herrings (Greek and Coptic,
> indeed!) but no smoking gun.
Greek and Coptic are not irrelevant. First, you did not restrict the set
of base characters when you wrote:
> Secondly, the dieresis is used to indicate that two vowels are
> pronounced separately. I haven't seen a case where the vowels would
> already be accented.
and of course the diaeresis and accent characters used in Greek are the
same ones used in Latin script. Second, N2819 does not make it clear
that the <CGJ, DIAERESIS> sequence is to be used only for Latin script
data. I would expect (someone can check this of course, and without
checking this is indeed speculation) that there is Greek text in German
bibliographic databases in which the Greek diaeresis is represented in
ISO 5426 as tréma rather than umlaut; that would be correct because the
function of Greek diaeresis is separation rather than vowel
modification. And I would expect an implementer reading N2819 to
conclude that all ISO 5426 trémas should be converted to <CGJ,
DIAERESIS> as no mention is made of a restriction to Latin script or to
just a, o and u. So there is a real chance of a conversion program
producing sequences which could confuse normalisation, e.g. <IOTA, CGJ,
DIAERESIS, ACUTE>, although hopefully not <IOTA, ACUTE, CGJ, DIAERESIS>
which might be a real problem.
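
To make the normalisation concern concrete, here is a minimal Python sketch
using only the standard unicodedata module. It is an illustration under my own
assumptions, not anything specified in N2819: the helper name nfc_codepoints
and the three test sequences are hypothetical, chosen to match the examples
above.

```python
import unicodedata

IOTA = "\u03b9"       # GREEK SMALL LETTER IOTA
CGJ = "\u034f"        # COMBINING GRAPHEME JOINER (combining class 0)
DIAERESIS = "\u0308"  # COMBINING DIAERESIS (combining class 230)
ACUTE = "\u0301"      # COMBINING ACUTE ACCENT (combining class 230)


def nfc_codepoints(text):
    """Hypothetical helper: NFC-normalise text and list its code points."""
    normalised = unicodedata.normalize("NFC", text)
    return [f"U+{ord(ch):04X}" for ch in normalised]


# Illustrative sequences only; not taken from any real database.
plain = IOTA + DIAERESIS + ACUTE               # no CGJ at all
cgj_first = IOTA + CGJ + DIAERESIS + ACUTE     # <IOTA, CGJ, DIAERESIS, ACUTE>
accent_first = IOTA + ACUTE + CGJ + DIAERESIS  # <IOTA, ACUTE, CGJ, DIAERESIS>

for label, seq in [("plain", plain),
                   ("cgj_first", cgj_first),
                   ("accent_first", accent_first)]:
    print(label, nfc_codepoints(seq))
```

With the Unicode data shipped in current Python versions, the plain sequence
composes to the single code point U+0390, the second sequence is left
unchanged because CGJ sits between the base and both marks, and in the third
the acute composes with the iota (U+03AF) while the diaeresis stays behind
after the CGJ. So the three spellings of what a reader would take to be the
same text remain distinct after normalisation.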
>
> And yes, the incidence of Livonian data (relative to trema, which is
> rather uncommon relative to umlaut) may be below a threshold where
> providing a support short of the theoretical optimum is a practical
> concern. That decision belongs to the German bibliographers.
>
Well, it seems that we are agreeing that there may be a problem in
theory, and potentially in practice with small amounts of marginal data,
but Unicode is choosing to leave the problem for the specific users of
the sequence to deal with. That is indeed a reasonable approach. But it
was not considered an acceptable one for use of variation selectors with
combining marks, even in a case where there is no valid data which
actually exhibits the normalisation problem.
My concern as always is with the apparent inconsistency of bending the
normal rules or ignoring the normalisation concerns for German while
refusing to do more or less the same for Hebrew. I appreciate that
Germany is a larger and richer country than Israel and so, at least for
commercial interests, its concerns deserve some priority. But that
should not be a reason to reject as invalid or insignificant issues
concerning Hebrew. And the issue of avoiding incompatible representation
of the same data is a real one for Hebrew Holam Male vs. Vav Haluma just
as it is for German umlaut vs. tréma.
I am not actually asking for variation selectors with combining marks
because I realise that the UTC has already made a decision and is
unlikely to reverse it. But I am asking for some flexibility on some of
the principles, of the kind which has been demonstrated with umlaut and
tréma, and also in the Indic scripts proposal under review, in order to
find an acceptable solution to a real problem. That flexibility might
include allowing either <VAV, variation selector, HOLAM> or <VAV, ZWJ,
HOLAM> to represent Holam Male although technically the VAV glyph does
not (usually) change (nor does the HOLAM glyph) and the HOLAM dot does
not ligate with it, just moves relative to it.
-- Peter Kirk peter@qaya.org (personal) peterkirk@qaya.org (work) http://www.qaya.org/