Re: Umlaut and Tréma, was: Variation selectors and vowel marks

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Sun Jul 25 2004 - 12:59:28 CDT

Next message: Alain LaBonté: "RE: Much better Latin-1 keyboard for Windows"

Previous message: Cristian Secarã: "Re: Much better Latin-1 keyboard for Windows"
In reply to: busmanus: "Re: Umlaut and Tréma, was: Variation selectors and vowel marks"
Next in thread: Doug Ewell: "Re: Umlaut and Tréma, was: Variation selectors and vowel marks"

From: "busmanus" <busmanus.lk@freemail.hu>
> I am not sure about the relevance of the Meteg problem, but I do know
> about a case, where different relative positions of the same
> diacriticals are used for conveying a semantic distinction. In a big
> reference work about verse metrics in the world's languages (Erika
> Szepes - István Szerdahelyi: Verstan, published by Gondolat, Budapest,
> 1981), when discussing quantitative metrics, a macron above a breve is
> used for denoting a neutral syllable of the metrical pattern that is
> more frequently filled in by a short syllable than by a long one and
> a breve above a macron is used for the reverse, i.e. the difference in
> the combinations provides statistical information.
>
> Actually, these signs are typically (although not inevitably) spacing
> characters, but I don't think it makes a significant difference in this
> perspective.

When the relative order of diacritics is significant and they have the same
non-zero combining class, Unicode already has everything needed to preserve
both the logical/semantic and the graphical distinction: canonical
reordering never swaps marks of the same class, so their relative order
survives normalization.
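As a minimal sketch of this point, assuming Python's standard unicodedata
module and an arbitrary base letter "x" chosen only for illustration, the
two orders of macron and breve stay distinct under NFD:

```python
import unicodedata

MACRON = "\u0304"  # COMBINING MACRON
BREVE = "\u0306"   # COMBINING BREVE

# The base letter "x" is an illustrative assumption, not from the thread.
macron_then_breve = "x" + MACRON + BREVE
breve_then_macron = "x" + BREVE + MACRON

# Both marks carry the same non-zero canonical combining class.
same_class = unicodedata.combining(MACRON) == unicodedata.combining(BREVE)

# Canonical reordering only swaps marks of *different* classes,
# so each sequence is left untouched by NFD.
nfd_a = unicodedata.normalize("NFD", macron_then_breve)
nfd_b = unicodedata.normalize("NFD", breve_then_macron)

print(same_class, nfd_a == macron_then_breve,
      nfd_b == breve_then_macron, nfd_a != nfd_b)
```

The check only covers the encoded order; it says nothing about how a given
font or renderer stacks the two marks.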

However, this relative order does not specify how the diacritics stack on
the base letter. In your example with macron and breve, both have the
"above" combining class, and most renderers will stack them vertically,
with the first above-diacritic centered below the second one.

(Some fonts or renderers may instead render them side by side, with the
first diacritic on the starting side for the current writing direction
and the second on the ending side. This is just another stylistic option
that still preserves the semantic distinction visually, so it does not
change the problem and is not a problem of Unicode itself; it would most
probably occur with Semitic scripts, or with Asian texts written
vertically.)

The only problem will happen if the semantic distinction cannot be rendered
visually, because the diacritics share the same combining class (so the same
logical "position"), but not the same visual position (in some cases, even
in the Latin script, some above-diacritics are sometimes rendered on the
right side rather than above.)

And we have some cases where a below-diacritic like a cedilla is preferably
shown above-left, where it could compete with another diacritic. This is
probably a pedantic, theoretical case in which the default Unicode combining
classes do not correctly represent the interaction between diacritics.
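To make the cedilla case concrete, here is a small sketch, assuming
Python's standard unicodedata module and a base letter "g" picked only for
illustration. Because the cedilla and the acute accent have different
combining classes, canonical ordering always puts the cedilla first,
whatever order was typed:

```python
import unicodedata

CEDILLA = "\u0327"  # COMBINING CEDILLA, an "attached below" class
ACUTE = "\u0301"    # COMBINING ACUTE ACCENT, the "above" class

# Illustrative assumption: the acute is typed before the cedilla.
typed = "g" + ACUTE + CEDILLA
normalized = unicodedata.normalize("NFD", typed)

# The marks are sorted by ascending combining class, so the typed
# order is lost even if a font draws both marks above the letter.
print(unicodedata.combining(CEDILLA), unicodedata.combining(ACUTE))
print(normalized == "g" + CEDILLA + ACUTE)
```

So if a font draws such a cedilla above the letter, the encoded order can
no longer express which of the two marks was meant to come first.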

For these reasons, I really suggest keeping CGJ as a way to encode and force
the relative order of diacritics, and dropping any other use of CGJ: it
should only encode a logical relative order between distinct pairs of
diacritics that would otherwise be reordered to the same sequence, breaking
the semantics of the text.
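A hedged sketch of this use of CGJ (U+034F), again assuming Python's
standard unicodedata module and an illustrative base letter and pair of
marks: since CGJ has combining class 0, canonical reordering cannot move a
mark across it.

```python
import unicodedata

CGJ = "\u034F"      # COMBINING GRAPHEME JOINER
CEDILLA = "\u0327"  # COMBINING CEDILLA
ACUTE = "\u0301"    # COMBINING ACUTE ACCENT

# Illustrative sequences; the base "g" and these marks are assumptions.
without_cgj = "g" + ACUTE + CEDILLA
with_cgj = "g" + ACUTE + CGJ + CEDILLA

nfd_without = unicodedata.normalize("NFD", without_cgj)
nfd_with = unicodedata.normalize("NFD", with_cgj)

# Without CGJ the two marks are swapped; with CGJ the typed order survives.
print(unicodedata.combining(CGJ))
print(nfd_without == "g" + CEDILLA + ACUTE)
print(nfd_with == with_cgj)
```

The CGJ stays in the normalized text, so the two sequences remain distinct
for any process that compares them code point by code point.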

I strongly suggest that CGJ not be used for anything other than forcing
the relative order of combining characters (and, as a consequence, that
CGJ be allowed only between two combining characters, not just before or
after a base character; if those two sequences must remain acceptable, as
they are already valid in Unicode, they will represent distinct semantics
for the base character of the combining sequence).

As a consequence, CGJ is inappropriate for encoding a logical/semantic
difference between umlaut and tréma, for example (the special treatment
of umlaut versus tréma/diaeresis in German, or of the acute accent in
Polish, for collation purposes makes CGJ inappropriate for encoding these
logical distinctions).

Then the problem remains: how can we encode logical/semantic distinctions
between diacritics which have been unified in Unicode, but are clearly not
unified in some languages (German and Polish are such examples)?

The existing variation selectors VS1..VS256 are not an option here (as they
break default grapheme clusters, which means a lot of trouble for text
editors and text selection). Isn't this a place where we really need some
combining variation selectors (CVS1..CVS16 at least), to be used in
applications or texts that need such distinctions?

This archive was generated by hypermail 2.1.5 : Sun Jul 25 2004 - 13:03:09 CDT