From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Fri Apr 16 2004 - 06:37:46 EDT
From: "Antoine Leca" <Antoine10646@leca-marti.org>
> On Thursday, April 15, 2004 8:16 PM, Philippe Verdy va escriure:
> > I thought it was already answered in this list by a Catalan speaking
> > contributor: the sequence L+middle-dot in Catalan is NOT a combining
> > sequence.
>
> No? Then what is it? Looks like very much one, to me.
More exactly, it is a ligature, not a combining sequence. But the second
character of the ligature works more like a diacritic than like a separate
punctuation mark or symbol.
In the future, we could see U+013F and U+0140 used more often than L or l plus
U+00B7... Notably in word processors that can detect these sequences in Catalan
text and substitute them with the ligatures, which create a more acceptable
letter form and allow easier text handling for (e.g.) word selection in user
interfaces and dictionary lookups.
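To make the substitution idea concrete, here is a minimal Python sketch of what
such a word processor might do. The function names (ligate_ela_geminada, words)
are hypothetical, and the rule "only substitute when another L follows" is my
own assumption about how to avoid touching unrelated uses of U+00B7; a real
implementation would presumably also check that the text is tagged as Catalan.

```python
import re

MIDDLE_DOT = "\u00B7"

# Two-character sequence -> single precomposed character.
_LIGATURES = {
    "L" + MIDDLE_DOT: "\u013F",  # LATIN CAPITAL LETTER L WITH MIDDLE DOT
    "l" + MIDDLE_DOT: "\u0140",  # LATIN SMALL LETTER L WITH MIDDLE DOT
}

# Assumption: only rewrite <L, U+00B7> when another L/l follows,
# i.e. the Catalan "l·l" pattern, and leave other middle dots alone.
_PATTERN = re.compile("([Ll])" + MIDDLE_DOT + "(?=[Ll])")


def ligate_ela_geminada(text: str) -> str:
    """Replace <L, U+00B7> before another L with U+013F / U+0140."""
    return _PATTERN.sub(lambda m: _LIGATURES[m.group(1) + MIDDLE_DOT], text)


def words(text: str) -> list:
    """Naive word selection: runs of characters matched by \\w."""
    return re.findall(r"\w+", text)


if __name__ == "__main__":
    plain = "col" + MIDDLE_DOT + "legi"
    ligated = ligate_ela_geminada(plain)
    print(words(plain))    # the middle dot splits the word in two
    print(words(ligated))  # the ligated form stays one word
```

With Python's re module, U+00B7 is not a word character, so the plain spelling
is selected as two words ("col" and "legi"), while the ligated spelling is
selected as one. Other regex engines may classify these characters differently.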
The fact that there's no such L-middle-dot on keyboards should not be a
limitation: word processors have more key bindings and more intelligence than
the default keys found on keyboards.
When I see a Catalan word coded with <L, U+00B7, L> it looks very ugly (notably
with monospaced fonts or in Teletext) and I'm sure that Catalan readers don't
like the default presentation. They would much appreciate support for the
ligated <U+013F or U+0140, L> encodings. I don't think U+013F and U+0140 can be
considered mere "compatibility characters" just introduced for compatibility
with a past ISO standard for Videotex and Teletext.
The compatibility decompositions in the UCD are bad suggestions (only fallbacks)
which create problems that did not exist in the Videotex standard (they already
create a problem for internationalized domain names). But now that
decompositions are normative, there's no way to change them in Unicode.
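The decompositions in question can be inspected with Python's standard
unicodedata module. This is only an illustrative sketch (compat_fold is a
hypothetical helper name); it shows that any process applying a compatibility
normalization such as NFKC turns the ligature back into a letter followed by
U+00B7, while canonical normalization (NFC) leaves it alone.

```python
import unicodedata


def compat_fold(text: str) -> str:
    """Apply NFKC, which uses the compatibility decompositions."""
    return unicodedata.normalize("NFKC", text)


if __name__ == "__main__":
    for ch in ("\u013F", "\u0140"):
        # Prints the code point, its name, and its UCD decomposition field.
        print(f"U+{ord(ch):04X}", unicodedata.name(ch),
              "->", unicodedata.decomposition(ch))

    ligated = "co\u0140legi"
    folded = compat_fold(ligated)
    # After NFKC the ligature is gone and U+00B7 is back in the string.
    print([f"U+{ord(c):04X}" for c in folded])
```

The decomposition fields reported are "<compat> 004C 00B7" for U+013F and
"<compat> 006C 00B7" for U+0140.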
The only safe way to change things would then be to have a middle-dot diacritic
(combining but with combining class 0) to be used instead of U+00B7, even if
there's no canonical equivalence with the U+013F and U+0140 ligatures... A
Catalan keyboard would then return this new dot instead of U+00B7, and word
processors or input method editors would easily find a way to represent it using
the ligature when it follows an L. If such a character were added, I would give
it the general category "Mn" and a combining class of 0, to match linguistic
expectations; it would work with IRI and IDN as well, and would immediately
work with all basic Unicode text processing without needing an exception for
Catalan. This new character could have a compatibility decomposition into U+00B7
only as a fallback; and the existing ligatures U+013F and U+0140 could be
annotated with a better decomposition using this new character than the
compatibility decompositions with U+00B7.
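For comparison, the properties of the existing characters can be looked up as
below. This is just a sketch (props is a hypothetical helper name), and the
choice of U+0307 COMBINING DOT ABOVE as a typical combining mark is mine. The
proposed character does not exist, so it cannot be queried; the point is only
that neither existing character has the combination "Mn with combining class 0"
suggested above.

```python
import unicodedata


def props(ch: str) -> tuple:
    """Return (general category, canonical combining class) of a character."""
    return unicodedata.category(ch), unicodedata.combining(ch)


if __name__ == "__main__":
    for ch in ("\u00B7", "\u0307", "\u0140"):
        print(f"U+{ord(ch):04X}", unicodedata.name(ch), props(ch))
```

U+00B7 is reported as punctuation (Po) with class 0, U+0307 as a nonspacing
mark (Mn) with class 230, and U+0140 as a lowercase letter (Ll) with class 0.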
This archive was generated by hypermail 2.1.5 : Fri Apr 16 2004 - 07:19:07 EDT