Re: U+0140

From: Philippe Verdy (
Date: Fri Apr 16 2004 - 06:37:46 EDT

    From: "Antoine Leca" <>
    > On Thursday, April 15, 2004 8:16 PM, Philippe Verdy wrote:
    > > I thought it was already answered in this list by a Catalan speaking
    > > contributor: the sequence L+middle-dot in Catalan is NOT a combining
    > > sequence.
    > No? Then what is it? It looks very much like one, to me.

    More precisely, it is a ligature, not a combining sequence. But the second
    character of the ligature works more like a diacritic than like a separate
    punctuation mark or symbol.

    In the future, we could see U+013F and U+0140 used more often than L or l plus
    U+00B7, notably in word processors that can detect these sequences in Catalan
    text and substitute the ligatures for them. The ligatures give a more
    acceptable letter form and allow easier text handling, e.g. for word selection
    in user interfaces and for dictionary lookups.
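
    As a rough illustration of that substitution, here is a minimal Python
    sketch. It assumes the text is already known to be Catalan (no language
    detection is attempted), and the name ligate_ela_geminada is hypothetical,
    chosen only for this example. The dot is replaced only when it sits
    between two L's, so other uses of U+00B7 are left alone.

```python
import re

MIDDLE_DOT = "\u00B7"

# An L or l followed by a middle dot, but only when another L or l comes next.
# The lookahead keeps the second L out of the match so it is preserved as-is.
_L_DOT_BEFORE_L = re.compile("([Ll])" + MIDDLE_DOT + "(?=[Ll])")


def ligate_ela_geminada(text: str) -> str:
    """Replace <L, U+00B7> before another L with U+013F or U+0140.

    Assumption: the caller has already decided that `text` is Catalan.
    """
    def _replace(match: re.Match) -> str:
        # Uppercase L maps to U+013F, lowercase l maps to U+0140.
        return "\u013F" if match.group(1) == "L" else "\u0140"

    return _L_DOT_BEFORE_L.sub(_replace, text)


if __name__ == "__main__":
    for word in ("col\u00B7legi", "COL\u00B7LEGI", "a\u00B7b"):
        converted = ligate_ela_geminada(word)
        print([f"U+{ord(c):04X}" for c in converted])
```

    A real word processor would also have to decide what to do on export or
    search, since the two spellings are not canonically equivalent.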

    The fact that there's no such L-middle-dot key on keyboards should not be a
    limitation: word processors have more key bindings and more intelligence than
    the default keys found on keyboards.

    When I see a Catalan word coded with <L, U+00B7, L>, it looks very ugly
    (notably with monospaced fonts or in Teletext), and I'm sure that Catalan
    readers don't like this default presentation. They would much appreciate
    support for the ligated <U+013F or U+0140, L> encodings. I don't think these
    can be considered mere "compatibility characters", introduced only for
    compatibility with a past ISO standard for Videotex and Teletext.

    The compatibility decompositions in the UCD are bad suggestions (only
    fallbacks) that create problems which did not exist in the Videotex standard
    (they already create a problem for internationalized domain names). But now
    that the decompositions are normative, there's no way to change them in
    Unicode.
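
    For reference, the decompositions in question can be inspected with
    Python's standard unicodedata module. This is only a sketch of how to look
    them up; the sample word and the variable names are mine. It shows that
    NFC leaves the ligature alone while the compatibility form NFKC rewrites
    it to l plus U+00B7.

```python
import unicodedata

# Compatibility decompositions recorded in the UCD for the two ligatures.
decomp_capital = unicodedata.decomposition("\u013F")  # '<compat> 004C 00B7'
decomp_small = unicodedata.decomposition("\u0140")    # '<compat> 006C 00B7'

word = "co\u0140legi"  # sample Catalan word spelled with U+0140

# Canonical normalization keeps U+0140, because its decomposition is only
# a compatibility mapping, not a canonical one.
nfc = unicodedata.normalize("NFC", word)

# Compatibility normalization replaces U+0140 with <l, U+00B7>, and nothing
# recomposes it afterwards.
nfkc = unicodedata.normalize("NFKC", word)

if __name__ == "__main__":
    print(decomp_capital)
    print(decomp_small)
    print([f"U+{ord(c):04X}" for c in nfc])
    print([f"U+{ord(c):04X}" for c in nfkc])
```

    Any process that applies a compatibility normalization therefore loses the
    ligature and ends up with the plain middle dot.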

    The only safe way to change things would then be to have a middle-dot diacritic
    (combining, but with combining class 0) to be used instead of U+00B7, even if
    there's no canonical equivalence with the U+013F and U+0140 ligatures. A
    Catalan keyboard would then return this new dot instead of U+00B7, and word
    processors or input method editors would easily find a way to represent it using
    the ligature when it follows an L. If such a character were added, I would give
    it the general category "Mn" and combining class 0, to match linguistic
    expectations. It would work with IRI and IDN as well, and would immediately
    work with all basic Unicode text processing without needing an exception for
    Catalan. This new character could have a compatibility decomposition into U+00B7
    only as a fallback, and the existing ligatures U+013F and U+0140 could be
    annotated with a better decomposition using this new character than the
    compatibility decompositions with U+00B7.
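
    To show what "needing an exception for Catalan" means in practice, here is
    a small Python sketch using only the standard library. It cannot
    demonstrate the proposed character, which does not exist; it only shows
    how the existing U+00B7 behaves. The helper name describe and the sample
    word are assumptions made for this example.

```python
import re
import unicodedata

MIDDLE_DOT = "\u00B7"


def describe(ch: str) -> tuple:
    """Return (general category, canonical combining class) for one character."""
    return unicodedata.category(ch), unicodedata.combining(ch)


# U+00B7 is punctuation, not a letter or a combining mark.
middle_dot_props = describe(MIDDLE_DOT)  # ('Po', 0)

# A generic word tokenizer splits at the middle dot...
words_with_dot = re.findall(r"\w+", "col" + MIDDLE_DOT + "legi")

# ...but keeps the word whole when the ligature U+0140 (a letter) is used.
words_with_ligature = re.findall(r"\w+", "co\u0140legi")

if __name__ == "__main__":
    print(middle_dot_props)
    print(words_with_dot)
    print(len(words_with_ligature))
```

    With U+00B7 the word comes out as two tokens ("col" and "legi"); with the
    ligature it stays a single token, with no Catalan-specific rule involved.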

    This archive was generated by hypermail 2.1.5 : Fri Apr 16 2004 - 07:19:07 EDT