Re: U+0140

From: Antoine Leca (
Date: Fri Apr 16 2004 - 14:34:14 EDT

    On Friday, April 16, 2004 12:37 PM, Philippe Verdy wrote:

    > In some future, we could see U+013F and U+0140 used more often than L
    > or l plus U+00B7...

    I (personally) hope we would not.

    > Notably in word processors that can detect these
    > sequences in Catalan text and substitute them with the ligatures,
    > which creates a more acceptable letter form and allows easier text
    > handling for (e.g.) word selection in user interfaces and dictionary
    > lookups.

    As I wrote earlier, if you know the text under inspection is Catalan, a very
    simple regular expression will deal with that. Any half-decent Catalan word
    processor does it already, by the way.
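
    A minimal sketch of such an expression, in Python, assuming that a "word"
    is a run of letters and that a middle dot (U+00B7) belongs to the word only
    when it sits between two l's. The names CATALAN_WORD and catalan_words are
    made up for this illustration; they are not taken from any actual word
    processor.

```python
import re

# Assumption: a word is a run of letters, plus U+00B7 when (and only when)
# the dot has an l/L immediately before it and immediately after it.
CATALAN_WORD = re.compile(r"(?:[^\W\d_]|(?<=[lL])\u00b7(?=[lL]))+")


def catalan_words(text):
    """Return the words of `text`, keeping l·l inside a single word."""
    return CATALAN_WORD.findall(text)


words = catalan_words("El col\u00b7legi Miguel Hernández de Riola")
```

    With this, "col·legi" comes out as one word, so word selection or a
    dictionary lookup can work on the plain <l, U+00B7, l> sequence.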

    > The fact that there's no such L-middle-dot on keyboards should not be
    > a limit: word processors have more key bindings and more intelligence
    > than the default keys found on keyboards.

    Yes yes yes. Particularly when I want to insert a · afterwards between two
    l's, because it turns out I missed it on the first pass (yes, it happens). Or
    when I want to remove a superfluous one that I typed by mistake (yes, it
    happens too). With your "intelligence", this latter case will prove to be a
    headache: at the first attempt, a normal user will place the caret just
    after the dot and press Rubout. Slurp, the whole U+0140 is swallowed, but
    usually the user will not notice. So on a second look (perhaps a long time
    afterwards, perhaps after a useless additional printout), she will have to
    type the first l back in.
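
    To make the Rubout problem concrete, here is a small Python sketch. It
    assumes an editor that deletes exactly one code point before the caret;
    the rubout function is hypothetical, and real editors may delete by
    grapheme cluster or behave differently.

```python
def rubout(text, caret):
    """Delete the single code point before `caret`; return (text, new caret)."""
    if caret <= 0:
        return text, 0
    return text[:caret - 1] + text[caret:], caret - 1


sequence = "col\u00b7legi"  # <l, U+00B7, l>: the dot is its own code point
ligature = "co\u0140legi"   # U+0140 carries the l and the dot together

# Place the caret just after the dot in both encodings, then press Rubout.
seq_after, _ = rubout(sequence, sequence.index("\u00b7") + 1)
lig_after, _ = rubout(ligature, ligature.index("\u0140") + 1)
```

    With the sequence only the dot disappears (leaving "collegi"); with U+0140
    the first l goes with it (leaving "colegi").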

    Intelligent keyboards might be great. But to be so, they have to bring
    *much* added value (like, obviously, being able to type a language that
    would be impossible otherwise; or, more simply, avoiding typing Alt+0156
    every five minutes). If they bring only very little value, they are more
    annoying than anything else, particularly when they are not permanent but
    only operate from time to time. This would be the case here: as a Catalan
    writer, I sometimes type texts in the word processor, where I would be
    "helped", and sometimes in the mail reader or on the console, where I would
    not, for example because I do not want to wait two full minutes for all the
    "helpers" to load every time I have to type the name of the user of a
    given process...

    > When I see a Catalan word coded with <L, U+00B7, L> it looks very
    > ugly (notably with monospaced fonts or in Teletext) and I'm sure that
    > Catalan readers don't like the default presentation.

    Yes, it looks ugly. But to me it is in fact less ugly than seeing l.l or
    l-l. Ugliness is in the eye of the beholder, of course. When you are in the
    habit of seeing some rendering of l·l about every hour, you no longer notice
    it. In fact, I notice it more when someone uses the kerned version advocated
    by Gabriel Valiente, because nowadays it is unusual. And I certainly would
    not use the kerned version for an institutional text, because I do not
    want to inconvenience my readers (this question came up between us about 20
    days ago, and there was no debate).

    > They will much
    > appreciate the support for the ligated <U+013F or U+0140, L>
    > encodings.

    What do you prefer?

      El col·legi Miguel Hernández de Riola?

      El co[]legi Miguel Hernández de Riola?

    ([] is ASCII art for a box, which is how many, many people would see any
    use of U+013F...)

    > I don't think they can be considered "compatibility
    > characters" just introduced for compatibility with a past ISO
    > standard for Videotex and Teletext.

    Sorry, you are fighting a lost battle: no one here uses them, so the whole
    corpus is already encoded without them.
    The mills of Don Quixote are in Mota del Cuervo; that is only about 200 km
    from here, but it is not in the Catalan-speaking region ;-).
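
    For text that does arrive with U+013F or U+0140, bringing it back to the
    form the existing corpus uses is a two-entry mapping. A sketch in Python,
    where TO_SEQUENCE and to_corpus_form are names invented for this example:

```python
# Map each ligature to the <L or l, U+00B7> sequence already used in the corpus.
TO_SEQUENCE = str.maketrans({
    "\u013f": "L\u00b7",  # LATIN CAPITAL LETTER L WITH MIDDLE DOT
    "\u0140": "l\u00b7",  # LATIN SMALL LETTER L WITH MIDDLE DOT
})


def to_corpus_form(text):
    """Replace U+013F/U+0140 with the letter followed by U+00B7."""
    return text.translate(TO_SEQUENCE)


fixed = to_corpus_form("El co\u0140legi")
```

    The targets are the same sequences that the Unicode character database
    lists as the compatibility decompositions of these two characters. The
    mapping is kept explicit here so that nothing else in the text is touched.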

    > The only safe way to change things would then be to have a middle-dot
    > diacritic (combining but with combining class 0) to be used instead
    > of U+00B7, even if there's no canonical equivalence with the U+013F
    > and U+0140 ligatures... A Catalan keyboard would then return this new
    > dot instead of U+00B7, and word processors or input method editors
    > would easily find a way to represent it using the ligature when it
    > follows a L.

    May I suggest U+1000B7 for this new character?


    This archive was generated by hypermail 2.1.5 : Fri Apr 16 2004 - 17:13:08 EDT