Re: U+0140

From: Antoine Leca
Date: Sun Apr 18 2004 - 16:45:00 EDT

    On Saturday, April 17, 2004 10:28 PM TU+1, António Martins-Tuválkin wrote:
    >> As I wrote earlier, if you know the text under inspection is
    >> Catalan, a very simple regular expression will deal with that. Any
    >> half-decent Catalan word processor does it already, by the way.
    > What about the odd Catalan phrase within a text in Guarani or
    > Cherokee?

    Then you do not know that the text under inspection is Catalan: the "if"
    does not hold, so you are not expected to act on it. That is, nobody will
    hold it against you if a double click on col·legi does not select the whole
    word. Any reader can test their own word processor: double-click the
    Catalan word above and check whether it is recognized as a single word,
    even though it is surrounded by bad English instead of Guarani!
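
    As a rough illustration of the "very simple regular expression" quoted
    above, here is a minimal Python sketch, assuming the text is already known
    to be Catalan. The pattern and the name catalan_words are hypothetical,
    not taken from any word processor: a middle dot (U+00B7) is kept inside a
    word only when it sits between two l's, as in col·legi.

```python
import re

# Hypothetical word pattern for text already known to be Catalan:
# a run of letters ([^\W\d_] = any Unicode letter), where a middle dot
# (U+00B7) counts as part of the word only between two l's ("l·l").
CATALAN_WORD = re.compile(r"(?:[^\W\d_]|(?<=[lL])\u00B7(?=[lL]))+")


def catalan_words(text: str) -> list[str]:
    """Split Catalan text into words, keeping geminated l·l intact."""
    return CATALAN_WORD.findall(text)


print(catalan_words("El col·legi és a prop."))
```

    Since U+0140 is itself classified as a letter, coŀlegi also comes out as
    a single word, while a stray middle dot elsewhere still splits words.
    Applied to text that is not Catalan, the same rule may well be wrong,
    which is precisely the "if" above.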

    > Unicode, do not forget, supposedly brings correctness to
    > multilingual text...

    So what?
    Are you trying to say that word selection in multilingual text should
    always do the "right thing"? That is merely a dream, I believe, and you
    know it perfectly well, having posted less than two minutes ago about the
    case of apostrophes, which is nearly impossible to sort out in the average
    multilingual text. Furthermore, what counts as "the right thing" varies
    from person to person, so perfection here is out of reach.

    Or are you trying to make the point that encoding a new code point for ·
    in Catalan would add any correctness to multilingual texts?

    Admittedly, the compatibility character U+0140 is not very welcome in my
    eyes, since:
     - it is almost unused, but in case it does occur, programmers like me
    still have to check for it: so it is just a waste of my time, I would say :-(
     - someone who reads TUS (The Unicode Standard) without knowing Spanish
    usage in this respect might think that col·legi should be written coŀlegi
    ("co\u0140legi"), because the middle dot in the former is not listed as a
    letter, while only the character in the latter mentions "Catalan", without
    saying what the "right thing to do" is;
     - the only advantage I can see, namely that typographers would draw the
    middle dot in U+0140 raised relative to its position in U+00B7, is not
    exploited in practice; we even see many fonts where the dot in U+0140 is
    not centered between the two l's, which clearly shows that most
    typographers have no idea how this character is used, and probably just
    build it as a composite of U+006C and U+00B7... Other fonts use a smaller
    dot in U+0140 (which is unpleasant to my eyes). Only a few fonts give
    U+0140 a reduced width for the dot, which might be considered good
    typography.
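
    For the check mentioned in the first point, here is a minimal Python
    sketch, assuming the goal is to fold the compatibility characters U+013F
    and U+0140 into the recommended L·/l· sequences. The helper name
    fold_l_middle_dot is hypothetical; it relies only on the standard
    str.maketrans/str.translate and unicodedata.normalize.

```python
import unicodedata

# U+013F LATIN CAPITAL LETTER L WITH MIDDLE DOT and
# U+0140 LATIN SMALL LETTER L WITH MIDDLE DOT have compatibility
# decompositions to L + U+00B7 and l + U+00B7 respectively.
FOLD_L_DOT = str.maketrans({"\u013F": "L\u00B7", "\u0140": "l\u00B7"})


def fold_l_middle_dot(text: str) -> str:
    """Rewrite Ŀ/ŀ as L·/l·, leaving every other character untouched."""
    return text.translate(FOLD_L_DOT)


# For these two characters the mapping agrees with full NFKC.
for ch in ("\u013F", "\u0140"):
    assert unicodedata.normalize("NFKC", ch) == fold_l_middle_dot(ch)

print(fold_l_middle_dot("co\u0140legi"))
```

    A targeted mapping is used rather than running NFKC over the whole text,
    because NFKC also rewrites many unrelated characters (ligatures,
    superscripts, full-width forms and so on).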

    A further note about typography: on some widely available fonts, I have
    compared the layout of ŀl versus l·l, and also the height of the dot
    against the upper dot of the colon. I found that almost no font puts the
    dot at the height of the upper dot of the colon. One of the few that does,
    namely Linotype Palatino (I mention it because I generally consider it a
    nice design), uses the upper dot of the colon for ŀ. And the result is
    really ugly, because the dot is way too high (about 65% of the l-height),
    owing to the modern habit of higher x-heights...


