From: Antoine Leca (Antoine10646@leca-marti.org)
Date: Sun Apr 18 2004 - 16:45:00 EDT
On Saturday, April 17, 2004 10:28 PM TU+1, António Martins-Tuválkin wrote:
>> As I wrote earlier, if you know the text under inspection is
>> Catalan, a very simple regular expression will deal with that. Any
>> half-decent Catalan word processor does it already, by the way.
>
> What about the odd Catalan phrase within a text in Guarani or
> Cherokee?
Then you do not know that the text under inspection is Catalan: the "if"
does not hold, so you are not supposed to act accordingly. That is, nobody
will blame you because a double click on col·legi does not select the whole
word. Any reader can test this in their own word processor: please
double-click the Catalan word above and see whether it is recognized as a
single word, even though it is surrounded by bad English instead of Guarani!
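To make the "very simple regular expression" concrete, here is a minimal
sketch in Python, assuming the text is already known to be Catalan. The names
CATALAN_WORD and catalan_words are mine, for illustration only, and the rule
(a middle dot counts as part of a word only between two l's) is my own
simplification, not what any particular word processor does.

```python
import re

# Assumption: a "word" is a run of letters, optionally continued through a
# middle dot (U+00B7) that sits directly between two l's, as in "col·legi".
CATALAN_WORD = re.compile(
    r"[^\W\d_]+(?:(?<=[lL])\u00B7(?=[lL])[^\W\d_]+)*"
)

def catalan_words(text):
    """Return the words of `text`, keeping l·l inside a single word."""
    return CATALAN_WORD.findall(text)

print(catalan_words("el col·legi nou"))
```

A middle dot anywhere else (say between "a" and "b") still separates words,
so the rule does not swallow dots used as punctuation.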
> Unicode, do not forget, supposedly brings correctness to
> multilingual text...
And so?
Are you trying to say that selecting a word in multilingual text should
always do the "right thing"? You were merely dreaming, I believe, and you
know it perfectly well, having posted less than 2 minutes ago about the case
of apostrophes, which is nearly impossible to sort out in the average
multilingual text. Furthermore, what "the right thing" is varies from person
to person, so achieving perfection here is a mere dream.
Or are you trying to make the point that inventing a new code point for · in
Catalan would bring any added correctness to multilingual texts?
It is certain that the compatibility encoding of U+0140 is not very welcome
in my eyes, since:
- it is almost unused, but in case it is used, programmers like me still
have to check for it: so it is just a waste of my time, I would say :-(
- someone who reads TUS and does not know Spanish usage in this respect
might think that col·legi should be written coŀlegi, "co\u0140legi", because
U+00B7 is not listed as a letter, and only U+0140 is annotated as "Catalan",
without any mention of the "right thing to do"
- the only advantage I am able to see, namely that typographers can design
the middle dot in U+0140 raised relative to the position it has in U+00B7,
is not exploited in practice; we even see a lot of fonts where the dot in
U+0140 is not balanced between the two l's, which clearly shows that the
majority of typographers have no idea about the use of this character, and
they probably merely build it as a compound of U+006C and U+00B7... Others
use a reduced size for the dot in U+0140 (which is unpleasing to my eyes).
Only a few fonts provide U+0140 with a reduced width for the dot, which
might be considered good typography.
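The check mentioned in the first point can be sketched as follows, in
Python; the function name fold_l_middle_dot is hypothetical. U+0140 and its
capital U+013F carry compatibility decompositions to l or L followed by
U+00B7, so folding them is a plain replacement. NFKC normalization gives the
same result for these two characters, but it also rewrites many unrelated
characters, which is why this sketch replaces them explicitly.

```python
import unicodedata

def fold_l_middle_dot(text):
    """Rewrite U+0140 / U+013F as l / L followed by U+00B7 (middle dot)."""
    return text.replace("\u0140", "l\u00b7").replace("\u013f", "L\u00b7")

sample = "co\u0140legi"               # spelled with the compatibility character
folded = fold_l_middle_dot(sample)    # spelled with l + U+00B7
print(folded == "col\u00b7legi")
# NFKC agrees for this character, since its decomposition is <compat> 006C 00B7
print(unicodedata.normalize("NFKC", sample) == folded)
```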
A further note about typography: on some (widely available) fonts I have
compared the layout of ŀl versus l·l, and also the upper dot of the colon. I
found that almost nobody uses the upper dot of the colon. One of the few I
found, namely Linotype Palatino (I cite it since I generally consider it a
nice design), does use the upper dot of the colon for ŀ. And the result is
really ugly, because the dot is way too high (about 65% of l-height), thanks
to the modern habit of higher x-heights...
Antoine