Re: U+0140

From: Antoine Leca (
Date: Fri Apr 16 2004 - 04:35:35 EDT

    On Thursday, April 15, 2004 8:16 PM, Philippe Verdy va escriure:
    > I thought it was already answered in this list by a Catalan speaking
    > contributor: the sequence L+middle-dot in Catalan is NOT a combining
    > sequence.

    No? Then what is it? It looks very much like one to me.

    > The middle dot in Catalan plays a role similar to a hyphen
    > between syllables, to mark a distinction with words where, for
    > example a double-L would create an alternate reading.

    Yes (although I am not sure we can write "similar to hyphens", since I do
    not know the history of the hyphen).

    > The dot indicates that each L must be read distinctly (or read
    > with a long or emphatic L).

    Ought to. That is, it would be a precious pronunciation, at least in the
    Barcelona way of speaking. In other places the prolonged pronunciation
    may also be the default for literate speech (this is the case here in
    Valencia). Colloquial speech definitely makes no difference between l·l
    and l.

    The very reason for the dot is to disambiguate between two identical
    orthographies inherited from the past, without actually changing the
    orthographies (i.e., dropping one l, or adopting the standard but bulky
    "tl"). So "ll" now unambiguously designates the palatal l (whose IPA
    symbol, a turned y, I am presently unable to find in Unicode), coming
    from colloquial words, while "l·l" unambiguously designates the
    possibly prolonged [l] coming directly from Latin. Before the reform
    (~100 years ago), both were written identically, which led to problems.
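
    A minimal Python sketch of the distinction above. The function name
    l_reading, its return labels and the sample word "castell" are my own
    illustrations, not anything from the orthographic rules themselves. The
    turned y is looked up by what I take to be its Unicode character name.

```python
import unicodedata

MIDDLE_DOT = "\u00b7"  # the · typed with shift-3 on a Spanish keyboard

# Look the turned y up by its Unicode character name instead of
# hard-coding a code point.
turned_y = unicodedata.lookup("LATIN SMALL LETTER TURNED Y")

def l_reading(word):
    """Classify how a double l in `word` is meant to be read (hypothetical helper)."""
    w = word.lower()
    if "l" + MIDDLE_DOT + "l" in w:
        return "geminate"   # l·l: [l], possibly prolonged, from Latin
    if "ll" in w:
        return "palatal"    # ll: the palatal l (turned y in IPA)
    return "plain"          # no double l at all

print(f"U+{ord(turned_y):04X}")
for w in ("col" + MIDDLE_DOT + "legi", "castell", "mai"):
    print(w, l_reading(w))
```

    The l·l test comes first only for clarity; "l·l" never contains "ll",
    so the order of the two checks does not change the result.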

    > In French for example we have words like "maille" to be read as
    > /maj/, and the same "-ill-" written diphthongs after another vowel
    > occur in Catalan.

    It is written -i- (not ï nor í), occurring after some vowel. Like "mai"
    (never), which is pronounced the same as "maille" in Parisian French.

    > But French will not write "-ill-" if it occurs
    > between two vowels where the two L must have the sound L (if this
    > occurs in French, only 1 L is written, and the emphatic/long sound is
    > not marked).

    Of course not "-ill-" (why on earth would anyone introduce an -i- where
    there is no reason for it?), but rather "-ll-", as in "collège" or
    "parallèle". TWO L's ;-). I chose these after the two most used words in
    Catalan that have the ·, namely "col·legi" and "paral·lel".

    And yes, as in Catalan, the emphatic/prolonged l sound is not usually
    marked.

    > Catalan has this orthography, and writes the
    > emphatic/long L distinctly. So it needs a symbol for that. The
    > middle-dot is then considered in Catalan as a letter,

    This is not a letter, any more than the apostrophe, which hardly anyone
    would consider a letter in Romance languages (or in English, for that
    matter). Note that I am _not_ saying · is like an apostrophe in Catalan
    (the latter is a punctuation symbol, which separates words). But it is
    not a letter. Neither are ´ or ¸.

    > that will occur in the middle of words.

    Specifically, between two L's (both lower-case or both upper-case, but not
    a mixture). There are other rules too, such as, IIRC, that the letters
    surrounding the l's should be vowels (not 100% sure here, and I did not
    care to check).
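
    The same-case rule can be sketched in Python as below. The function name
    dots_well_placed is hypothetical, and the sketch deliberately leaves out
    the surrounding-vowels rule, since that one is unverified.

```python
import re

MIDDLE_DOT = "\u00b7"

# A middle dot is accepted only between two l's of the same case.
_SAME_CASE_PAIR = re.compile(f"l{MIDDLE_DOT}l|L{MIDDLE_DOT}L")

def dots_well_placed(word):
    """True if every middle dot in `word` sits between two same-case l's."""
    dots = word.count(MIDDLE_DOT)
    pairs = len(_SAME_CASE_PAIR.findall(word))
    return dots == pairs

for w in ("col\u00b7legi", "COL\u00b7LEGI", "coL\u00b7legi", "a\u00b7b"):
    print(w, dots_well_placed(w))
```

    A word with no middle dot at all passes trivially, which seems the
    sensible default for a check that only polices where the dot may go.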

    > I don't know if the middle-dot can be used in Catalan as a candidate
    > position for a line break with hyphenation:

    It is.

    > if yes, is it kept before
    > the hyphen, or is the middle-dot used alone, or is the middle-dot
    > replaced by a regular hyphen?

    The latter.

    > I don't know. But if the middle-dot
    > must be replaced by a hyphen, then it is a punctuation (similar to
    > hyphens used in compound-words).

    What is the first k in a hyphenated "dicke" in German? (It becomes
    "dik-ke".) At any rate, I would not tag it as "punctuation"!
    Here we have a similar case: when l·l is hyphenated, the former "diglyph",
    i.e. "l·", is transformed to "l". The obvious reason is that there is no
    longer any need to disambiguate, since a palatalized "ll" will never be
    hyphenated in Catalan (nor in Castilian, nor will "lh" in Portuguese or
    Occitan, nor will "gli" in Italian).
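
    A minimal sketch of that line-break rule in Python. The helper name
    break_at_dot is made up for illustration, and it only handles the l·l
    break point, not Catalan hyphenation in general.

```python
MIDDLE_DOT = "\u00b7"

def break_at_dot(word):
    """Split `word` at an l·l, returning (line_end, next_line_start) or None.

    The middle dot is replaced by the hyphen: "l·" becomes "l-".
    """
    for pair in ("l" + MIDDLE_DOT + "l", "L" + MIDDLE_DOT + "L"):
        i = word.find(pair)
        if i != -1:
            # Keep the first l, drop the dot, append the hyphen;
            # the second l starts the next line.
            return word[: i + 1] + "-", word[i + 2 :]
    return None  # no l·l, so no break point of this kind

print(break_at_dot("col\u00b7legi"))
print(break_at_dot("paral\u00b7lel"))
print(break_at_dot("castell"))
```

    Returning None for a plain "ll" reflects the point above: a palatal
    "ll" is never split, so no ambiguity can arise at the line end.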

    > But in Catalan, the middle dot should not be kerned into the
    > preceding uppercase L, like it would appear if it was considered
    > equivalent to <L-middle-dot>.

    Sorry, but who are you to dictate laws about kerning in Catalan?
    Kerning is essentially an optional feature related to fonts, and I do not
    see any reason to avoid "kerning" an L and a · (which would be in a title,
    moreover) if the result would otherwise be aesthetically unpleasant,
    perhaps because the font designer did not consider the case.

    > If there's something really missing for Catalan, it's a middle-dot
    > letter with general category "Lo", and combining class 0 (i.e. NOT
    > combining). It's unfortunate that almost all legacy Catalan text
    > transcoded to Unicode are based on the middle-dot symbol (the one
    > mapped in ISO-8859-1 and ISO-8859-15) which is not seen by Unicode as
    > a letter (Lo) but as a symbol only.

    Considering that the · is present on any Spanish keyboard these days
    (shift 3), and that on the other hand almost no keyboard, except on ancient
    typewriters, has the [L·] key, it is very fortunate, in fact. Because this
    way we have some unity in the electronic corpus. Particularly since people
    are increasingly using it.
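
    For text that does contain the single-character forms (U+013F, U+0140),
    a sketch of folding them into L/l followed by U+00B7 might look as
    follows in Python. The function name to_plain_middle_dot is hypothetical,
    and the wish to fold at all is my assumption. It relies on the
    compatibility decompositions that the Unicode data gives these two
    characters.

```python
import unicodedata

LEGACY_L_DOT = "\u013f\u0140"  # Ŀ and ŀ, the single-character forms

def to_plain_middle_dot(text):
    """Replace Ŀ/ŀ by L/l followed by U+00B7, leaving all else untouched."""
    # Build the table from the characters' own compatibility decompositions
    # (NFKD) rather than typing the replacements by hand.
    table = {ord(c): unicodedata.normalize("NFKD", c) for c in LEGACY_L_DOT}
    return text.translate(table)

print(to_plain_middle_dot("co\u0140legi"))
print(to_plain_middle_dot("PARA\u013fLEL"))
```

    Only the two characters are touched, rather than running NFKD or NFKC
    over the whole text, because full compatibility normalization would
    also rewrite many unrelated characters.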

    The fact that U+00B7 is not considered as Lo or something similar, and hence
    is "vetoed" from identifiers and similar cases in un-Catalanized software,
    is well known to the UTC, and to Dr. Whistler in particular ;-). And it has
    been this way for many, many years now. All of us Catalan sympathisers can
    see this state of affairs as unfortunate, but since mathematicians seem
    to outweigh Catalans as users of Unicode, and since · is widely seen as an
    operator by that user base, there is no chance of modifying it.
    Nor is there any chance of changing the use of U+00B7 for · in Catalan
    (since the only practical substitutes would be . and -, which are clearly
    worse; any other "new idea" would not be used, so is stillborn).
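
    The properties in question can be inspected with Python's unicodedata
    module, as in the sketch below. The helper is_letter is my own shorthand
    for "general category starts with L", and the values printed depend on
    the Unicode version bundled with the interpreter.

```python
import unicodedata

def is_letter(ch):
    """True if the general category of `ch` is one of the letter classes (L*)."""
    return unicodedata.category(ch).startswith("L")

for cp in (0x00B7, 0x013F, 0x0140):
    ch = chr(cp)
    print(
        f"U+{cp:04X}",
        unicodedata.name(ch),
        unicodedata.category(ch),   # general category
        unicodedata.combining(ch),  # canonical combining class
        is_letter(ch),
    )
```

    Software that admits only letters into identifiers or "words" would
    therefore split col·legi at the dot unless it special-cases U+00B7.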

    Hope it helps,

