Re: U+0140 Catalan middle-dot

From: Philippe Verdy
Date: Fri Apr 16 2004 - 06:11:00 EDT

    From: "Peter Kirk" <>
    > On 15/04/2004 18:16, Philippe Verdy wrote:
    > >So U+2027 (as well as the U+00B7 middle-dot found in ISO-8859-1/15) is not
    > >the exact character to represent this middle dot in all usages, ...
    > Philippe, before jumping to this conclusion, please can you describe to
    > me EXACTLY how the shape and behaviour of the Catalan middle dot differs
    > from the behaviour of U+2027 defined in Unicode Standard Annex #14,
    > > 2027
    > > A hyphenation point is a raised dot, which is used primarily to
    > > visibly indicate syllabification of words. Syllable breaks are
    > > potential line break opportunities in the middle of words. It is
    > > mainly used in dictionaries and similar works. When an actual line
    > > break falls inside a word containing hyphenation point characters, the
    > > hyphenation point is rendered as a regular hyphen at the end of the line.
    > >
    > From the descriptions which you and Anto'nio have provided and from
    >, it seems to me
    > that the Catalan behaviour is exactly as described for U+2027 in UAX
    > #14, perhaps because the Catalan usage has been borrowed from dictionary
    > usage or vice versa. This strongly suggests that U+2027 is the
    > appropriate character for Catalan.

    Did you read this PDF carefully? It actually discusses a hack needed to
    reposition the middle dot correctly, so that the Catalan dot will:
    - not alter the interletter space;
    - be drawn at a higher position (approximately at the x-height) than the
    ordinary middle dot (which sits midway between the baseline and the
    x-height), at a horizontal position that centers it between the vertical
    stems of the two surrounding l or L (this makes a difference for the
    uppercase letter).
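
    The two placement rules above can be sketched as a small calculation. This
    is only an illustration: the Placement type, both function names, and the
    font-unit numbers in the usage note are hypothetical and are not taken from
    the PDF under discussion.

```python
from dataclasses import dataclass


@dataclass
class Placement:
    x: float        # horizontal centre of the dot, in font units
    y: float        # vertical centre of the dot, in font units
    advance: float  # extra advance width contributed by the dot


def catalan_dot(left_stem_x, right_stem_x, x_height):
    """Centre the dot between the two stems, at about the x-height.

    The dot adds no advance width, so the interletter space is unchanged.
    """
    return Placement(x=(left_stem_x + right_stem_x) / 2, y=x_height, advance=0.0)


def spacing_middle_dot(pen_x, dot_advance, x_height):
    """A plain spacing middle dot, for comparison.

    It is centred in its own advance width, midway between the baseline
    and the x-height, and it pushes the following letter to the right.
    """
    return Placement(x=pen_x + dot_advance / 2, y=x_height / 2, advance=dot_advance)
```

    With assumed stems at 100 and 400 units and an x-height of 500,
    catalan_dot(100, 400, 500) puts the dot at (250, 500) and adds no advance,
    whereas a spacing dot of the same font would sit at half that height and
    widen the word.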

    So the encoded l-with-middle-dot and L-with-middle-dot, if properly designed
    for Catalan using these guidelines, will render much better than 'L' or 'l'
    followed by U+00B7, and better than 'L' or 'l' followed by U+2027 as well.

    If rendering is not important to you (it matters to anyone who wants to
    write a renderer), consider collation and text analysis. My view of the
    precomposed L-with-middle-dot ligatures is that their "letter" general
    category makes things easier for writers and readers, even if both agree
    that Catalan has no dotted-L letter, only a plain L followed by an
    additional but separate phonetic mark.
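
    The general-category point can be checked with Python's standard
    unicodedata module. The sketch below is only an illustration: the helper
    names and the sample word are chosen here, and the \w tokenizer stands in
    for "text analysis" in the simplest possible way.

```python
import re
import unicodedata

# Characters discussed in this thread.
CANDIDATES = {
    "U+013F": "\u013f",  # LATIN CAPITAL LETTER L WITH MIDDLE DOT
    "U+0140": "\u0140",  # LATIN SMALL LETTER L WITH MIDDLE DOT
    "U+00B7": "\u00b7",  # MIDDLE DOT
    "U+2027": "\u2027",  # HYPHENATION POINT
}


def general_categories():
    """Map each code point label to its Unicode general category."""
    return {label: unicodedata.category(ch) for label, ch in CANDIDATES.items()}


def words(text):
    """Naive tokenizer: runs of word characters as Python's re module sees them."""
    return re.findall(r"\w+", text)


with_letter = "co\u0140legi"  # l-with-middle-dot followed by l
with_punct = "col\u00b7legi"  # l, MIDDLE DOT, l

print(general_categories())  # the two precomposed characters are letters (Lu, Ll)
print(words(with_letter))    # stays one token
print(words(with_punct))     # split in two at the punctuation dot
```

    The precomposed characters are categorized as letters, while U+00B7 and
    U+2027 are both punctuation (Po), so a tokenizer that relies only on
    general categories breaks the word at the dot.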

    Another point: the middle dot in Catalan seems to be used only between a pair
    of L letters. Typographers consider the double L with a middle dot a ligature,
    and Catalan orthography uses the dotted pair to change the pronunciation (and
    even the meaning) of a double L from the "L mouillé" (where it is pronounced
    like y between vowels) to a consonantal palatal L.

    Last note: Catalan words starting with a double L exist, but they apparently
    never take a middle dot, because that spelling always denotes a consonantal
    palatal L, sometimes pronounced with some stress, with an audible
    palato-lingual click, or with some prenasalisation; the pronunciation depends
    on which of the four local dialects is spoken.

    The phonetic distinction of the medial double L was not marked in medieval
    Catalan texts, where this dot was not written (as in French). The Catalan
    middle dot was introduced later, with the clear intent of not altering the
    number of letters or their relative positions in the typography. Most modern
    text renderers on computers display U+00B7 incorrectly for Catalan (notably in
    user interfaces and in web browsers).

    So, from a typographic point of view, the U+013F and U+0140 ligatures are much
    better than their compatibility decompositions, and I don't think they should
    be described as compatibility characters. The ISO 6937 standard for Videotex
    was right to define this ligature to respect the normal typography, whereas
    the compatibility decompositions using U+00B7 in Unicode are certainly not the
    best ones (sequences with U+00B7 are widely used today simply because the
    ligatures were missing from ISO-8859-1 and Windows 1252, and there was no
    alternative to U+00B7 for that function).
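
    For reference, the compatibility decompositions in question can be inspected
    with Python's standard unicodedata module. This is a minimal sketch; the
    helper names and the sample word are chosen here for illustration.

```python
import unicodedata


def compat_mapping(ch):
    """Return the raw decomposition field from the Unicode Character Database."""
    return unicodedata.decomposition(ch)


def fold_compat(text):
    """Apply NFKC, which replaces compatibility characters by their mappings."""
    return unicodedata.normalize("NFKC", text)


word = "co\u0140legi"  # spelled with U+0140

print(compat_mapping("\u013f"))  # compatibility mapping to L + U+00B7
print(compat_mapping("\u0140"))  # compatibility mapping to l + U+00B7
print(fold_compat(word) == "col\u00b7legi")
print(unicodedata.normalize("NFC", word) == word)
```

    NFC leaves U+0140 untouched, but NFKC (and NFKD) rewrite it to 'l' followed
    by U+00B7, so any process that applies compatibility normalization loses the
    precomposed form.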
