Re: U+0140 Catalan middle-dot

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Thu Apr 15 2004 - 21:16:23 EDT

    From: "Patrick Andries" <Patrick.Andries@xcential.com>
    > Philippe Verdy wrote:
    > >From: "Patrick Andries" <Patrick.Andries@xcential.com>
    > >>Peter Kirk wrote:
    > >>>What is U+2027 intended for? The name suggests that it might be what
    > >>>is needed for Catalan.
    > >>>[PA] Isn't this the one that should be used in dictionaries ?
    > >>>
    > >>See http://www.unicode.org/unicode/standard/reports/tr14/tr14-6.html
    > >>2027
    > >>HYPHENATION POINT
    > >>Hyphenation point is primarily used to visibly indicate syllabification
    > >>of words. Syllable breaks are potential line breaking opportunities in
    > >>the middle of words. It is mainly used in
    > >>dictionaries and similar works. When an actual line break falls inside a
    > >>word containing hyphenation point characters, the hyphenation point is
    > >>rendered as a regular hyphen at the end of the line.
    > >
    > >This last sentence is wrong, at least in my Larousse dictionaries:
    > >
    > I believe it simply describes certain practices (Anglo-Saxon, American?);
    > maybe this should be made clearer.

    This just demonstrates that the "only one dot character fits all" strategy is
    too simplistic. There are actual usages, in such serious publications as very
    common dictionaries, of multiple dots which have their own semantics and
    rendering particularities.

    The Catalan middle-dot is a plain orthographic letter and should be treated as
    such, and not by borrowing a punctuation sign or symbol which may have other
    conflicting uses. What I suggested is that the general category, despite its
    weak definition, is still a good indicator of which character to use.
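
    To illustrate the general-category point, here is a minimal sketch using
    Python's standard unicodedata module (my choice for illustration; the names
    CANDIDATES and general_categories are hypothetical). It only reports what the
    Unicode Character Database records for the code points discussed in this
    thread: the two dot characters are punctuation, while the precomposed
    L-with-middle-dot characters are letters.

```python
import unicodedata

# Code points discussed in this thread (labels are for display only).
CANDIDATES = {
    "U+00B7": "\u00b7",  # MIDDLE DOT (the ISO-8859-1 middle dot)
    "U+2027": "\u2027",  # HYPHENATION POINT
    "U+013F": "\u013f",  # LATIN CAPITAL LETTER L WITH MIDDLE DOT
    "U+0140": "\u0140",  # LATIN SMALL LETTER L WITH MIDDLE DOT
}


def general_categories(chars=CANDIDATES):
    """Map each label to (Unicode name, general category)."""
    return {
        label: (unicodedata.name(ch), unicodedata.category(ch))
        for label, ch in chars.items()
    }


for label, (name, category) in general_categories().items():
    # Categories: Po = other punctuation, Lu/Ll = upper/lowercase letter.
    print(label, category, name)
```

    Neither of the two bare dots carries a letter category, which is the
    mismatch I am pointing at.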

    So U+2027 (like the U+00B7 middle dot found in ISO-8859-1/15) is not the
    exact character to represent this middle dot in all usages, even if there is
    an important legacy of using the ISO-8859-1 middle dot in Catalan, or of
    using the L-with-middle-dot of ISO 6937. The latter was defined just for
    convenience with older technologies that could not acceptably display the
    sequence <L, middle-dot, L> in Catalan because of the excessive space, so a
    ligature was probably preferable in the Videotex context. My opinion is that
    U+0140 already meant two abstract characters in Teletext or Videotex, even
    for Catalan readers (and this can explain why it has a compatibility
    decomposition, as a legacy fallback that is acceptable but poor).
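
    As a rough sketch of what that compatibility decomposition does in practice,
    assuming Python's standard unicodedata module (the helper name
    compat_fallback and the sample word are mine, for illustration only):
    compatibility normalization replaces the precomposed U+0140 with the
    two-character sequence <l, U+00B7>, whereas U+2027 has no decomposition at
    all.

```python
import unicodedata


def compat_fallback(text):
    """Return the NFKC form, which applies compatibility decompositions."""
    return unicodedata.normalize("NFKC", text)


# "col·lecció" spelled with the precomposed U+0140 (hypothetical sample word).
word = "co\u0140lecci\u00f3"
fallback = compat_fallback(word)

# The decomposition field recorded for U+0140 in the character database.
print(unicodedata.decomposition("\u0140"))

# Code points before and after normalization, for comparison.
print([f"U+{ord(c):04X}" for c in word])
print([f"U+{ord(c):04X}" for c in fallback])
```

    The normalized string is one code point longer: the single precomposed
    character has become the letter l followed by the punctuation middle dot.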

    The other reason is that the middle dot, being punctuation, would likely
    have extra spacing on both sides, which would make it inappropriate for
    rendering Catalan words. Such punctuation would also probably forbid kerning
    the middle dot into the open area of an uppercase L, something which would be
    acceptable for reading Catalan (as it was acceptable with U+013F in
    Teletext/Videotex).

    I looked for handwritten forms of two lowercase l's with an intermediate
    middle dot, and they clearly show that Catalans write them without extra
    spacing: the dot fits well within the open area between the connecting
    baseline and the two ascending loops (and sometimes it appears as a
    horizontal or slanted medial stroke that connects the two loops, or as a
    ligature of the two lowercase l letters, or the dot is put within the
    ascending loop of the first l). I don't know which form Catalan children
    learn at school to write the three letters correctly, or whether they are
    taught that this dot is a diacritic or a special hyphen...
    My readings only show that there is no such L-with-middle-dot in the Catalan
    alphabet, and that it is most often not considered a letter even though it
    represents a distinctive sound.

    An interesting article about Catalan typesetting with TeX is at:
    http://www.tug.org/TUGboat/Articles/tb16-3/tb48vali.pdf

    * It notes that the usual middle dot (which normally sits halfway between the
    baseline and the x-height) is not exactly what is needed for Catalan, where
    it should be placed halfway between the height of the current middle dot and
    the ascender height.
    * Another feature is that the dot should be at equal distance from the two
    vertical stems of the lowercase or uppercase L's, which keep the normal
    distance they would have in the absence of this dot.
    * So the dot is naturally kerned into the first uppercase L, but usually not
    between lowercase letters, where it takes its space within the inter-letter
    spacing.
    * It also discusses the allowed hyphenations and their correct rendering...
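
    The placement rules above can be worked through numerically. This is only
    a sketch of my reading of them, not the article's own TeX macros; the
    function names and the font metrics (x-height 500, ascender 750, stems at
    100 and 380, in units of 1/1000 em) are hypothetical.

```python
def usual_dot_height(x_height):
    """Usual middle dot: halfway between the baseline (0) and the x-height."""
    return x_height / 2


def catalan_dot_height(x_height, ascender):
    """Catalan dot: halfway between the usual dot height and the ascender."""
    return (usual_dot_height(x_height) + ascender) / 2


def dot_x(left_stem_x, right_stem_x):
    """Horizontal position: equidistant from the two vertical stems."""
    return (left_stem_x + right_stem_x) / 2


# Hypothetical metrics, in units of 1/1000 em.
X_HEIGHT, ASCENDER = 500, 750
print(usual_dot_height(X_HEIGHT))              # 250.0
print(catalan_dot_height(X_HEIGHT, ASCENDER))  # 500.0
print(dot_x(100, 380))                         # 240.0
```

    With these sample metrics the Catalan dot ends up at the x-height, twice as
    high as the usual middle dot, while the stems themselves do not move.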


