From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Thu Apr 15 2004 - 21:16:23 EDT
From: "Patrick Andries" <Patrick.Andries@xcential.com>
> Philippe Verdy wrote:
> >From: "Patrick Andries" <Patrick.Andries@xcential.com>
> >>Peter Kirk wrote:
> >>>What is U+2027 intended for? The name suggests that it might be what
> >>>is needed for Catalan.
> >>>[PA] Isn't this the one that should be used in dictionaries ?
> >>>
> >>See http://www.unicode.org/unicode/standard/reports/tr14/tr14-6.html
> >>2027
> >>HYPHENATION POINT
> >>Hyphenation point is primarily used to visibly indicate syllabification
> >>of words. Syllable breaks are potential line breaking opportunities in
> >>the middle of words. It is mainly used in
> >>dictionaries and similar works. When an actual line break falls inside a
> >>word containing hyphenation point characters, the hyphenation point is
> >>rendered as a regular hyphen at the end of the line.
> >
> >This last sentence is wrong, at least in my Larousse dictionaries:
> >
> I believe it simply describes certain practices (Anglo-Saxon, American
> ?), maybe this should be clearer.
This just demonstrates that the "one dot character fits all" strategy is too
simplistic. Serious publications, including very common dictionaries, actually
use multiple dots, each with its own semantics and rendering
particularities.
The Catalan middle-dot is a plain orthographic letter and should be treated as
such, and not by borrowing a punctuation sign or symbol which may have other
conflicting uses. What I suggested is that the general category, despite its
weak definition, is still a good indicator of which character to use.
So U+2027 (like the U+00B7 middle dot found in ISO-8859-1/15) is not the
exact character to represent this middle dot in all usages, even if there is an
important legacy of using the ISO-8859-1 middle dot in Catalan, or a
legacy use of the L-middle-dot in ISO 6937, which was defined just for convenience
with older technologies that could not acceptably display the sequence <L,
middle-dot, L> in Catalan because of the excessive space (so a ligature was
probably preferable in the Videotex context). My opinion is that U+013F already
meant two abstract characters in Teletext or Videotex, even for Catalan readers
(and this can explain why it has a compatibility decomposition, as an acceptable
but poor legacy fallback).
The other reason is that the middle dot, being punctuation, would be likely to
have extra spacing on both sides, which would make it inappropriate for
rendering Catalan words. Such punctuation would also probably forbid kerning
the middle dot into the open area of an uppercase L, something which would be
acceptable for reading Catalan (as it was acceptable with U+013F in
Teletext/Videotex).
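The properties mentioned above (general category, compatibility decomposition)
can be checked directly. Here is a minimal sketch using Python's standard
unicodedata module; the helper name describe() and the choice of code points
are mine, for illustration only, and the values printed depend on the Unicode
version shipped with the interpreter.

```python
import unicodedata

# Code points mentioned in this thread: the ISO-8859-1 middle dot, the
# hyphenation point, and the two legacy L-with-middle-dot letters.
CANDIDATES = [0x00B7, 0x2027, 0x013F, 0x0140]


def describe(cp):
    """Return (name, general category, decomposition mapping) for a code point."""
    ch = chr(cp)
    return (
        unicodedata.name(ch),
        unicodedata.category(ch),
        unicodedata.decomposition(ch),  # empty string when there is none
    )


for cp in CANDIDATES:
    name, category, decomposition = describe(cp)
    print(f"U+{cp:04X} {category} {name} [{decomposition or 'no decomposition'}]")

# Compatibility normalization splits the legacy ligature into two characters.
assert unicodedata.normalize("NFKD", "\u013F") == "L\u00B7"
```

The two dots are reported as punctuation (Po), while the L-with-middle-dot
characters are letters that carry a <compat> decomposition to L followed by
U+00B7.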
I looked for handwritten forms of two lowercase l with an intermediate middle
dot, and they clearly show that Catalans write them without extra spacing: the dot
fits well within the open area between the connecting baseline and the two
ascending loops (and sometimes it appears as a horizontal or slanted medial
stroke that connects the two loops, or as a ligature of the two lowercase l
letters, or the dot is put within the ascending loop of the first l). I don't
know which form Catalan children learn at school to write the three
letters correctly, or whether they are taught that this dot is a diacritic or a
special hyphen...
My readings only show that there's no such L-with-middle-dot in the Catalan
alphabet, and that it is most often not considered a letter, even though it
represents a distinctive sound.
An interesting article about Catalan typesetting with TeX is on:
http://www.tug.org/TUGboat/Articles/tb16-3/tb48vali.pdf
* It is noted that the usual middle dot (which normally appears halfway between
the baseline and the x-height) is not exactly what is needed for Catalan (where it
should be placed halfway between the height of the current middle dot and the
ascender height).
Another feature is that the dot should be at equal distance from the two vertical
stems of the lowercase or uppercase L, which keep the normal distance that would
be used in the absence of this dot...
* So the dot is naturally kerned into the first uppercase L, but usually not
between lowercase letters where it takes its space within the inter-letter
spacing.
* It also discusses the allowed hyphenations and their correct rendering...
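To make the placement rule above concrete, here is a small Python sketch of the
arithmetic as I read it. The function names and the font metrics (an x-height
of 500 and an ascender height of 700 units) are hypothetical values chosen for
illustration; they do not come from the TUGboat article.

```python
def usual_dot_height(x_height, baseline=0.0):
    """Usual middle dot: halfway between the baseline and the x-height."""
    return (baseline + x_height) / 2


def catalan_dot_height(x_height, ascender, baseline=0.0):
    """Catalan dot: halfway between the usual dot height and the ascender height."""
    return (usual_dot_height(x_height, baseline) + ascender) / 2


def dot_center_x(left_stem_x, right_stem_x):
    """Horizontal position: at equal distance from the two vertical stems."""
    return (left_stem_x + right_stem_x) / 2


# Hypothetical metrics in font units (assumed, not measured from a real font).
X_HEIGHT, ASCENDER = 500, 700
print(usual_dot_height(X_HEIGHT))              # 250.0
print(catalan_dot_height(X_HEIGHT, ASCENDER))  # 475.0
print(dot_center_x(100, 380))                  # 240.0
```

With these assumed metrics the Catalan dot sits well above the usual middle
dot, close to the x-height.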