From: Philippe Verdy (firstname.lastname@example.org)
Date: Sun Jul 13 2003 - 04:51:48 EDT
On Sunday, July 13, 2003 7:21 AM, John Cowan <email@example.com> wrote:
> > Check common fonts like Trebuchet MS, Berkeley Book, Goudy Sans,
> > Korinna and Univers for recognizable _Et_ ampersands.
> I hand-write & by making a tall lower-case epsilon glyph and then
> drawing a solidus over it.
All this discussion shows that there is an extremely large number of
glyph variations for the ampersand, which is (at the abstract level)
both a symbol character and a ligature of two lowercase abstract
characters. But ligatures for the uppercase "ET" and titlecase "Et"
exist as well. In Unicode, only the abstract symbol is encoded,
not the ligatures, even though they share a common set of glyphs.
There are many other ligatures in the Latin script, extending up to
handwriting, which is mostly made of ligatures spanning complete
words. Since Gutenberg, printing and publishing have reduced
their number as a way to simplify the reproduction of
handwriting, using the old classical forms from Roman Latin
and Classical Greek, where characters were rarely ligated.
Among today's scripts, Arabic and the Indic scripts have kept most of their
ligatures in printed form, where it was probably considered important to
preserve not only the semantics of the published text, but also its
artistic graphic form.
I would therefore not promote the encoding of ligatures for "et", "ET", "Et",
as this would be endless. We already have the necessary tools
in Unicode to represent, in the abstract text, the
differentiation between the symbol use and the ligature use where we need it:
ZWJ (if really needed), or font renderer features by which
these ligatures are automatically selected from available fonts.
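
To make the distinction concrete, here is a minimal Python sketch of the two encodings: the abstract symbol U+0026 versus the letters "e" and "t" joined by ZWJ (U+200D). The helper names `describe` and `strip_joiners` are made up for this illustration, and whether a renderer actually honours the ZWJ as a ligature request depends on the font and the rendering engine.

```python
import unicodedata

ZWJ = "\u200d"  # ZERO WIDTH JOINER

symbol = "&"                     # the abstract ampersand symbol, U+0026
ligature_hint = "e" + ZWJ + "t"  # the letters "e" and "t" with a ligature request


def describe(text):
    """Return the Unicode character name of each code point in text."""
    return [unicodedata.name(ch) for ch in text]


def strip_joiners(text):
    """Drop ZWJ to recover the plain letters (e.g. before searching)."""
    return text.replace(ZWJ, "")


print(describe(symbol))
print(describe(ligature_hint))
print(strip_joiners(ligature_hint))
```

The letters stay letters in the second form, so the text still spells "et" once the joiner is ignored, whereas the symbol form carries no letters at all.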
However the choice of the ligatures to use for the letters "et" is
quite large, and I wonder how a font renderer would be able to
choose between them, without encoding somewhere in the
abstract text some styling information:
Could variation selectors be used? I see that Unicode
does not allow free use of variation selectors, which are defined
only for cases where it is important to preserve the
precise semantics of the encoded text, but not as a way to
preserve the glyphic information (so character variants are
I don't see a solution for this "problem" within Unicode itself
(and neither in ISO/IEC 10646), unless a separate standard
is started to encode glyphs mapped to characters
(in the UCS-4 space, outside its first 17 planes?). For now the
safest way is to use specific fonts encoding these glyphs
in PUA positions, and bind these fonts to the abstract text
using stylesheets, meta information, or markup languages.
But with such a technique, the abstract text would be modified.
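
A small Python sketch may illustrate the point. The code point U+E000 is an arbitrary assumption (a real font could place the glyph anywhere in the PUA), and the function names are hypothetical.

```python
import re

# Hypothetical PUA assignment, for illustration only: the actual code point
# would depend on the font that carries the "et" ligature glyph.
ET_LIGATURE_PUA = "\ue000"


def to_pua(text):
    """Replace the standalone word "et" with the font's PUA ligature glyph."""
    return re.sub(r"\bet\b", ET_LIGATURE_PUA, text)


def from_pua(text):
    """Undo the substitution, which requires knowing the font's PUA mapping."""
    return text.replace(ET_LIGATURE_PUA, "et")


original = "Dupont et Durand"
encoded = to_pua(original)

print("et" in original)              # the letters are present in the source
print("et" in encoded)               # ... but no longer in the encoded text
print(from_pua(encoded) == original)
```

Once the letters have been replaced, a plain search for "et" no longer matches, and the text is only recoverable by a reader who knows which font-specific mapping was applied.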
A way to avoid it is to surround the text with markup that
specifies an explicit substitution, like this in XML:
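
As a rough sketch of this idea, the following Python builds such markup with the standard `xml.etree.ElementTree` module. The element name `lig` and its `glyph` attribute are hypothetical and not taken from any standard schema; the point is only that the letters remain in the text content while the substitution request lives in the markup.

```python
import xml.etree.ElementTree as ET


def mark_ligature(before, letters, after, glyph_name):
    """Wrap `letters` in a hypothetical <lig> element naming the glyph to use."""
    para = ET.Element("p")
    para.text = before
    lig = ET.SubElement(para, "lig", {"glyph": glyph_name})
    lig.text = letters   # the abstract text is kept unchanged
    lig.tail = after
    return para


para = mark_ligature("Dupont ", "et", " Durand", "et-ligature")

xml_text = ET.tostring(para, encoding="unicode")
plain_text = "".join(para.itertext())  # what remains once markup is stripped

print(xml_text)
print(plain_text)
```

Stripping the markup gives back the unmodified abstract text, which is what the PUA approach cannot offer.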
One can use a better technique with stylesheets, if multiple
fonts are used and the glyph substitutions must be made
under the control of a stylesheet, like this in XHTML:
or by using separately-defined SGML named entities:
(this way, we don't need to specify the PUA codepoint
value which may vary across available fonts)
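
The indirection through named entities could be sketched as follows in Python. The entity name `etlig`, the font names and the PUA code points are all assumptions made up for this illustration; each font would ship its own table.

```python
import re

# Hypothetical per-font entity tables: the same entity name maps to a
# different PUA code point in each font.
FONT_ENTITY_TABLES = {
    "FontA": {"etlig": "\ue000"},
    "FontB": {"etlig": "\uf123"},
}

ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def resolve_entities(text, font):
    """Replace known named entities with the code points of the chosen font."""
    table = FONT_ENTITY_TABLES[font]

    def substitute(match):
        # Unknown entity names are left untouched.
        return table.get(match.group(1), match.group(0))

    return ENTITY.sub(substitute, text)


source = "Dupont &etlig; Durand"

print(resolve_entities(source, "FontA") == "Dupont \ue000 Durand")
print(resolve_entities(source, "FontB") == "Dupont \uf123 Durand")
```

The source text names the glyph only once, symbolically, and the font-specific code point is chosen at the time the entity is resolved.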
-- Philippe. Spam is not tolerated: any unsolicited message will be reported to your Internet service providers.