From: Doug Ewell (dewell@adelphia.net)
Date: Sun Jul 13 2003 - 19:31:22 EDT
Philippe Verdy <verdy_p at wanadoo dot fr> wrote:
> All this discussion shows that there is an extremely large number of
> glyph variants for the ampersand, which is, at the abstract level,
> both a symbol character and a ligature of two lowercase abstract
> characters. But ligatures for the uppercase "ET" and titlecase "Et"
> do exist as well. Unicode encodes only the abstract symbol, not the
> ligatures, even though they share a common set of glyphs.
That is one of the essential features of Unicode. Abstract characters
are encoded; glyph variants (in general) are not.
> Could variation selectors be used? I see that Unicode does not
> allow free use of variation selectors; they are defined only for
> cases where it is important to preserve the precise semantics of
> the encoded text, not as a way to preserve glyph information (so
> character variants are strictly limited).
That's correct. The difference between the Arial-style glyph that looks
a bit like a tilted treble clef (U+1D11E) and John's
epsilon-with-solidus and Philippe's e-with-small-attached-t is one of
style only. The distinction does not need to be encoded in plain text,
any more than the distinction between a lowercase g with one bowl and one
with two.
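To make the point concrete, here is a small Python sketch (the helper name `describe` is only for illustration): whatever glyph style a font uses, the ampersand in plain text is the single code point U+0026, and the treble clef it can resemble is an unrelated character.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return 'U+XXXX NAME' for a single code point."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


# Every glyph style of the ampersand (Arial-style, epsilon-with-solidus,
# e-with-attached-t) is this one abstract character in plain text.
AMPERSAND = "&"
print(describe(AMPERSAND))     # U+0026 AMPERSAND

# The treble clef is a separate character; looking similar does not make
# it a glyph variant of the ampersand.
print(describe("\U0001D11E"))  # U+1D11E MUSICAL SYMBOL G CLEF
```

Choosing a glyph for U+0026 is left entirely to the font.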
Apparently the math experts really, really needed to make a distinction
in plain text between (e.g.) a less-than-or-equal sign with a horizontal
bottom stroke and one with a slanted bottom stroke. We can take it on
faith that that distinction is important in plain text, but we don't
need to add more distinctions that probably aren't.
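For comparison, a minimal Python sketch of how such a math distinction shows up in plain text, assuming the slanted form meant here is U+2A7D LESS-THAN OR SLANTED EQUAL TO: the two forms are separate code points, so no normalization merges them. A variation sequence, by contrast, is a base character followed by a selector. The sketch pairs U+2264 with VARIATION SELECTOR-1 only to show the mechanics; it does not check whether that pair is on the standard's list of standardized variants.

```python
import unicodedata

HORIZONTAL = "\u2264"  # LESS-THAN OR EQUAL TO
SLANTED = "\u2A7D"     # LESS-THAN OR SLANTED EQUAL TO

for ch in (HORIZONTAL, SLANTED):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# Separately encoded characters stay distinct, even under NFKC.
print(HORIZONTAL == SLANTED)                                  # False
print(unicodedata.normalize("NFKC", SLANTED) == HORIZONTAL)   # False

# A variation sequence is two code points: base + selector. Normalization
# leaves the selector in place; only the rendering may differ.
seq = HORIZONTAL + "\uFE00"  # U+FE00 VARIATION SELECTOR-1
print(len(seq), unicodedata.normalize("NFC", seq) == seq)     # 2 True
```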
> I don't see a solution for this "problem" within Unicode itself
> (nor in ISO/IEC 10646), unless a separate standard is started to
> encode glyphs mapped to characters (in the UCS-4 space, outside its
> first 17 planes?). For now the safest way is to use specific fonts
> that encode these glyphs at PUA positions, and to bind these fonts
> to the abstract text using stylesheets, meta information, or markup
> languages. But with such a technique, the abstract text itself
> would be modified.
>
> A way to avoid this is to surround the text with markup that
> specifies an explicit substitution, like this in XML:
>
> <typo as="">et</typo>,
You probably don't want to start down the slippery slope of encoding
Latin glyph variants as PUA characters. Check the archives of this
mailing list; you will find that proposals to use the PUA to turn
Unicode into a glyph registry are generally not well received.
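A short Python sketch of why PUA glyph encodings do not travel well: a private-use code point has general category Co and no standard name, so software outside the private agreement cannot know that it stands for "et", and the underlying text really does change. The mapping of an "et" ligature to U+E000 and the names `ET_LIGATURE_PUA` and `is_private_use` are hypothetical.

```python
import unicodedata

# Private-use code points: the BMP PUA plus planes 15 and 16.
PUA_RANGES = [(0xE000, 0xF8FF), (0xF0000, 0xFFFFD), (0x100000, 0x10FFFD)]


def is_private_use(ch: str) -> bool:
    """True if the single character ch is a private-use code point."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in PUA_RANGES)


# Hypothetical: a custom font draws an "et" ligature at U+E000.
ET_LIGATURE_PUA = "\uE000"

plain = "Dupont et Durand"
private = "Dupont " + ET_LIGATURE_PUA + " Durand"

print(is_private_use(ET_LIGATURE_PUA))                        # True
print(unicodedata.category(ET_LIGATURE_PUA))                  # Co
print(unicodedata.name(ET_LIGATURE_PUA, "<no standard name>"))
# The abstract text has changed: a plain search for "et" now fails.
print("et" in plain, "et" in private)                         # True False
```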
-Doug Ewell
Fullerton, California
http://users.adelphia.net/~dewell/