Re: Ligatures in Portuguese, French (was: ... Turkish and Azeri)

From: Doug Ewell (dewell@adelphia.net)
Date: Sun Jul 13 2003 - 19:31:22 EDT

Next message: Mark Davis: "Re: ISO 639 "duplicate" codes (was: Re: Ligatures in Turkish and Azeri, was: Accented ij ligatures)"

Previous message: Karljürgen Feuerherm: "Re: No UTF-8 in Eudora"
In reply to: Philippe Verdy: "Re: Ligatures in Portuguese, French (was: ... Turkish and Azeri)"
Next in thread: Michael Everson: "Re: Ligatures in Portuguese, French (was: ... Turkish and Azeri)"

Philippe Verdy <verdy_p at wanadoo dot fr> wrote:

> All this discussion shows that there is an extremely large number of
> glyph variations for the ampersand, which is both (at the abstract
> level) a symbol character and a ligature of two lowercase abstract
> characters. But ligatures for the uppercase "ET" and titlecase "Et"
> do exist as well. For Unicode, only the abstract symbol is encoded,
> but not the ligatures, even though they share a common set of glyphs.

That is one of the essential features of Unicode. Abstract characters
are encoded; glyph variants (in general) are not.
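As a rough illustration of that point, here is a small Python sketch using the standard unicodedata module. The helper name describe() is made up for this example. It shows that the ampersand is a single code point with no decomposition to "e" + "t", whatever shape a given font happens to draw for it.

```python
import unicodedata

def describe(ch):
    """Return (code point, character name, decomposition mapping) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.decomposition(ch))

# The ampersand is one abstract character; its "et" ligature origin is
# not recorded in the encoding (the decomposition field is empty).
print(describe("&"))  # ('U+0026', 'AMPERSAND', '')

# Compatibility normalization leaves it alone as well.
print(unicodedata.normalize("NFKD", "&") == "&")  # True
```

Which glyph appears (Arial-style, epsilon-like, or an obvious "Et") is decided by the font, not by the text.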

> Could the variation selectors be used? I see that Unicode
> does not allow free use of variation selectors, which are defined
> only for cases where it would be important to preserve the
> precise semantics of the encoded text, but not as a way to
> preserve glyph information (so character variants are
> strictly limited).

That's correct. The difference between the Arial-style glyph that looks
a bit like a tilted treble clef (U+1D11E) and John's
epsilon-with-solidus and Philippe's e-with-small-attached-t is one of
style only. The distinction does not need to be encoded in plain text,
any more than the distinction between a lowercase g with one bowl versus
two.

Apparently the math experts really, really needed to make a distinction
in plain text between (e.g.) a less-than-or-equal sign with a horizontal
bottom stroke and one with a slanted bottom stroke. We can take it on
faith that that distinction is important in plain text, but we don't
need to add more distinctions that probably aren't.
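For the curious, a sketch of what such a variation sequence looks like in plain text. It assumes the sequence in question is U+2264 followed by U+FE00 (VARIATION SELECTOR-1), which the standardized variants list describes as the slanted form; whether the slanted glyph actually appears depends entirely on font and renderer support.

```python
import unicodedata

LESS_EQUAL = "\u2264"  # LESS-THAN OR EQUAL TO
VS1 = "\uFE00"         # VARIATION SELECTOR-1

plain = LESS_EQUAL          # default glyph
slanted = LESS_EQUAL + VS1  # assumed standardized variant: slanted equal stroke

for label, text in (("plain", plain), ("slanted", slanted)):
    # List each code point in the string with its character name.
    print(label, [f"U+{ord(c):04X} {unicodedata.name(c)}" for c in text])

# The two strings differ in plain text, and normalization keeps the selector.
print(plain == slanted)                                  # False
print(unicodedata.normalize("NFC", slanted) == slanted)  # True
```

The distinction survives in plain text only because the sequence was explicitly standardized; an arbitrary base character followed by a variation selector carries no defined meaning.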

> I don't see a solution for this "problem" within Unicode itself
> (nor in ISO/IEC 10646), unless a separate standard
> is started to encode glyphs mapped to characters
> (in the UCS-4 space, outside its first 17 planes?). For now the
> safest way is to use specific fonts encoding these glyphs
> in PUA positions, and bind these fonts to the abstract text
> using stylesheets, meta information, or markup languages.
> But with such a technique, the abstract text would be modified.
>
> A way to avoid this is to surround the text with markup that
> specifies an explicit substitution, like this in XML:
>
> <typo as="">et</typo>,

You probably don't want to start down the slippery slope of encoding
Latin glyph variants as PUA characters. Check the archives of this
mailing list; you will find that proposals to use the PUA to turn
Unicode into a glyph registry are generally not well received.
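One practical consequence, sketched in Python: PUA code points have general category Co and no standard name, so text that relies on them means nothing without a private agreement and a matching font. The function name private_use_chars() and the choice of U+E000 as a stand-in for an "et" ligature glyph are both hypothetical, picked only for this example.

```python
import unicodedata

def private_use_chars(text):
    """Return the code points in text whose general category is Co (private use)."""
    return [f"U+{ord(c):04X}" for c in text if unicodedata.category(c) == "Co"]

# U+E000 here stands in for a hypothetical font-specific "et" ligature glyph.
sample = "Smith \uE000 Jones"
print(private_use_chars(sample))             # ['U+E000']
print(private_use_chars("Smith & Jones"))    # []

# A PUA code point has no character name in the standard.
print(unicodedata.name("\uE000", "<no name>"))  # <no name>
```

A receiver without the same font and the same private convention has no way to know that U+E000 was meant to be an ampersand variant.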

-Doug Ewell
Fullerton, California
http://users.adelphia.net/~dewell/


This archive was generated by hypermail 2.1.5 : Sun Jul 13 2003 - 20:10:04 EDT