RE: Character identities

From: Kent Karlsson (kentk@md.chalmers.se)
Date: Thu Oct 31 2002 - 09:03:44 EST

Next message: Doug Ewell: "[OT] Göthe (was: Re: RE: Character identities)"

Previous message: Dominikus Scherkl: "RE: New Charakter Proposal"
In reply to: Marco Cimarosti: "RE: Character identities"
Next in thread: starner@okstate.edu: "Re: RE: Character identities"
Maybe reply: starner@okstate.edu: "Re: RE: Character identities"
Maybe reply: Jim Allan: "Re: RE: Character identities"
Maybe reply: Marco Cimarosti: "RE: RE: Character identities"

Let me take a few comparable examples:

1. A few years ago, some people (font makers, I think)
   argued that the Lithuanian i-dot-circumflex was just a
   (Lithuanian-specific) glyph variant of i-circumflex,
   and likewise for a few other similar characters.

   Still, the Unicode standard does not (any longer, if it
   ever did) regard those as glyph variants, and its casing
   rules (see SpecialCasing.txt) treat the Lithuanian
   i-dot-circumflex as a different character.
   There are special rules for inserting (when lowercasing)
   or removing (when uppercasing) dot-aboves on i-s and I-s
   for Lithuanian. I can only conclude that it would be
   wrong even for a Lithuanian-specific font to display an
   i-circumflex character as an i-dot-circumflex glyph,
   even though an i-circumflex glyph is never used for
   Lithuanian.

2. The Khmer script got allocated a "KHMER SIGN BEYYAL".
   It stands (stood...) for "any abbreviation of the
   Khmer equivalent of etc."; there are at least four
   different abbreviations, much like "etc", "etc.", "&c",
   "et c.", ... Exactly which abbreviation is shown would
   be up to the font maker, and would vary by font.

   However, it is now targeted for deprecation for precisely
   that reason: it is *not* the font (maker) that should
   decide which abbreviation convention to use in a document,
   it is the *"author"* of the document who should decide.
   Just as for the Latin script, the author decides how to
   abbreviate "et cetera". The way of abbreviating should stay
   the same *regardless of font*. Note that the font may be
   chosen at a much later time, and for reasons that have
   nothing to do with abbreviation conventions. One may also
   want the convention to stay the same throughout a document
   that uses several different fonts, without having to
   carefully consider abbreviation conventions when choosing
   fonts.

3. Marco would even allow (by default; I cannot get away
   from that caveat since some (not all) font technologies
   do what they do) displaying the ROMAN NUMERAL ONE THOUSAND
   C D (U+2180) as an M, and would leave that up to the font
   designer. While the sample glyphs are only informative,
   this glyph substitution definitely goes too far. If the author
   chose to use U+2180, a glyph having at least some
   similarity to the sample glyph should be shown, unless
   and until someone makes a (permanent or transient)
   explicit character change.

4. Some people write è instead of é (I claim they cannot
   spell...). So is it up to a font designer to display
   é as è if the font is made for a context where many
   people do not make a distinction? Can a correctly
   spelled name (say) be turned into an apparent misspelling
   by just choosing such a font? And that would be a Unicode
   font?

5. I can't leave out ö vs. ø: these are just different
   ways of writing "the same" letter, and it is not
   the case that ø is used instead of ö for any
   7-bit reasons. It is conventional to use ø for ö
   in Norway and Denmark for any Swedish name (or
   word) containing it. The same goes for ä vs. æ.
   Why shouldn't this one be up to the font makers too?
   If the font is made purely for Norwegian, why not
   display ö as ø, as is the convention? This is
   *exactly* the same situation as with ä vs. a^e.

I say, let the *"author"* decide in all these cases, and
let that decision stand, *regardless of font changes*.
[There is an implicit qualification there, but I'm
tired of writing it.]
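
To make example 1 concrete, here is a minimal Python sketch of the
Lithuanian lowercasing rule. It assumes the entries for capital I with
grave, acute and tilde that SpecialCasing.txt lists under the "lt"
condition. The names LT_LOWER, lower_default and lower_lithuanian are
made up for illustration, and the context-dependent rules (an I followed
by a separate combining accent, and dot removal when uppercasing) are
left out.

```python
# Lithuanian-specific lowercase mappings (assumed from the "lt" entries
# in SpecialCasing.txt): the dot above is kept as an explicit U+0307
# underneath the accent.
LT_LOWER = {
    "\u00CC": "i\u0307\u0300",  # LATIN CAPITAL LETTER I WITH GRAVE
    "\u00CD": "i\u0307\u0301",  # LATIN CAPITAL LETTER I WITH ACUTE
    "\u0128": "i\u0307\u0303",  # LATIN CAPITAL LETTER I WITH TILDE
}


def lower_default(text: str) -> str:
    """Language-independent lowercasing, as Python's str.lower() does it."""
    return text.lower()


def lower_lithuanian(text: str) -> str:
    """Partial sketch: handles only the three precomposed capitals above.

    Everything else falls back to per-character default lowercasing, so
    context-sensitive rules are deliberately not covered.
    """
    return "".join(LT_LOWER.get(ch, ch.lower()) for ch in text)


if __name__ == "__main__":
    capital = "\u00CD"  # I with acute
    # Print code points so the difference does not depend on the font.
    print([f"U+{ord(c):04X}" for c in lower_default(capital)])
    print([f"U+{ord(c):04X}" for c in lower_lithuanian(capital)])
```

The point is that the two results are different character sequences,
decided before any font gets involved.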

> Kent Karlsson wrote:
> > > I insist that you can talk about character-to-character
> > > mappings only when
> > > the so-called "backing store" is affected in some way.
> >
> > No, why? It is perfectly permissible to do the equivalent
> > of "print(to_upper(mystring))" without changing the backing
> > store ("mystring" in the pseudocode); to_upper here would
> > return a NEW string without changing the argument.
>
> And that, conceptually, is a character-to-glyph mapping.

Now I have lost you. How can it be that? The "print"
part, yes. But not the to_upper part; that is a
character-to-character mapping, inserted between the
"backing store" and "mapping characters to glyphs".
It is still an (apparent) character-to-character
mapping even if it is not stored in the "backing store".
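
As a rough sketch of that layering (in Python; make_font, to_glyphs and
the glyph names are hypothetical and do not correspond to any real font
API), the to_upper step produces a new character string, and only the
step after it maps characters to glyphs:

```python
def to_upper(text: str) -> str:
    """Character-to-character mapping: returns a NEW string."""
    return text.upper()


def make_font(style: str) -> dict:
    """Hypothetical toy font: maps a few characters to glyph names."""
    return {ch: f"{ch}.{style}" for ch in "abAB"}


def to_glyphs(text: str, font: dict) -> list:
    """Character-to-glyph mapping: one glyph name per character."""
    return [font.get(ch, ".notdef") for ch in text]


mystring = "ab"  # the "backing store"
FONT_SERIF = make_font("serif")
FONT_SANS = make_font("sans")

# The uppercasing is transient: it sits between the backing store and
# the glyph mapping, and gives the same characters whichever font is used.
shown_serif = to_glyphs(to_upper(mystring), FONT_SERIF)
shown_sans = to_glyphs(to_upper(mystring), FONT_SANS)

print(shown_serif)
print(shown_sans)
print(mystring)  # the backing store is unchanged
```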

> In my mind, you are so much into the OpenType architecture,
> and so much used
> to the concept that glyphization is what a font "does", that
> you can't view the big picture.

Now I have lost you again. Some fonts (in some font
technologies) do more than "pure" glyphization. This
is why I have been putting in caveats, since many people
seem to think that all fonts *only* do glyphization,
which is not the case.

But to be general, I was referring to such mappings regardless
of whether they are built into some font (using character code
points or, as in OT/AAT, using glyph indices) or (better) are
external to the font.

I was trying to use general formulations, but I cannot
avoid having caveats for certain mappings that certain
technologies do (since those are so popular). But I would
agree that those particular forms of mappings *should not*
be done by fonts (though they are), and should instead be
done externally to the fonts (even when transient, as part
of the "rendering"). An advantage would be that if
a particular (named) mapping were asked for (to_upper, say),
it would be done the same way regardless of which font
is chosen. But alas...

Kind regards
/kent k


This archive was generated by hypermail 2.1.5 : Thu Oct 31 2002 - 09:57:02 EST