From: Kent Karlsson (firstname.lastname@example.org)
Date: Tue Oct 29 2002 - 06:08:22 EST
> -----Original Message-----
> From: Marco Cimarosti [mailto:email@example.com]
> Sent: den 28 oktober 2002 16:23
> To: 'Kent Karlsson'; Marco Cimarosti
> Cc: firstname.lastname@example.org
> Subject: RE: Character identities
> Kent Karlsson wrote:
> > > > For this reason it is quite impermissible to render the
> > > > combining letter small e as a diaeresis
> > >
> > > So far so good. There would be no reason for doing such a thing.
> > ...
> > > > or, for that matter, the diaeresis as a combining
> > > > letter small e (however, you see the latter version
> > > > sometimes, very infrequently, in advertisement).
> > >
> > > This is the case I thought we were discussing, and it is a
> > > very different case.
> > No, the claim was that diaeresis and overscript e are the same,
> The claim was that dieresis and overscript e are the same in *modern*
> *standard* German. Or, better stated, that overscript e is just a glyph
> variant of dieresis, in *modern* *standard* German typeset in Fraktur.
Well, we strongly disagree about that then. Marc and I clearly see them
as different. More about this below.
> Sorry if I haven't stated this clearly enough.
You have several times. No need to emphasise it anymore. We still
> > Some of them (overscript e in particular) should be(come)
> > quite commonly occurring in any Fraktur Unicode font.
> "Commonly" sounds funny near "Fraktur"...
We were talking about Fraktur fonts (which may not be all that
> > > Using such a character to encode 21st century advertisements
> > > is doomed to cause problems:
> > >
> > > 1) The glyph for U+0364 is more likely found in the font collection
> > > of the Faculty of Germanic Studies than on the PC of people wishing
> > > to read the advertisement for "Ye Olde Küster Pub". So, most people
> > > will be unable to view the advertisement correctly.
> > >
> > > 2) The designer of the advertisement will be unable to use
> > > his spell-checker and hyphenator on the advertisement's text.
> > Advertisements should invariably be final spell-checked and
> > hyphenated by humans! Automated spell checkers and hyphenators
> > for German (as well as Scandinavian languages) have (so far)
> > not been good enough even for running text that you want to
> > publish...
> This has no connection with this discussion.
Well, you brought it up. I'm usually rather picky about spelling,
so a spell checker can only suggest "corrections", often to be
rejected as wrong or even silly.
> However, IMHO, the presence of U+0364 (COMBINING LATIN SMALL LETTER E)
> in a modern German or Swedish text is just a plain spelling error, and
> even the naivest spellchecker should flag it as such.
So what? Naïve spell checkers flag all kinds of correctly spelled
> > Most modern use of Fraktur seems to use diaeresis or double
> > acute for this.
> U+0308 (COMBINING DIAERESIS) should be the only "umlaut" to be found in
> modern German text. What that diacritic *looks* like (two dots, an "e",
> a double acute, a macron, Mickey Mouse's ears), is a choice of the font
Not quite. Please note that some characters are defined to have
very specific glyphs, e.g. the estimated sign, there is no shape
variability except for size. Others are "glyphically allocated/
unified", like the diacritics, and some glyphic variability is
expected. But a diaeresis is two dots (of some shape, and it would
be a marginal case to have them elongated), never a tilde, macron
or overscript e. Those are other characters, not just a glyph
variation. Other characters have more glyphic variability
(informally) associated with them, like A, but some of them
have compatibility variants that have a somewhat more restricted
glyphic variability, like the Math Fraktur A in plane 1.
Some scripts have by tradition some very "strong" ligatures;
"strong" in the sense that it may be hard to recognise the ligated
pieces in the resulting glyph. That does not mean that you can
legitimately use an M glyph for One Thousand C D, just because
they "mean" the same. Nor does it mean that diacritics can be
substituted for each other, so that you ask for a diaeresis and get
a tilde. Yes, it is common practice with many to use a tilde instead
of a diaeresis in handwriting, but it is still character substitution,
not a glyphic variant (since that is the way diacritics are
allocated in Unicode).
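As a rough illustration of the point that these are separate characters
and not glyph variants, here is a minimal sketch using Python's standard
unicodedata module. The helper names (describe, same_after_normalization)
are made up for this example. It only shows that the two code points have
different names and that normalization never unifies them; it says
nothing about what any particular font draws.

```python
import unicodedata

DIAERESIS = "\u0308"      # COMBINING DIAERESIS
OVERSCRIPT_E = "\u0364"   # COMBINING LATIN SMALL LETTER E

def describe(ch):
    """Return the code point and the Unicode character name (hypothetical helper)."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

def same_after_normalization(a, b, form="NFKC"):
    """True if two strings become identical under the given normalization form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

print(describe(DIAERESIS))
print(describe(OVERSCRIPT_E))
# u + diaeresis is canonically equivalent to the precomposed U+00FC ...
print(same_after_normalization("u" + DIAERESIS, "\u00fc"))
# ... but u + overscript e is not equivalent to u + diaeresis.
print(same_after_normalization("u" + DIAERESIS, "u" + OVERSCRIPT_E))
```

Even the compatibility form (NFKC) keeps the two sequences apart, which
is consistent with treating the overscript e as its own character.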
> > (But the web designer could use a dynamically
> > downloaded font fragment, if there is worry that all glyphs
> > might not be supported by the fonts used by the vast majority
> > of the target audience.)
> This too has no connection with this discussion, and is OT. Unicode is
> concerned with how text is *encoded*; the details of fonts and display
> technology are out of scope.
We were talking about fonts.
> What Unicode really mandates is that the encoding should not change to
> obtain a certain graphic effect.
You can do any character mappings you like before you apply any
font, or make it into graphics...
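A minimal sketch of such an explicit mapping, done on the text before any
font is applied, might look like the following in Python. The function
name umlaut_to_overscript_e and the restriction to the base letters
a, o, u are assumptions made for this example, not anything prescribed by
Unicode or by any font technology.

```python
import re
import unicodedata

# A base letter a/o/u (either case) directly followed by COMBINING DIAERESIS.
_UMLAUT = re.compile("([aouAOU])\u0308")

def umlaut_to_overscript_e(text):
    """Explicitly map umlaut diaeresis to COMBINING LATIN SMALL LETTER E.

    This is a character mapping applied to the text itself; the font is
    then only asked to draw the characters it is actually given.
    """
    decomposed = unicodedata.normalize("NFD", text)    # split off the diaeresis
    mapped = _UMLAUT.sub("\\1\u0364", decomposed)      # U+0308 -> U+0364
    return unicodedata.normalize("NFC", mapped)        # recompose what is left

print(ascii(umlaut_to_overscript_e("K\u00fcster")))
print(ascii(umlaut_to_overscript_e("na\u00efve")))
```

The second call leaves the diaeresis on the i alone, since only a, o and u
are mapped in this sketch. The point is that the change is visible in the
encoded text and is not hidden inside a font.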
> > And overscript small e will also vary with the font,
> > looking like a shrunken ordinary e glyph of (ideally) the same font.
> > But never like two dots (in the default mode of a Unicode font).
> You haven't yet defined your meaning of "Unicode font" and, now, you add
> a new fancy term: "default mode"!
> What's a "default mode"? Unicode does not require fonts to have any kind
> of "modes". You seem to be talking about the "features", which may exist
> in *some* font technologies (e.g., OpenType), and are not a requirement for
I was trying to be general (not fancy) and not just talk about
OpenType. But yes, I meant (at least) the case where no
"features" (or similar) are invoked. But to be more precise,
I would allow purely typographic "features" though, like
degree of ligation, lowercase digits (sometimes incorrectly
called "old form" digits), or different angles of acutes.
What I was aiming at excluding were "features" that implicitly
involve character mappings, like the "hist" someone mentioned,
or "smallcaps" (which *implicitly* involves a mapping to uppercase,
and then the use of x-height glyphs for the uppercase letters).
Or any "feature" ("hist"?) that maps diaeresis to (say) overscript e.
(I know, there is no literal character mapping involved in AAT or
OT fonts, it either goes directly to glyph indices or maps glyph
indices to glyph indices, but the *net effect* is a font-internal
character-to-character mapping.)
A font that by default (that is ordinary English, not a fancy
term) maps lowercase letters to uppercase (or smallcap) glyphs
is not a Unicode font (whatever the technology). If it by
special invocation ("features", "modes", call-it-whatever) does
an implicit (or explicit) character mapping, then that is what
it does: a character mapping paired with a mapping to glyphs.
Likewise, a font that (implicitly or explicitly) does other
character to character mappings (like diaeresis to overscript e)
should not do so by default ("in default mode") if they are
OT: Personally, I think it is a bad idea to try to make fonts
do (in effect) character mappings (e.g. lowercase to uppercase
for smallcaps). Those mappings, I think, should be done outside
of fonts. But the contrary seems to be in fashion for certain
mappings. They should not be done by default though.
> > > graphic designer to change the *encoding* of their text in
> > > order to get the desired result.
> > A graphic designer is likely to turn the whole thing into 2-d
> > or 3-d graphics, probably distorted, possibly animated, to get
> > the desired result! At which point the original, or intermediary,
> > encoding of any text elements is not very relevant to the
> > end result.
> We are not talking about printed text or pictures containing text: we are
> talking about *electronic* text *encoded* in Unicode. Or else we are OT.
You were talking about "desired effect" in ads. That is often not
achievable without involving graphics... (You brought that up, not me!)
> Well, the first and only time I have seen that "Thousand C D" was on the
> Unicode charts... However, if I were asked which glyph is more appropriate
> for that character, I would say: the same as capital "M".
No, definitely not! They look very different, and I am sure
anyone (except you) using Thousand C D would never want it
displayed as an M. (If so, then you've done a character mapping!
Or perhaps you want to do a morphing ;-)
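For what it is worth, the character data draws the same line. The sketch
below uses Python's standard unicodedata module (the helper name
compat_form is made up for this example) to compare U+2180 with U+216F
ROMAN NUMERAL ONE THOUSAND, which is the compatibility character that
does fold to a plain M.

```python
import unicodedata

ONE_THOUSAND_C_D = "\u2180"   # ROMAN NUMERAL ONE THOUSAND C D
ONE_THOUSAND = "\u216f"       # ROMAN NUMERAL ONE THOUSAND

def compat_form(ch):
    """Return the NFKC (compatibility) normalization of a character."""
    return unicodedata.normalize("NFKC", ch)

for ch in (ONE_THOUSAND_C_D, ONE_THOUSAND):
    print(unicodedata.name(ch), unicodedata.numeric(ch), ascii(compat_form(ch)))
```

Both carry the numeric value 1000, but only U+216F has a compatibility
mapping to the letter M; U+2180 is left unchanged. So turning Thousand
C D into an M is a character mapping someone has to choose to do, not
something that falls out of the character's definition.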
> > > The difference must be preserved when it
> > > is useful -- e.g., U+0308 should not look like U+0364 in a
> > "should not" --> "must never"
> OK. U+0308 must never look like U+0364 in a font designed for
> publishing books on the history of German.
Not only then.
> However, this is just a requirement of common sense, *not* of
> the Unicode Standard.
Everything about Unicode and fonts is about common sense.
It is very hard to make formal requirements in this area. So
all requirements on fonts are informal, and are not rigidly stated.
> Perhaps for an "Unicode font in default mode" all this is true. You are
> the only person who knows, since you seem to be the inventor of this term...
It is plain English, Marco!
> "When rendered in the context of a language or script, like ordinary
> letters, combining marks may be subjected to systematic stylistic variation.
> [...] U+030C COMBINING CARON is normally rendered as an apostrophe when used
> with certain letterforms. U+0326 COMBINING COMMA BELOW is sometimes rendered
> as U+0312 COMBINING TURNED COMMA ABOVE [...]"
> ("7.9 Combining Marks", "Glyphic Variation", page 180)
These are particular forms of ligation, done for typographic
reasons. Not at all like the cases we were talking about.
(Replacing comma below by a cedilla (or the other way around)
is a bad idea though. Users (apparently) care!)
> "Each character in these code charts is shown with a glyph. ...
Yes. But that does not mean you can use arbitrary glyphs.
And, as I mentioned, some characters are glyphically more
constrained than others. This does not rule out decorative
or handwriting fonts in any way. But if you go too far,
you're losing connection with Unicode. So if you make
a font where diaeresis is mapped to Mickey Mouse ears, you
should not call it a Unicode font, even if you can apply
that font to Unicode text ("has a Unicode cmap" as it is
called in some font technologies). (Snowcaps are easier
to appreciate... ;-)