RE: Character identities

From: Marco Cimarosti (marco.cimarosti@essetre.it)
Date: Mon Oct 28 2002 - 10:23:19 EST

    Kent Karlsson wrote:
    > > > For this reason it is quite impermissible to render the
    > > > combining letter small e as a diaeresis
    > >
    > > So far so good. There would be no reason for doing such a thing.
    > ...
    > > > or, for that matter, the diaeresis as a combining
    > > > letter small e (however, you see the latter version
    > > > sometimes, very infrequently, in advertisement).
    > >
    > > This is the case I thought we were discussing, and it is a
    > > very different case.
    >
    > No, the claim was that diaeresis and overscript e are the same,

    The claim was that dieresis and overscript e are the same in *modern*
    *standard* German. Or, better stated, that overscript e is just a glyph
    variant of dieresis, in *modern* *standard* German typeset in Fraktur.

    Sorry if I haven't stated this clearly enough.

    > so the reversed case Marc is talking about is not different at all.

    It is. In the first case, we are talking about a glyph variant in *modern*
    *standard* German, in the second case, we are talking about two different
    diacritics in some *other* context. (Ancient German? ancient Swedish?).

    > > Given Keld's opinion and Marc's wholehearted support, it
    >
    > Please don't confuse me with Keld!

    Oooops! My apologies!

    > > follows that
    > > those infrequent advertisements should be encoded using U+0364...
    > >
    > > But U+0364 (COMBINING LATIN SMALL LETTER E) belongs to a small
    > > collection of "Medieval superscript letter diacritics", which is
    > > supposed to "appear primarily in medieval Germanic manuscripts",
    > > or to reproduce "some usage as late as the 19th century in some
    > > languages".
    >
    > Yes, but you should not read too much into the explanation,
    > which, while correct, does not limit the existence of their
    > glyphs to fonts used only by germanic professors...
    > Some of them (overscript e in particular) should be(come)
    > quite commonly occurring in any Fraktur Unicode font.

    "Commonly" sounds funny near "Fraktur"...

    > > Using such a character to encode 21st century advertisements
    > > is doomed to cause problems:
    > >
    > > 1) The glyph for U+0364 is more likely found in the font
    > > collection of the Faculty of Germanic Studies than on the PC of
    > > people wishing to read the advertisement for "Ye Olde Küster
    > > Pub". So, most people will be unable to view the advertisement
    > > correctly.
    > >
    > > 2) The designer of the advertisement will be unable to use
    > > his spell-checker and hyphenator on the advertisement's text.
    >
    > Advertisements should invariably be final spell-checked and
    > hyphenated by humans! Automated spell checkers and hyphenators
    > for German (as well as Scandinavian languages) have (so far)
    > not been good enough even for running text that you want to
    > publish...

    This has no connection with this discussion.

    However, IMHO, the presence of U+0364 (COMBINING LATIN SMALL LETTER E) in a
    modern German or Swedish text is just a plain spelling error, and even the
    naivest spellchecker should flag it as such.
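
    As a minimal sketch of such a check (not part of any real spellchecker),
    the snippet below scans a string for the medieval superscript letter
    diacritics U+0363..U+036F using Python's standard unicodedata module.
    The helper name flag_medieval_diacritics is hypothetical and was chosen
    only for this illustration.

```python
import unicodedata

# Assumption for this sketch: the block of "medieval superscript letter
# diacritics" is U+0363..U+036F (COMBINING LATIN SMALL LETTER A .. X).
MEDIEVAL_SUPERSCRIPT_LETTERS = range(0x0363, 0x0370)


def flag_medieval_diacritics(text):
    """Return (index, code point label, character name) for each suspect mark."""
    hits = []
    for index, ch in enumerate(text):
        if ord(ch) in MEDIEVAL_SUPERSCRIPT_LETTERS:
            hits.append((index, f"U+{ord(ch):04X}", unicodedata.name(ch)))
    return hits


# "Küster" spelled with u + U+0364 is flagged; with u + U+0308 it is not.
suspect = flag_medieval_diacritics("Ku\u0364ster")
clean = flag_medieval_diacritics("Ku\u0308ster")
```

    A real spellchecker would of course work from a dictionary; the point
    here is only that the offending code point is trivial to detect.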

    > > 3) Users will be unable to find the Küster Pub by searching
    > > "Küster" in a search engine.
    >
    > Depends on the search engine, and if it uses a correct collation
    > table (for the language) or not...
    >
    > > What will actually happen is that everybody will see an empty
    > > square, so they'll think that the web designer is an idiot, apart
    > > from the professors at the Faculty of Germanic Studies, who'll
    > > think that the designer is an idiot because she doesn't know the
    > > difference between U+0308 and U+0364 in ancient German.
    >
    > Most modern use of Fraktur seems to use diaeresis or double
    > acute for this.

    U+0308 (COMBINING DIAERESIS) should be the only "umlaut" to be found in
    modern German text. What that diacritic *looks* like (two dots, an "e", a
    double acute, a macron, Mickey Mouse's ears), is a choice of the font
    designer.
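
    The search problem mentioned above can be sketched with Python's
    standard unicodedata module. This only illustrates canonical
    equivalence under NFC; it assumes a matcher that normalizes and then
    compares code points, and says nothing about search engines that use a
    tailored collation table.

```python
import unicodedata

modern = "Ku\u0308ster"      # u + U+0308 COMBINING DIAERESIS
precomposed = "K\u00FCster"  # U+00FC LATIN SMALL LETTER U WITH DIAERESIS
medieval = "Ku\u0364ster"    # u + U+0364 COMBINING LATIN SMALL LETTER E


def nfc(s):
    """Normalize to NFC (canonical composition)."""
    return unicodedata.normalize("NFC", s)


# u + U+0308 is canonically equivalent to U+00FC, so both spellings of the
# query compare equal after normalization.
modern_matches_query = nfc(modern) == nfc(precomposed)

# U+0364 has no canonical relation to U+0308 or U+00FC, so the "medieval"
# spelling stays a different string.
medieval_matches_query = nfc(medieval) == nfc(precomposed)
```

    Under these assumptions the first comparison succeeds and the second
    fails, which is why changing the encoding to get a glyph effect also
    changes what a plain search will find.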

    > (But the web designer could use a dynamically
    > downloaded font fragment, if there is worry that all glyphs
    > might not be supported by the fonts used by the vast majority
    > of the target audience.)

    This too has no connection with this discussion, and is OT. Unicode is
    concerned with how text is *encoded*; the details of fonts and display
    technology are out of scope.

    What Unicode really mandates is that the encoding should not change to
    obtain a certain graphic effect.

    > > The real error (IMHO) is the idea that font designers should
    > > stick to the *sample* glyphs printed in the Unicode book, because
    > > this would force
    >
    > Well, the diacritics are allocated/unified on glyphic grounds.
    > While a diaeresis may look different from font to font, it is
    > basically two "dots" (of some shape in line with the design of the
    > font), never an "e" shape. At least not in the *default mode* of a
    > *Unicode font*.
    >
    > And overscript small e will also vary with the font,
    > looking like a shrunken ordinary e glyph of (ideally) the same font.
    > But never like two dots (in the default mode of a Unicode font).

    You haven't yet defined your meaning of "Unicode font" and, now, you add a
    new fancy term: "default mode"!

    What's a "default mode"? Unicode does not require fonts to have any kind of
    "modes". You seem to be talking about "features", which may exist in
    *some* font technologies (e.g., OpenType), and are not a requirement for
    Unicode.

    > > graphic designer to change the *encoding* of their text in
    > > order to get the desired result.
    >
    > A graphic designer is likely to turn the whole thing into 2-d
    > or 3-d graphics, probably distorted, possibly animated, to get
    > the desired result! At which point the original, or intermediary,
    > encoding of any text elements is not very relevant to the
    > end result.

    We are not talking about printed text or pictures containing text: we are
    talking about *electronic* text *encoded* in Unicode. Or else we are OT.

    > > Another big error (IMHO, once again) is the idea that two
    > > different Unicode characters should look different.
    >
    > I have never said that! E.g., a µ as well as an Å (both of which
    > are allocated twice!) should look the same (resp.) regardless of
    > which of their respective code points is used. There are many
    > more examples of characters that definitely should (e.g. capital
    > K and Kelvin sign, small i and small roman numeral one) or may
    > (capital A, capital Alpha, ...) look the same.
    >
    > There are also lots of characters that "mean" the same, but
    > always (in a Unicode font in default mode) should/must
    > look different. Like M and Roman Numeral One Thousand C D
    > (just to take an example closer to Italy... ;-).

    Well, the first and only time I have seen that "Thousand C D" was on the
    Unicode charts... However, if I were asked which glyph is more appropriate
    for that character, I would say: the same as capital "M".
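
    For reference, the relations between some of the characters named in
    the quoted text can be inspected with Python's standard unicodedata
    module. This is only a sketch: it shows what NFKC compatibility
    normalization does with these code points, which is an encoding-level
    property and not a statement about how any font must draw them. The
    helper name fold is hypothetical.

```python
import unicodedata


def fold(ch):
    """Apply NFKC, the compatibility normalization form."""
    return unicodedata.normalize("NFKC", ch)


# Characters that normalization maps onto another code point:
micro_folds_to_mu = fold("\u00B5") == "\u03BC"   # MICRO SIGN -> GREEK SMALL LETTER MU
kelvin_folds_to_k = fold("\u212A") == "K"        # KELVIN SIGN -> LATIN CAPITAL LETTER K
roman_one_folds_to_i = fold("\u2170") == "i"     # SMALL ROMAN NUMERAL ONE -> i

# Characters that stay distinct, whatever their glyphs look like:
alpha_stays_alpha = fold("\u0391") == "\u0391"          # GREEK CAPITAL LETTER ALPHA
thousand_cd_unchanged = fold("\u2180") == "\u2180"      # ROMAN NUMERAL ONE THOUSAND C D
```

    With the Unicode data shipped in current Python versions, all five
    comparisons should evaluate to True.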

    > > The difference must be preserved when it
    > > is useful -- e.g., U+0308 should not look like U+0364 in a
    >
    > "should not" --> "must never"

    OK. U+0308 must never look like U+0364 in a font designed for publishing
    books on the history of German.

    However, this is just a requirement of common sense, *not* of the Unicode
    Standard.

    > > font designed for
    > > publishing books on the history of German!
    >
    > "a font ....." --> "any Unicode font in default mode"
    >
    > (Bad example, Marco!)
    >
    > >
    > > What should really happen, IMHO, is that modern German should
    > > be encoded as modern German. A U+0308 (COMBINING DIAERESIS)
    > > should remain a U+0308, regardless of whether the corresponding
    > > glyph *looks* like U+0364 (COMBINING LATIN SMALL LETTER E) in one
    > > font, like U+0304 (COMBINING MACRON) in another font, like two
    > > five-pointed stars side by side in a third font, or like Mickey
    > > Mouse's ears in <Disney.ttf>...
    >
    > These are all unacceptable variations in a *Unicode font (in
    > default mode)*. But you can have all kinds of silly variations
    > in *non*-Unicode fonts applied to Unicode text, including ciphers
    > or rebuses... (ok, there are degrees...)

    Perhaps for a "Unicode font in default mode" all this is true. You are the
    only person who knows, since you seem to be the inventor of this term...

    As for the Unicode Standard, what is or is not acceptable is stated in the
    Unicode book, which reads:

            "Glyph shape [...] are the responsibility of individual font vendors
    and of appropriate standards and are not part of the Unicode Standard."
    ("2.2 Unicode Design Principles", "Characters, Not Glyphs", page 13)

            "When rendered in the context of a language or script, like ordinary
    letters, combining marks may be subjected to systematic stylistic variation.
    [...] U+030C COMBINING CARON is normally rendered as an apostrophe when used
    with certain letterforms. U+0326 COMBINING COMMA BELOW is sometimes rendered
    as U+0312 COMBINING TURNED COMMA ABOVE [...]"
    ("7.9 Combining Marks", "Glyphic Variation", page 180)

            "Each character in these code charts is shown with a representative
    glyph. A representative glyph is not a prescriptive form of the character,
    but one that enables recognition of the intended character [...] In many
    cases, there are more or less well-established alternative glyphic
    representations for the same character. Designers of high-quality fonts will
    do their own research into the preferred glyphic appearance of Unicode
    characters."
    ("14.1 Character Name Lists", "Images in the Code Charts and Character
    Lists", page 332)

    _ Marco


