RE: Character identities

From: Kent Karlsson (
Date: Mon Oct 28 2002 - 05:21:30 EST


    > > For this reason it is quite impermissible to render the
    > > combining letter small e as a diaeresis
    > So far so good. There would be no reason for doing such a thing.
    > > or, for that matter, the diaeresis as a combining
    > > letter small e (however, you see the latter version
    > > sometimes, very infrequently, in advertisements).
    > This is the case I thought we were discussing, and it is a
    > very different case.

    No, the claim was that diaeresis and overscript e are the same,
    so the reversed case Marc is talking about is not different at all.

    > Given Keld's opinion and Marc's wholehearted support, it

    Please don't confuse me with Keld!

    > follows that those infrequent advertisements should be encoded
    > using U+0364...
    > But U+0364 (COMBINING LATIN SMALL LETTER E) belongs to a small
    > collection of "Medieval superscript letter diacritics", which are
    > supposed to "appear primarily in medieval Germanic manuscripts", or
    > to reproduce "some usage as late as the 19th century in some
    > languages".

    Yes, but you should not read too much into the explanation,
    which, while correct, does not limit the existence of their
    glyphs to fonts used only by Germanic professors...
    Some of them (overscript e in particular) should be(come)
    quite commonly occurring in any Fraktur Unicode font.

    > Using such a character to encode 21st century advertisements
    > is doomed to cause problems:
    > 1) The glyph for U+0364 is more likely found in the font collection
    > of the Faculty of Germanic Studies than on the PC of people wishing
    > to read the advertisement for "Ye Olde Küster Pub". So, most people
    > will be unable to view the advertisement correctly.
    > 2) The designer of the advertisement will be unable to use
    > his spell-checker and hyphenator on the advertisement's text.

    Advertisements should invariably get a final spell check and
    hyphenation pass by humans! Automated spell checkers and hyphenators
    for German (as well as Scandinavian languages) have (so far)
    not been good enough even for running text that you want to

    > 3) Users will be unable to find the Küster Pub by searching
    > "Küster" in a search engine.

    That depends on the search engine, and on whether it uses a correct
    collation table (for the language)...
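    To illustrate the normalization side of this, a minimal sketch using
    Python's standard unicodedata module. Precomposed ü decomposes
    canonically to u + U+0308, so NFC unifies the two usual encodings of
    "Küster", but a spelling with U+0364 remains a different string under
    every normalization form. The search_fold() function and its folding
    table are hypothetical, standing in for the kind of language-tailored
    collation a search engine would need; they are not taken from any
    real engine.

```python
import unicodedata

PRECOMPOSED = "K\u00fcster"      # Küster with U+00FC LATIN SMALL LETTER U WITH DIAERESIS
DECOMPOSED = "Ku\u0308ster"      # u + U+0308 COMBINING DIAERESIS
OVERSCRIPT_E = "Ku\u0364ster"    # u + U+0364 COMBINING LATIN SMALL LETTER E

# Hypothetical tailoring: for search only, treat vowel + overscript e as the
# umlauted vowel. This is NOT part of any Unicode normalization form.
_OVERSCRIPT_E_FOLD = {
    "a\u0364": "\u00e4", "o\u0364": "\u00f6", "u\u0364": "\u00fc",
    "A\u0364": "\u00c4", "O\u0364": "\u00d6", "U\u0364": "\u00dc",
}

def search_fold(text: str) -> str:
    """Hypothetical search key: fold overscript e, then apply NFC and casefold."""
    text = unicodedata.normalize("NFD", text)
    for src, dst in _OVERSCRIPT_E_FOLD.items():
        text = text.replace(src, dst)
    return unicodedata.normalize("NFC", text).casefold()

print(unicodedata.decomposition("\u00fc"))                        # 0075 0308
print(unicodedata.normalize("NFC", DECOMPOSED) == PRECOMPOSED)    # True
print(unicodedata.normalize("NFC", OVERSCRIPT_E) == PRECOMPOSED)  # False
print(search_fold(OVERSCRIPT_E) == search_fold(PRECOMPOSED))      # True
```

    Plain normalization keeps U+0308 and U+0364 apart, as it should; any
    matching of the two is a tailoring decision for the search side.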

    > What will actually happen is that everybody will see an empty
    > square, so they'll think that the web designer is an idiot, apart
    > from the professors at the Faculty of Germanic Studies, who'll think
    > that the designer is an idiot because she doesn't know the
    > difference between U+0308 and U+0364 in ancient German.

    Most modern uses of Fraktur seem to use a diaeresis or a double
    acute for this. (But the web designer could use a dynamically
    downloaded font fragment, if there is a worry that some glyphs
    might not be supported by the fonts used by the vast majority
    of the target audience.)

    > The real error (IMHO) is the idea that font designers should stick
    > to the *sample* glyphs printed in the Unicode book, because this would force

    Well, the diacritics are allocated/unified on glyphic grounds.
    While a diaeresis may look different from font to font, it is
    basically two "dots" (of some shape in line with the design of the
    font), never an "e" shape. At least not in the *default mode* of a
    *Unicode font*. And overscript small e will also vary with the font,
    looking like a shrunken ordinary e glyph of (ideally) the same font.
    But never like two dots (in the default mode of a Unicode font).

    > graphic designers to change the *encoding* of their text in
    > order to get the desired result.

    A graphic designer is likely to turn the whole thing into 2-d
    or 3-d graphics, probably distorted, possibly animated, to get
    the desired result! At which point the original, or intermediary,
    encoding of any text elements is not very relevant to the
    end result.

    > Another big error (IMHO, once again) is the idea that two
    > different Unicode characters should look different.

    I have never said that! E.g., a µ as well as an Å (both of which
    are allocated twice!) should look the same (resp.) regardless of
    which of their respective code points is used. There are many
    more examples of characters that definitely should (e.g. capital
    K and Kelvin sign, small i and small roman numeral one) or may
    (capital A, capital Alpha, ...) look the same.
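    The "allocated twice" cases can be checked against the normalization
    data. A minimal Python sketch (standard unicodedata only; the
    unified_by() helper is hypothetical) reporting whether NFC, or only
    NFKC, maps each duplicate onto its ordinary counterpart.
    Normalization says nothing about glyphs, so this only shows which
    pairs the standard itself treats as equivalent.

```python
import unicodedata

# (ordinary character, duplicate code point) pairs mentioned in the text.
PAIRS = [
    ("\u00c5", "\u212b"),  # LATIN CAPITAL LETTER A WITH RING ABOVE / ANGSTROM SIGN
    ("K", "\u212a"),       # LATIN CAPITAL LETTER K / KELVIN SIGN
    ("\u03bc", "\u00b5"),  # GREEK SMALL LETTER MU / MICRO SIGN
    ("i", "\u2170"),       # LATIN SMALL LETTER I / SMALL ROMAN NUMERAL ONE
    ("A", "\u0391"),       # LATIN CAPITAL LETTER A / GREEK CAPITAL LETTER ALPHA
]

def unified_by(a: str, b: str) -> str:
    """Return the first of NFC, NFKC under which a and b normalize identically."""
    for form in ("NFC", "NFKC"):
        if unicodedata.normalize(form, a) == unicodedata.normalize(form, b):
            return form
    return "neither"

for a, b in PAIRS:
    print(f"U+{ord(b):04X} {unicodedata.name(b)}: {unified_by(a, b)}")
```

    ANGSTROM SIGN and KELVIN SIGN fold under NFC (they are canonical
    singletons), MICRO SIGN and SMALL ROMAN NUMERAL ONE only under NFKC,
    and capital A versus capital Alpha under neither, which lines up with
    the "should" versus "may" distinction above.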

    There are also lots of characters that "mean" the same, but
    always (in a Unicode font in default mode) should/must
    look different. Like M and Roman Numeral One Thousand C D
    (just to take an example closer to Italy... ;-).
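    The contrast with M shows up in the character data too. A minimal
    Python sketch (standard unicodedata only); U+216F ROMAN NUMERAL ONE
    THOUSAND is added here purely for comparison: it has a compatibility
    decomposition to M, whereas U+2180 ROMAN NUMERAL ONE THOUSAND C D
    has none, although both carry the numeric value 1000.

```python
import unicodedata

M = "M"
RN_1000 = "\u216f"     # ROMAN NUMERAL ONE THOUSAND (M-shaped; added for comparison)
RN_1000_CD = "\u2180"  # ROMAN NUMERAL ONE THOUSAND C D

for ch in (RN_1000, RN_1000_CD):
    print(
        f"U+{ord(ch):04X} {unicodedata.name(ch)}",
        repr(unicodedata.decomposition(ch)),     # decomposition mapping, '' if none
        unicodedata.normalize("NFKC", ch) == M,  # does NFKC turn it into M?
        unicodedata.numeric(ch),                 # numeric value from the UCD
    )
```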

    > The difference must be preserved when it
    > is useful -- e.g., U+0308 should not look like U+0364 in a

    "should not" --> "must never"

    > font designed for publishing books on the history of German!

    "a font ....." --> "any Unicode font in default mode"

    (Bad example, Marco!)

    > What should really happen, IMHO, is that modern German should be
    > encoded as modern German. A U+0308 (COMBINING DIAERESIS) should
    > remain a U+0308, regardless of whether the corresponding glyph
    > *looks* like U+0364 (COMBINING LATIN SMALL LETTER E) in one font,
    > like U+0304 in another font, like two five-pointed stars side by
    > side in a third font, or like Mickey Mouse's ears in <Disney.ttf>...

    These are all unacceptable variations in a *Unicode font (in
    default mode)*. But you can have all kinds of silly variations
    in *non*-Unicode fonts applied to Unicode text, including ciphers
    or rebuses... (ok, there are degrees...)

                    /Kent K

    > _ Marco
