RE: Character identities

From: Marco Cimarosti (
Date: Tue Oct 29 2002 - 15:07:16 EST


    Kent Karlsson wrote:
    > Marco,

    Kent, please allow me to begin with the end of your post:

    > Marco, please calm down and reread every sentence of my
    > previous message. You seem to have misread quite a few things,
    > but it is better you reread calmly before I try to clear
    > up any remaining misunderstandings.

    I have been absolutely calm, and I apologize if I gave a different
    impression. I may heat up when discussing things like ethics,
    politics, religion, racism, war, etc., but definitely not when discussing
    the details of the Unicode character-glyph model.

    I wish to recall that we are just discussing a glyph variation for a
    diacritic character: a variation that I consider acceptable and you consider
    undesirable. Please let's not make this bigger than it reasonably is.

    > Standard orthography, and orthography that someone may
    > choose to use on a sign, or in handwriting, are often not
    > the same.
    > And I did say that current font technologies (e.g. OT)
    > does not actually do character to character mappings,
    > but the net effect is *as if* they did (if, and I hope
    > only if, certain "features" are invoked, like "smallcaps").
    > It would be more honest to do them as character-to-character
    > mappings though, either inside (which OT does not support)
    > or outside of the font. Capital A, even at x-height, is not
    > a glyph variant of small a (even though, centuries ago, that
    > was the case, but then I and J were the same, and U and V,
    > et and &, ad and @, ...). But displaying U as V (in effect
    > doing a character replacement on a copy of the input) would
    > be ok in a non-default mode (using the "hist" feature, say).

    I insist that you can talk about character-to-character mappings only when
    the so-called "backing store" is affected in some way. If the backing store
    is not changed, it is only a character-to-glyph mapping, however complicated
    and indirect it may be.

    Whether these mappings take place inside or outside a font is irrelevant,
    again, as long as the backing store is not changed.
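
    A minimal sketch of this distinction in Python, under stated assumptions:
    the two glyph tables, the glyph names in them, and both function names are
    invented for illustration and do not correspond to any real font format.

```python
# Hypothetical glyph tables: code point -> invented glyph name.
FONT_MODERN = {"a": "a", "\u0308": "two-dots"}
FONT_FANCY = {"a": "a", "\u0308": "small-e-above"}


def render(backing_store: str, font: dict) -> list:
    """Character-to-glyph mapping: reads the backing store, never changes it."""
    return [font.get(ch, ".notdef") for ch in backing_store]


def replace_characters(backing_store: str) -> str:
    """Character-to-character mapping: produces a different backing store."""
    return backing_store.replace("\u0308", "\u0364")


text = "a\u0308"  # LATIN SMALL LETTER A + COMBINING DIAERESIS

glyphs_modern = render(text, FONT_MODERN)  # diaeresis drawn as two dots
glyphs_fancy = render(text, FONT_FANCY)    # diaeresis drawn as a small "e"
changed = replace_characters(text)         # the stored characters differ now
```

    In both `render` calls the stored text is still `a` + U+0308; only
    `replace_characters` yields a string containing U+0364.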

    > My point here is that that replacement (effectively) should
    > not be done by default in a Unicode font (see Doug's explanation
    > for what a Unicode font is, if you don't like mine).

    I totally agree with Doug's careful definition, and I am glad that you agree
    as well.

    Doug indicates two key points that a font must respect to be called a
    "Unicode font":

    « [...] calling a font a "Unicode font" implies two things:
    1. It must be based on Unicode code points. [...]
    2. The glyphs must reflect the "essential characteristics" of the Unicode
    character to which they are mapped. [...] »

    If we agree that the only requirement for a glyph representing a certain
    Unicode character is to respect the "essential characteristics" which make
    it recognizable, then all our discussion is simply about determining which
    "essential characteristics" a particular character is supposed to have.

    To me, a glyph floating atop the letters "a", "o" and "u" is recognizably a
    German umlaut if (a) the text is written in German, and (b) the glyph has
    one of the following shapes:

    1. Two small "blobs" (e.g. circles, squares, acute accents) placed side by
    side;
    2. A straight horizontal line;
    3. A wavy horizontal line;
    4. A small lowercase "e", or something recalling it.

    I don't argue this for caprice or provocation, but because these particular
    shapes are commonly attested in one context or another: be it modern
    typography, traditional typography, handwriting, fancy graphics, etc.

    You seem to argue that only case 1 is acceptable, and probably also add some
    constraints on the shape of the "blobs" (e.g., I think I understood that you
    find that a double acute shape would be unacceptable).

    As I see it, the only reason you say this is that the other
    shapes are similar or identical to the typical shapes of other Unicode
    characters. As I said, I don't find that this is a valid reason, unless the
    font we are talking about is to be used in contexts (e.g., linguistics, or
    languages other than German) in which the distinction is meaningful.

    > > [...] I never heard that U+0364
    > > (COMBINING LATIN SMALL LETTER E) is part of the spelling of
    > > modern German or Swedish.
    > True (that is not part of modern standard orthography),
    > but I don't see how that could imply some kind of support
    > for your (rather surprising and extreme) position.

    (Frankly, I find your position surprising and extreme -- perhaps we're only
    choosing bad examples.)

    What I meant is that if (a) U+0364 is not supposed to appear in modern
    German, and (b) the font we are considering is designed to be used for
    modern German only, then (c) the possibility of confusing U+0364 with U+0308
    is a non-issue.
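
    The two code points involved can be inspected with Python's standard
    `unicodedata` module. This is only a sketch: the helper name
    `uses_small_e_above` and the sample words are assumptions made for
    illustration.

```python
import unicodedata

DIAERESIS = "\u0308"      # COMBINING DIAERESIS
SMALL_E_ABOVE = "\u0364"  # COMBINING LATIN SMALL LETTER E

# Official character names, as recorded in the Unicode Character Database.
names = {cp: unicodedata.name(cp) for cp in (DIAERESIS, SMALL_E_ABOVE)}

# A precomposed umlaut letter decomposes to base letter + U+0308.
decomposed = unicodedata.normalize("NFD", "B\u00e4r")


def uses_small_e_above(text: str) -> bool:
    """Hypothetical check: does the text contain U+0364 after decomposition?"""
    return SMALL_E_ABOVE in unicodedata.normalize("NFD", text)


# Modern German spelling uses only U+0308 (directly or via precomposed letters).
modern_german = "B\u00e4r \u00d6l \u00fcber"
has_e_above = uses_small_e_above(modern_german)
```

    For text like `modern_german`, `has_e_above` is false, so in that text the
    two marks never occur side by side, whatever glyph is chosen for U+0308.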

    > If (and only if!) the author/editor of the text asks for an
    > overscript e should the font produce one. It is not up to
    > the font maker to make such substitutions without request,

    Yes. But a font which displays U+0308 with a glyph resembling the typical
    glyph for U+0364 is not "producing" anything; it is not "substituting"
    anything with anything else: it is just faithfully reproducing the text,
    according to the content decided by the author *and* according to the
    typographical style decided by the font designer.

    > > More *symbol* characters which escape the general rule.
    > Math Fraktur A is a letter (of course!). Many letters,
    > including ordinary A, are used as symbols too.
    > You seem to argue that for "symbols" (whichever those are,
    > I'm sure you *don't* mean general categories S*...) there is
    > total rigidity, while for "non-symbols" (whichever those are)
    > there is near total anarchy and font makers can change glyphs
    > to something entirely different.

    No, I was not referring to general categories (which, by the way, would
    better have been "Sm" for the math extensions, IMHO).

    While an ordinary "A" is intended for any kind of usage (including as a
    symbol), an "A" from the Math extensions is intended for usage as a symbol
    inside math formulas, where the typographical style conveys a precise
    meaning, rather than just being an esthetic choice.

    For these math characters the font designer's freedom is unusually narrow,
    because these characters *must* look like Fraktur, sans-serif, monospaced,
    bold, etc., in order to be useful.
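
    How the standard records this can be checked with Python's `unicodedata`
    module. The sketch below looks at a single example, U+1D504, chosen here
    only for illustration; the variable names are arbitrary.

```python
import unicodedata

fraktur_a = "\U0001D504"

# Character name: MATHEMATICAL FRAKTUR CAPITAL A
name = unicodedata.name(fraktur_a)

# General category is "Lu" (uppercase letter), not "Sm" (math symbol).
category = unicodedata.category(fraktur_a)

# Compatibility decomposition tagged <font>: the style is what distinguishes
# this character from an ordinary "A".
decomposition = unicodedata.decomposition(fraktur_a)

# Compatibility normalization discards that style distinction.
folded = unicodedata.normalize("NFKC", fraktur_a)
```

    The `<font>` tag in `decomposition` marks the typographical style as the
    defining property of the character, which is why a glyph for it cannot
    drop the Fraktur look.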

    So, what I meant is that these are poor examples of the level of stylistic
    freedom which is *normally* granted to type designers.

    > I claim that there are no characters for which there is total
    > anarchy (except possibly for "view invisibles" of normally
    > invisible characters),

    (It depends on what you mean by "total anarchy". To me, standards are a form
    of self-discipline which might be a possible basis for an orderly anarchy.
    But I guess that this would lead us way way way off the topic...)

    > but that there are several degrees
    > of flexibility (I'm sure someone can list more than three,
    > but here is a coarse division):
    > 1. glyph (almost) fixed: Dingbats, estimated sign, ...
    > [could possibly be given a rugged look, or texture
    > if you want to mimic e.g. a typewriter look]


    > 2. "abstract" glyph is fixed but there can be minor
    > shape variations: diacritics, math symbols (Sm),
    > "math letters" (there are several Math Fraktur
    > designs, several Math sans-serif designs, etc.
    > that could suit), Arabic presentation forms (initial/
    > medial/final/isolated decided but other aspects are
    > not fixed, maybe this case is between 2 and 3), ...

    An "abstract glyph" is a very interesting concept, and I have been chasing
    it since I first heard about Unicode... But I am starting to think that it
    may be a chimera.

    For the time being, I would rather stick to Doug's more generic "essential
    characteristics".

    > 3. fairly free as long as (some) readers recognise
    > the character from the glyph (modulo compatibility/
    > canonical variants and what should have been
    > compatibility/canonical variants...): "nominal"
    > digits/letters/punctuation, ... [This, however,
    > does NOT allow, e.g., the One Thousand C D character
    > to be shown with an M glyph, nor display € as EUR, ...
    > in a Unicode font in...; if it did so in default mode
    > ["by default"], it would not be a Unicode font.]

    And this is the area where discussions are more likely to occur, and the
    area where I think we are with this discussion.

    The fact is that, outside the special cases 1 and 2 above, "essential
    characteristics" are not always, not necessarily *visual* characteristics.
    In several cases, the *semantics* of the character may play an even bigger
    role. IMHO.

    > [4. Near anarchy; you seem to argue that a large part of
    > case 2 and all of case 3 fall here...]

    Parse error on "Near anarchy". Anarchy is not something we need to discuss
    in this century.

    > Yes, you can have glyphic variation, but for the diacritics
    > there is (by design, but maybe not sufficiently explicit
    > stated in the book) a limit to how much it can vary ("in
    > default mode"). There are limits also for, e.g., 'nominal'
    > letters and roman numeral characters, that are (by design)
    > somewhat less constrained. In addition you may note that
    > those who asked for the inclusion of overscript e does not
    > regard an overscript e glyph to be an acceptable way
    > of displaying a diaeresis [in a Uni..., you know].

    I guess that those who need a distinct overscript "e" probably do not
    typeset their academic works using fantasy fonts, do they?

    > These things come up quite often in discussions about
    > proposals to add characters, even though it is not formally
    > stated. If some of the "Unicode elders" care to elaborate,
    > please feel free.

    Yes, please. Kent and I are starting to be repetitive...

    > Marco, I'm not sure it is of any use to try to explain in
    > more detail, since you don't appear to be listening.

    I listened to everything you've been saying, and I found that a lot of it
    was very sensible. A lot, but not all of it.

    > However,
    > I think I, Marc, Doug, and Mark (at the very least) seem
    > to be in approximate agreement on this (at least, I have
    > yet to see any major disagreement).

    Unless I missed something, Mark (Davis) did not express his position about
    this (not on this public list, anyway), and Doug said something very
    balanced on which we both agree.

    > I'm sure Michael would agree too (at least I hope so), and many others.

    There are many Michaels and many "others" here... If any of them wish to
    intervene, I hope they'll say something new to take the discussion
    out of the loop, rather than joining one faction.


    _ Marco

    This archive was generated by hypermail 2.1.5 : Tue Oct 29 2002 - 16:10:53 EST