RE: Character identities

From: Kent Karlsson (
Date: Tue Oct 29 2002 - 12:40:03 EST



       Standard orthography, and orthography that someone may
    choose to use on a sign, or in handwriting, are often not
    the same.

       And I did say that current font technologies (e.g. OT)
    do not actually do character-to-character mappings,
    but the net effect is *as if* they did (if, and I hope
    only if, certain "features" are invoked, like "smallcaps").
    It would be more honest to do them as character-to-character
    mappings though, either inside (which OT does not support)
    or outside of the font. Capital A, even at x-height, is not
    a glyph variant of small a (even though, centuries ago, that
    was the case, but then I and J were the same, and U and V,
    et and &, ad and @, ...). But displaying U as V (in effect
    doing a character replacement on a copy of the input) would
    be ok in a non-default mode (using the "hist" feature, say).
    My point here is that that replacement (effectively) should
    not be done by default in a Unicode font (see Doug's explanation
    for what a Unicode font is, if you don't like mine).

    > [...] I never heard that U+0364
    > (COMBINING LATIN SMALL LETTER E) is part of the spelling of
    > modern German or Swedish.

       True (that is not part of modern standard orthography),
    but I don't see how that could imply some kind of support
    for your (rather surprising and extreme) position.

       If (and only if!) the author/editor of the text asks for an
    overscript e should the font produce one. It is not up to
    the font maker to make such substitutions without request,
    either by the author/(human) editor changing the text, or by
    the author/editor invoking a non-default font "feature"
    (via some higher-level protocol; this cannot be done in plain text).
    The "default mode" (for lack of a better term) would be the
    one used, well, by default; e.g. on plain text.
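
       The distinction between default mode and an explicitly
    requested feature can be sketched as a toy model. This is
    not OpenType and not any real shaping API: shape() and
    HIST_SUBSTITUTIONS are hypothetical names, and the "hist"
    behaviour shown is only an assumption for illustration.

```python
# Toy model only: not OpenType and not any real shaping API.
# HIST_SUBSTITUTIONS and shape() are hypothetical names.

HIST_SUBSTITUTIONS = {"U": "V"}  # assumed "hist"-style replacement


def shape(text, features=frozenset()):
    """Return the characters whose nominal glyphs would be drawn."""
    if "hist" not in features:
        # Default mode (e.g. plain text): no replacement at all.
        return text
    # Non-default mode: the replacement is made on a copy of the input.
    return "".join(HIST_SUBSTITUTIONS.get(ch, ch) for ch in text)


source = "VENI VIDI UICI"
print(shape(source))            # default mode: text is shown as written
print(shape(source, {"hist"}))  # feature requested: U is shown as V
print(source)                   # the underlying text is never changed
```

       The point of the sketch is only that the substitution
    happens on request and never alters the stored characters.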

    > > Other characters have more glyphic variability
    > > (informally) associated with them, like A, but some of them
    > > have compatibility variants that have a somewhat more restricted
    > > glyphic variability, like the Math Fraktur A in plane 1.
    > More *symbol* characters which escape the general rule.

       Math Fraktur A is a letter (of course!). Many letters,
    including ordinary A, are used as symbols too.
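
       For those who want to check this against the character
    database, here is a small sketch using Python's standard
    unicodedata module (the variable name fraktur_a is mine).
    It prints the general category of both characters and the
    compatibility decomposition of the Fraktur one.

```python
import unicodedata

fraktur_a = "\U0001D504"  # MATHEMATICAL FRAKTUR CAPITAL A (plane 1)

for ch in ("A", fraktur_a):
    # Both are reported as Lu (Letter, uppercase), not as S* symbols.
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))

# The Fraktur A is a compatibility (<font>) variant of ordinary A.
print(unicodedata.decomposition(fraktur_a))
print(unicodedata.normalize("NFKC", fraktur_a) == "A")
```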

    You seem to argue that for "symbols" (whichever those are,
    I'm sure you *don't* mean general categories S*...) there is
    total rigidity, while for "non-symbols" (whichever those are)
    there is near total anarchy and font makers can change glyphs
    to something entirely different.

       I claim that there are no characters for which there is total
    anarchy (except possibly for "view invisibles" of normally
    invisible characters), but that there are several degrees
    of flexibility (I'm sure someone can list more than three,
    but here is a coarse division):

            1. glyph (almost) fixed: Dingbats, estimated sign, ...
               [could possibly be given a rugged look, or texture
               if you want to mimic e.g. a typewriter look]

            2. "abstract" glyph is fixed but there can be minor
               shape variations: diacritics, math symbols (Sm),
               "math letters" (there are several Math Fraktur
               designs, several Math sans-serif designs, etc.
               that could suit), Arabic presentation forms (initial/
               medial/final/isolated decided but other aspects are
               not fixed, maybe this case is between 2 and 3), ...

            3. fairly free as long as (some) readers recognise
               the character from the glyph (modulo compatibility/
               canonical variants and what should have been
               compatibility/canonical variants...): "nominal"
               digits/letters/punctuation, ... [This, however,
               does NOT allow, e.g., the One Thousand C D character
               to be shown with an M glyph, nor € to be displayed as
               EUR, ... in a Unicode font in...; if it did so in default
               mode ["by default"], it would not be a Unicode font.]

            [4. Near anarchy; you seem to argue that a large part of
                case 2 and all of case 3 fall here...]
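
       The two examples under case 3 can be checked against the
    character database. The sketch below uses Python's standard
    unicodedata module; the variable names are mine. It only
    shows that neither character carries a decomposition to M
    or to EUR, in contrast to U+216F, which does have a
    compatibility mapping to M. That is consistent with the
    argument above, not a proof of it.

```python
import unicodedata

one_thousand_c_d = "\u2180"  # ROMAN NUMERAL ONE THOUSAND C D
one_thousand = "\u216F"      # ROMAN NUMERAL ONE THOUSAND
euro = "\u20AC"              # EURO SIGN

for ch in (one_thousand_c_d, one_thousand, euro):
    # An empty decomposition string means no canonical or
    # compatibility mapping is recorded for the character.
    print(
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.category(ch),
        repr(unicodedata.decomposition(ch)),
    )
```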

    Yes, you can have glyphic variation, but for the diacritics
    there is (by design, but maybe not sufficiently explicitly
    stated in the book) a limit to how much it can vary ("in
    default mode"). There are limits also for, e.g., 'nominal'
    letters and roman numeral characters, that are (by design)
    somewhat less constrained. In addition you may note that
    those who asked for the inclusion of overscript e do not
    regard an overscript e glyph to be an acceptable way
    of displaying a diaeresis [in a Uni..., you know].
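
       That the two marks are separate characters, with no
    equivalence between them, can be seen with a short sketch
    using Python's standard unicodedata module (variable names
    are mine). Both are nonspacing marks of the same combining
    class, but only the diaeresis composes with a under NFC.

```python
import unicodedata

diaeresis = "\u0308"     # COMBINING DIAERESIS
overscript_e = "\u0364"  # COMBINING LATIN SMALL LETTER E

for mark in (diaeresis, overscript_e):
    print(
        f"U+{ord(mark):04X}",
        unicodedata.name(mark),
        unicodedata.category(mark),
        unicodedata.combining(mark),
    )

# a + diaeresis composes to the single character U+00E4;
# a + overscript e stays as two code points.
with_diaeresis = unicodedata.normalize("NFC", "a" + diaeresis)
with_overscript_e = unicodedata.normalize("NFC", "a" + overscript_e)
print([f"U+{ord(c):04X}" for c in with_diaeresis])
print([f"U+{ord(c):04X}" for c in with_overscript_e])
```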

       These things come up quite often in discussions about
    proposals to add characters, even though it is not formally
    stated. If some of the "Unicode elders" care to elaborate,
    please feel free.

       Marco, I'm not sure it is of any use to try to explain in
    more detail, since you don't appear to be listening. However,
    I think I, Marc, Doug, and Mark (at the very least) seem
    to be in approximate agreement on this (at least, I have
    yet to see any major disagreement). I'm sure Michael
    would agree too (at least I hope so), and many others.

       Marco, please calm down and reread every sentence of my
    previous message. You seem to have misread quite a few things,
    but it is better that you reread calmly before I try to clear
    up any remaining misunderstandings.

                    Kind regards
                    /kent k
