Re: Shades of Gray

From: Michael Everson
Date: Sat Mar 05 2005 - 17:29:39 CST


    At 14:54 -0800 2005-03-05, wrote:
    >People think I'm being absolutely horrible. But
    >you should be more sympathetic.

    You should be less irritating. As usual I get to
    be the one who says it, but you've pissed off all
    the right people.

    >I looked at Unicode and observed a basic difficulty. "Where do you draw the
    >line." What fits exactly into what category.

    Nothing. The world's writing systems are a huge
    and delightful mess. Unicode is supposed to help
    people represent that mess. Unicode is not
    supposed to tidy it up and fix it.

    >And I saw that Unicode was having to answer
    >complex and finely gradated questions with the
    >bluntest of answers: black or white. Codepoint
    >or not.

    Codepoint or not, yes. Codepoints referring to monochrome reality, no.

    >A way to define something that was not quite a characterhood,

    This is not a word. And there is no "way" to
    "define" it. We encode what we need to encode. We
    encode what makes sense. There is no formula. And
    yes, that makes "us" some sort of obnoxious
    élite, who "dictate" what "is" and what "is not"
    a character. And some people get pissed off at
    "us". But "we" have to stand together even when
    we argue among "ourselves", and "know" whether
    what we are encoding is "right" or "useful" or

    And there's no effing way what we do can be put
    down into "rules". And even "we" sometimes
    disagree about what makes sense.

    >And a very simple technique for doing this is apparent -- use one or more
    >levels of variation selector-like codepoints to define a "sub-characterhood",
    >and even a "sub-sub-characterhood".

    Ah! Thanks! Another opportunity for me to put on
    my curmudgeon hat and say "bollocks".

    >I brought up an example of this with the Serbian 't'. My approach has a sound
    >conceptual basis

    Bzzzzzzzt. Thank you for playing. Your "problem"
    is the clearest of examples of glyph
    representation preference, and as such is out of
    scope for the Unicode Standard per se.

    >and can be done technically. It has the benefits that the
    >definition of the "sub-characterhood" is tightly bound to the characterhood,
    >providing data robustness, and codepoint-level data identification.

    You must be mad. There are millions and millions
    of Serbs and Bulgarians and gigaquads of Serbian
    and Bulgarian data out there, and you think that
    the word for "one hundred" -- which I may
    represent in Latin caps here as CTO -- should be
    written DIFFERENTLY for them than it should be in
    Russian, where it looks IDENTICAL in all but
    italic style in some or many or most fonts?

    That, Doug-the-Newer, is -- do let me refrain
    from gentleness -- an utterly STUPID idea.
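
The point about CTO can be illustrated with a small sketch. It assumes the word is written in lowercase Cyrillic as U+0441 U+0442 U+043E; the `lang` field and the record layout are hypothetical, standing in for whatever out-of-band language tagging a document format provides. The code points are the same for Russian and Serbian text, and any preference for a different italic glyph is left to the language tag and the font.

```python
import unicodedata

# "One hundred" in lowercase Cyrillic (assumed spelling: es, te, o).
STO = "\u0441\u0442\u043e"

def code_points(text):
    """Return a readable list of the code points in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]

# Hypothetical tagged records: the language lives beside the text,
# not inside the character data.
russian = {"lang": "ru", "text": STO}
serbian = {"lang": "sr", "text": STO}

for record in (russian, serbian):
    print(record["lang"], code_points(record["text"]))

# The encoded strings are identical, so searching, sorting, and
# comparison behave the same for both languages.
print(russian["text"] == serbian["text"])
```

A renderer that cares about the Serbian italic form would consult `lang` when selecting glyphs; the stored text does not change.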

    >But I was told, no, there is simply a better way of doing this: "language


    >And so I grudgingly accepted this,

    Do accept it with alacrity.

    >and moved on from my example given for
    >familiarity -- a local variation of the Cyrillic
    >script -- to an actual interest, obscure but
    >highly comparible local variations of the Greek

    Golly. Let me think. Is it actually "highly comparable"?

    Why, no, it isn't. Early Greek, like early Latin,
    is preferably represented using the regular Greek
    and Latin alphabets. Books by honest-to-goodness
    real scholars do this.

    >I said OK, now show me how "language tags" are going to apply to this, to get
    >the glyphs needed for these Greek script variants to display. And after a very
    >long frustrating process of non-answers, the dirty little truth came out.
    >"Language tags" are a fib.

    No. The two situations are not analogous. Early
    Greek isn't a different language from Greek, not
    in the same way as Russian and Serbian are,
    anyway. Further, the font issue for early Greek
    is one of global spans, not of single letter
    preferences. Moreover, real honest-to-goodness
    Greek merchants use early Greek letterforms on
    their signage even today, from time to time, for
    effect, and I betcha a grilled octopus that they
    represent their letterforms with -- imagine! --

    >Which you must admit sounds like a less convincing -- and less responsible --
    >rebuttal to my own very rational, and concrete, and dependable, and
    >Unicode-controlled approach.

    What you must admit, Doug-the-Newer, is that
    you've got a lot to learn about Unicode and its
    practice and culture, and you've really done
    yourself a disservice by coming in here and
    trying to teach us what it is that we do.

    Michael Everson * Everson Typography
