From: Michael Everson (email@example.com)
Date: Sat Mar 05 2005 - 17:29:39 CST
At 14:54 -0800 2005-03-05, UList@dfa-mail.com wrote:
>People think I'm being absolutely horrible. But
>you should be more sympathetic.
You should be less irritating. As usual I get to
be the one who says it, but you've pissed off all
the right people.
>I looked at Unicode and observed a basic difficulty. "Where do you draw the
>line." What fits exactly into what category.
Nothing. The world's writing systems are a huge
and delightful mess. Unicode is supposed to help
people represent that mess. Unicode is not
supposed to tidy it up and fix it.
>And I saw that Unicode was having to answer
>complex and finely gradated questions with the
>bluntest of answers: black or white. Codepoint
Codepoint or not, yes. Codepoints referring to monochrome reality, no.
>A way to define something that was not quite a characterhood,
This is not a word. And there is no "way" to
"define" it. We encode what we need to encode. We
encode what makes sense. There is no formula. And
yes, that makes "us" some sort of obnoxious
élite, who "dictate" what "is" and what "is not"
a character. And some people get pissed off at
"us". But "we" have to stand together even when
we argue among "ourselves", and "know" whether
what we are encoding is "right" or "useful" or
And there's no effing way what we do can be put
down into "rules". And even "we" sometimes
disagree about what makes sense.
>And a very simple technique for doing this is apparent -- use one or more
>levels of variation selector-like codepoints to define a "sub-characterhood",
>and even a "sub-sub-characterhood".
Ah! Thanks! Another opportunity for me to put on
my curmudgeon hat and say "bollocks".
>I brought up an example of this with the Serbian 't'. My approach has a sound
Bzzzzzzzt. Thank you for playing. Your "problem"
is the clearest of examples of glyph
representation preference, and as such is out of
scope for the Unicode Standard per se.
>and can be done technically. It has the benefits that the
>definition of the "sub-characterhood" is tightly bound to the characterhood,
>providing data robustness, and codepoint-level data identification.
You must be mad. There are millions and millions
of Serbs and Bulgarians and gigaquads of Serbian
and Bulgarian data out there, and you think that
the word for "one hundred" -- which I may
represent in Latin caps here as CTO -- should be
written DIFFERENTLY for them than it should be in
Russian, where it looks IDENTICAL in all but
italic style in some or many or most fonts?
That, Doug-the-Newer, is -- do let me refrain
from gentleness -- an utterly STUPID idea.
>But I was told, no, there is simply a better way of doing this: "language
>And so I grudgingly accepted this,
Do accept it with alacrity.
>and moved on from my example given for
>familiarity -- a local variation of the Cyrillic
>script -- to an actual interest, obscure but
>highly comparable local variations of the Greek
Golly. Let me think. Is it actually "highly comparable"?
Why, no, it isn't. Early Greek, like early Latin,
is preferably represented using the regular Greek
and Latin alphabets. Books by honest-to-goodness
real scholars do this.
>I said OK, now show me how "language tags" are going to apply to this, to get
>the glyphs needed for these Greek script variants to display. And after a very
>long frustrating process of non-answers, the dirty little truth came out.
>"Language tags" are a fib.
No. The two situations are not analogous. Early
Greek isn't a different language from Greek, not
in the same way as Russian and Serbian are,
anyway. Further, the font issue for early Greek
is one of global spans, not of single letter
preferences. Moreover, real honest-to-goodness
Greek merchants use early Greek letterforms on
their signage even today, from time to time, for
effect, and I betcha a grilled octopus that they
represent their letterforms with -- imagine! --
>Which you must admit sounds like a less convincing -- and less responsible --
>rebuttal to my own very rational, and concrete, and dependable, and
What you must admit, Doug-the-Newer, is that
you've got a lot to learn about Unicode and its
practice and culture, and you've really done
yourself a disservice by coming in here and
trying to teach us what it is that we do.
-- Michael Everson * * Everson Typography * * http://www.evertype.com
This archive was generated by hypermail 2.1.5 : Sat Mar 05 2005 - 17:32:32 CST