Date: Sat Mar 05 2005 - 16:54:58 CST
People think I'm being absolutely horrible. But you should be more sympathetic.
I looked at Unicode and observed a basic difficulty: "Where do you draw the
line?" What fits exactly into what category? And I saw that Unicode was having
to answer complex and finely gradated questions with the bluntest of answers:
black or white. Codepoint or not. And it occurred to me that what was called
for, conceptually, was one or more shades of gray.
A way to define something that was not quite a characterhood, and yet
something still vital, or important, or at least useful, to have in (what I
will now call) the basic plaintext data.
And a very simple technique for doing this is apparent -- use one or more
levels of variation selector-like codepoints to define a "sub-characterhood",
and even a "sub-sub-characterhood". Not a "pseudo-codepoint" -- but a real
piece of data, describing the real identity of something as a sub-category of
a codepoint. Data in one or more shades of gray.
I brought up an example of this with the Serbian 't'. My approach has a sound
conceptual basis and can be done technically. It has the benefits that the
definition of the "sub-characterhood" is tightly bound to the characterhood,
providing data robustness, and codepoint-level data identification.
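To make the idea concrete, here is a minimal sketch in Python of what such a
"sub-characterhood" could look like in plaintext. Everything specific in it is
an assumption for illustration: pairing CYRILLIC SMALL LETTER TE (U+0442) with
VARIATION SELECTOR-1 (U+FE00) is an invented sequence, not one registered by
Unicode, and the helper names are hypothetical. It only shows that the
selector travels with the base character and can be ignored by software that
does not care about it.

```python
# Hypothetical illustration only: U+0442 + U+FE00 is NOT a registered
# variation sequence; it stands in for a "Serbian form of te" marker.
CYRILLIC_TE = "\u0442"   # CYRILLIC SMALL LETTER TE
VS1 = "\uFE00"           # VARIATION SELECTOR-1

serbian_te = CYRILLIC_TE + VS1   # base character + "shade of gray"


def is_selector(ch):
    """True for the 16 variation selectors U+FE00..U+FE0F."""
    return "\uFE00" <= ch <= "\uFE0F"


def strip_selectors(text):
    """Drop variation selectors, leaving only the base characters."""
    return "".join(ch for ch in text if not is_selector(ch))


def split_subcharacters(text):
    """Pair each base character with the selector that follows it, if any."""
    pairs = []
    for ch in text:
        if is_selector(ch) and pairs and pairs[-1][1] is None:
            # Attach the selector to the preceding base character.
            pairs[-1] = (pairs[-1][0], ch)
        else:
            pairs.append((ch, None))
    return pairs
```

A process that ignores the distinction calls strip_selectors() and sees an
ordinary te; one that cares reads the (base, selector) pairs and can choose a
glyph accordingly. The marker stays bound to the character through copying,
searching and storage, which is the robustness claimed above.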
But I was told, no, there is simply a better way of doing this: "language tags."
And so I grudgingly accepted this, and moved on from my example given for
familiarity -- a local variation of the Cyrillic script -- to an actual
interest, obscure but highly comparable local variations of the Greek script.
I said OK, now show me how "language tags" are going to apply to this, to get
the glyphs needed for these Greek script variants to display. And after a very
long, frustrating process of non-answers, the dirty little truth came out.
"Language tags" are a fib.
The actual answer for the Serbian 't' is: Unicode chooses not to deal with
this, Unicode absolves itself of all responsibility for dealing with this, and
Unicode absolves itself of all responsibility for following up that it is
dealt with elsewhere -- and incidentally there might be some technical way,
someday, outside of Unicode, to do something as insignificant as actually
displaying that glyph, by means of a standardized language tag.
Which you must admit sounds like a less convincing -- and less responsible --
rebuttal to my own very rational, and concrete, and dependable, and
This archive was generated by hypermail 2.1.5 : Sat Mar 05 2005 - 16:41:06 CST