Shades of Gray

Date: Sat Mar 05 2005 - 16:54:58 CST

    People think I'm being absolutely horrible. But you should be more sympathetic.

    I looked at Unicode and observed a basic difficulty: "Where do you draw the
    line?" What fits exactly into what category? And I saw that Unicode was having
    to answer complex and finely gradated questions with the bluntest of answers:
    black or white. Codepoint or not. And it occurred to me that what was called
    for, conceptually, was one or more shades of gray.

    A way to define something that does not quite have characterhood, and yet
    is still vital, or important, or at least useful, to have in (what I
    will now call) the basic plaintext data.

    And a very simple technique for doing this is apparent -- use one or more
    levels of variation selector-like codepoints to define a "sub-characterhood",
    and even a "sub-sub-characterhood". Not a "pseudo-codepoint" -- but a real
    piece of data, describing the real identity of something as a sub-category of
    a codepoint. Data in one or more shades of gray.
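
    A minimal sketch of what I mean, not a definitive implementation. It
    assumes, purely for illustration, that VARIATION SELECTOR-1 (U+FE00)
    marks a variant of CYRILLIC SMALL LETTER TE (U+0442); no such sequence is
    standardized, and the helper names tag_variant and strip_variants are
    hypothetical.

```python
# Hypothetical pairing: U+0442 followed by U+FE00 marks a "sub-character".
# This sequence is an assumption for illustration, not a standardized one.
BASE_TE = "\u0442"  # CYRILLIC SMALL LETTER TE
VS1 = "\uFE00"      # VARIATION SELECTOR-1


def is_variation_selector(ch: str) -> bool:
    """True for codepoints in the two general variation selector ranges."""
    cp = ord(ch)
    return 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF


def tag_variant(text: str, base: str, selector: str) -> str:
    """Append the selector to every occurrence of the base character."""
    return text.replace(base, base + selector)


def strip_variants(text: str) -> str:
    """Drop all variation selectors, recovering the plain base text."""
    return "".join(ch for ch in text if not is_variation_selector(ch))


plain = "\u0442\u0430\u0442\u0430"          # two TE, two A
tagged = tag_variant(plain, BASE_TE, VS1)   # each TE now carries its selector
```

    The selector travels with the character inside the plaintext itself, so
    the "shade of gray" survives copying and storage, and a process that does
    not care about it can drop it and still have the ordinary codepoints.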

    I brought up an example of this with the Serbian 't'. My approach has a sound
    conceptual basis and can be done technically. It has the benefits that the
    definition of the "sub-characterhood" is tightly bound to the characterhood,
    providing data robustness, and codepoint-level data identification.

    But I was told, no, there is simply a better way of doing this: "language tags."

    And so I grudgingly accepted this, and moved on from my example given for
    familiarity -- a local variation of the Cyrillic script -- to an actual
    interest, obscure but highly comparable local variations of the Greek script.

    I said OK, now show me how "language tags" are going to apply to this, to get
    the glyphs needed for these Greek script variants to display. And after a very
    long, frustrating process of non-answers, the dirty little truth came out.
    "Language tags" are a fib.

    The actual answer for the Serbian 't' is: Unicode chooses not to deal with
    this, Unicode absolves itself of all responsibility for dealing with this, and
    Unicode absolves itself of all responsibility for following up that it is
    dealt with elsewhere -- and incidentally there might be some technical way,
    someday, outside of Unicode, to do something as insignificant as actually
    displaying that glyph, by means of a standardized language tag.

    Which you must admit sounds like a less convincing -- and less responsible --
    rebuttal to my own very rational, and concrete, and dependable, and
    Unicode-controlled approach.
