Re: Shades of Gray

From: Peter Kirk
Date: Sat Mar 05 2005 - 20:31:11 CST

    On 05/03/2005 22:54, wrote:

    >People think I'm being absolutely horrible. But you should be more sympathetic.
    >I looked at Unicode and observed a basic difficulty. "Where do you draw the
    >line." What fits exactly into what category. And I saw that Unicode was having
    >to answer complex and finely gradated questions with the bluntest of answers:
    >black or white. Codepoint or not. And it occurred to me that what was called
    >for conceptually, was one or more shades of gray.
    >A way to define something that was not quite a characterhood, and yet
    >something still vital, or important, or at least useful, to have in (what I
    >will now call) the basic plaintext data.
    >And a very simple technique for doing this is apparent -- use one or more
    >levels of variation selector-like codepoints to define a "sub-characterhood",
    >and even a "sub-sub-characterhood". Not a "pseudo-codepoint" -- but a real
    >piece of data, describing the real identity of something as a sub-category of
    >a codepoint. Data in one or more shades of gray.

    There is a mechanism defined for this "sub-characterhood", Variation
    Selectors.

    >I brought up an example of this with the Serbian 't'. My approach has a sound
    >conceptual basis and can be done technically. It has the benefits that the
    >definition of the "sub-characterhood" is tightly bound to the characterhood,
    >providing data robustness, and codepoint-level data identification.

    For reasons which have been explained before, most convincingly that
    introducing this usage now would disturb widespread current usage,
    Variation Selectors are not considered suitable for Serbian 't'. But
    that does not mean that they are unsuitable for your local Greek
    alphabet examples. If you can find some suitable reasonably standardised
    variant shapes for individual letters, you might consider proposing them
    for standardisation. But if you end up with a complete alphabet of
    variant shapes, the issue becomes a rather different one, not just
    sub-characterhood but sub-scripthood. And I accept that Unicode has not
    found a good way to resolve this one, either generally or in individual
    controversial cases.
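
    To make the mechanism concrete, here is a minimal Python sketch of what
    a variation sequence looks like in the data: a base character followed
    immediately by a variation selector. The helper name with_variant and
    the pairing of Greek alpha with VS1 are assumptions for illustration
    only; this is not a registered variation sequence.

```python
import unicodedata

VS1 = "\uFE00"  # VARIATION SELECTOR-1


def with_variant(base: str, selector: str = VS1) -> str:
    """Return a variation sequence: one base character plus one selector.

    Hypothetical helper; it does not check whether the sequence is
    actually standardised.
    """
    if len(base) != 1:
        raise ValueError("expected a single base character")
    return base + selector


# GREEK SMALL LETTER ALPHA + VS1: an illustrative, unregistered pairing.
seq = with_variant("\u03B1")
names = [unicodedata.name(ch) for ch in seq]
```

    The base character stays in the text unchanged, so searching and
    collation can still identify it as the ordinary letter; only the
    selector carries the request for a variant shape.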

    I must say I am thinking it would be a good idea to define a subset of
    the 256 variation selectors (already specified as default ignorable) as
    available for private use. (The existing PUA characters are not a good
    substitute as they are not default ignorable.) At least this would be a
    good way for the Unicode community to deal with recurrent issues like
    the ones Doug is repeatedly raising: advice can be given to use the
    Private Use Variation Selectors to select variant glyphs in any way you
    want, as long as you do it only between consenting adults - and the text
    would automatically default to being displayed with the regular glyphs
    by anyone outside the private loop. At the cost of a small number of
    code points, this could get a lot of people off our backs, and stop them
    abusing Unicode in more fundamentally damaging ways.
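
    A rough Python sketch of how this might behave, under stated
    assumptions: the 256 variation selectors are U+FE00..U+FE0F (VS1 to
    VS16) and U+E0100..U+E01EF (VS17 to VS256). The choice of VS241 to
    VS256 as the "private use" subset is purely hypothetical, as are all
    the function names; nothing of the kind is defined by Unicode.

```python
def is_variation_selector(ch: str) -> bool:
    """True for any of the 256 variation selectors."""
    cp = ord(ch)
    return 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF


def selector_number(ch: str) -> int:
    """Map a variation selector to its number, 1..256."""
    cp = ord(ch)
    if 0xFE00 <= cp <= 0xFE0F:
        return cp - 0xFE00 + 1
    if 0xE0100 <= cp <= 0xE01EF:
        return cp - 0xE0100 + 17
    raise ValueError("not a variation selector")


# ASSUMPTION: a hypothetical private-use subset, VS241..VS256.
PRIVATE_VS = range(0xE01E0, 0xE01F0)


def fallback_text(text: str) -> str:
    """What a reader outside the private agreement effectively sees:
    the selectors are ignored and the regular glyphs are shown."""
    return "".join(ch for ch in text if not is_variation_selector(ch))


def private_variants(text: str) -> list:
    """List (base character, selector number) pairs that use the
    hypothetical private subset."""
    found = []
    for base, sel in zip(text, text[1:]):
        if ord(sel) in PRIVATE_VS and not is_variation_selector(base):
            found.append((base, selector_number(sel)))
    return found


sample = "a" + chr(0xE01E0) + "b"
```

    Inside the private loop, private_variants(sample) tells a renderer
    which glyph variants were requested; outside it, fallback_text(sample)
    is all that matters, which is the point of using default ignorable
    code points rather than the PUA.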

    >But I was told, no, there is simply a better way of doing this: "language

    I don't think anyone has ever encouraged you to use the Unicode language
    tags. This mechanism is available, but is not well defined and not
    particularly suitable for your purposes.
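
    For comparison, a small Python sketch of what the language tag
    mechanism looks like in the data. It assumes the Plane 14 tag
    characters: U+E0001 LANGUAGE TAG followed by tag copies of ASCII
    characters at U+E0000 plus the ASCII value. The function names are
    hypothetical, and the sketch is not a recommendation to use tags.

```python
LANGUAGE_TAG = "\U000E0001"  # U+E0001 LANGUAGE TAG


def encode_language_tag(lang: str) -> str:
    """Encode an ASCII language identifier such as 'sr' as tag characters."""
    if not lang or not lang.isascii():
        raise ValueError("expected a non-empty ASCII language identifier")
    if not all(c.isalnum() or c == "-" for c in lang):
        raise ValueError("unexpected character in language identifier")
    return LANGUAGE_TAG + "".join(chr(0xE0000 + ord(c)) for c in lang.lower())


def strip_tags(text: str) -> str:
    """Drop every character in the tag block U+E0000..U+E007F."""
    return "".join(ch for ch in text if not 0xE0000 <= ord(ch) <= 0xE007F)


tagged = encode_language_tag("sr") + "\u0442"  # tag, then CYRILLIC SMALL LETTER TE
```

    Unlike a variation selector, the tag is not attached to any one
    character: it sits once at some earlier point in the text, so a
    substring copied from the middle loses it entirely.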

    Peter Kirk