Re: Codepoint Differentiation

From: Mark E. Shoulson
Date: Tue Feb 22 2005 - 07:19:07 CST


    >Please note clearly the distinction between my main proposals, which should be
    >implemented officially by Unicode, and this side issue of "Codepoint PUAs".
    >On this side issue, consider the following.
    >We add to Unicode a small block of "Private Differentiation Selector"
    >codepoints, which are to be totally ignored by everything except, optionally,
    >a customized smart font.
    >The users of Klingon now get together, and decide they are going to use
    >"Private Differentiation Selector 5" for Klingon.
    >They simply take the codepoints of the Latin letters which transliterate
    >Klingon, and pair "PDS 5" with each letter's codepoint.
    >Now, users with a smart Klingon font get Klingon glyphs. Users who lack a
    >smart font with Klingon glyphs automatically get the Latin transliteration. We
    >can also do useful things for learners, by dynamically switching the specified
    >font with DHTML in a Klingon learning Web page.
    Klingon is already fairly widely used in the PUA; see the ConScript
    Registry at for the encoding.
    And see for some usage thereof (you can also
    see the KLI's online journal in PUA Klingon or in Latin; your choice.
    There are some other resources using it as well, I believe).
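
    As a rough illustration of what "in the PUA" means for software, here
    is a minimal Python sketch. It assumes the ConScript Registry range for
    Klingon is U+F8D0..U+F8FF; that range and the helper names are
    assumptions for illustration, not anything Unicode itself defines.

```python
import unicodedata

# Assumption: the ConScript Registry places Klingon at U+F8D0..U+F8FF.
KLINGON_PUA = range(0xF8D0, 0xF900)

def is_private_use(ch: str) -> bool:
    """True if the code point has general category Co (private use)."""
    return unicodedata.category(ch) == "Co"

def looks_like_pua_klingon(text: str) -> bool:
    """True if any code point falls in the assumed Klingon PUA range."""
    return any(ord(ch) in KLINGON_PUA for ch in text)

sample = "\uf8d0\uf8d1"  # two code points from the assumed range
print(is_private_use(sample[0]))
print(looks_like_pua_klingon(sample))
print(looks_like_pua_klingon("tlhIngan Hol"))  # Latin transliteration
```

    Nothing in the text itself says these code points mean Klingon; that
    agreement lives outside Unicode, which is the usual PUA trade-off.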

    Of course, *I* can't see why Klingon shouldn't get encoded officially in
    Plane 1, but that's ground already covered, and I know the given
    reasons. Perhaps there'll be more usage that will change things.

    >What we have done is turn Unicode from a "one dimensional array" into a "two
    >dimensional array". The primary (and defaultable) glyphs and meanings get real
    >codepoints along the main axis, and secondary (and allowably ignorable) glyphs
    >and/or meanings get "differentiators" along the secondary axis.
    Yes, and in the process inflicted great violence on the whole concept
    and basis of Unicode in the first place.

    OK, maybe that's strong. And yes, I see the efficiency of using a
    "two-dimensional" encoding set--essentially turning all of Unicode into
    (potential) surrogate characters. But that also lends complication and

    I think some of the problem really lies in the way VSs hover on the
    border between true encoding and "just a glyph variant." It makes it so
    tempting to push them one way or another. If something is distinct
    enough to get an official VS variant, maybe it should really be encoded
    (or as in this case, maybe VSs should be considered a valid way to
    encode more characters). OTOH, if something isn't important enough to
    encode in Unicode, why is Unicode messing around with indicating it?
    It's a halfway solution, presumably intended for borderline cases...
    which are the hardest to identify and classify. I know Michael Everson
    considers VSs to be essentially pseudo-coding (am I representing your
    opinion correctly?); I think this is why.
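
    To make the "second axis" concrete, here is a small Python sketch that
    treats text as (base, selector) pairs using the existing variation
    selectors (U+FE00..U+FE0F and U+E0100..U+E01EF). The function names are
    made up for illustration, and the sketch ignores the Mongolian free
    variation selectors and any question of which sequences are actually
    registered.

```python
def is_variation_selector(ch: str) -> bool:
    """True for VS1..VS16 and VS17..VS256."""
    cp = ord(ch)
    return 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF

def split_pairs(text: str):
    """Group text into (base, selector-or-None) pairs."""
    pairs = []
    for ch in text:
        if is_variation_selector(ch) and pairs and pairs[-1][1] is None:
            # Attach the selector to the base character just before it.
            pairs[-1] = (pairs[-1][0], ch)
        else:
            # A base character (or a stray selector with nothing to attach to).
            pairs.append((ch, None))
    return pairs

def strip_selectors(text: str) -> str:
    """What a process that ignores selectors effectively sees."""
    return "".join(ch for ch in text if not is_variation_selector(ch))

text = "a\ufe00b"
print(split_pairs(text))
print(strip_selectors(text))
```

    A renderer that knows the pair can pick a different glyph; one that
    does not falls back to the base character alone, which is the same
    fallback behaviour the quoted proposal relies on.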

    >That basic principle also applies to my main proposals, which would use other
    >sets of "differentiator" codepoints, assigned officially by Unicode.
    >It's an extremely useful and efficient system for dealing with things --
    >glyphs or meanings -- that have an identity as a "subset" of a real codepoint.
    It's not necessarily a bad way to do things, but your concept of what
    Unicode is and should be does not fit in with that held by the UTC and
    most others (and they're bigger than you).

