Re: Codepoint Differentiation

From: Doug Ewell
Date: Wed Feb 23 2005 - 01:10:10 CST


    Doug <UList at dfa dash mail dot com> wrote:

    > The users of Klingon now get together, and decide they are going to
    > use "Private Differentiation Selector 5" for Klingon.
    > They simply take the codepoints of the Latin letters which
    > transliterate Klingon, and pair "PDS 5" with each letter's codepoint.
    > Now, users with a smart Klingon font get Klingon glyphs. Users who
    > lack a smart font with Klingon glyphs automatically get the Latin
    > transliteration. We can also do useful things for learners, by
    > dynamically switching the specified font with DHTML in a Klingon
    > learning Web page.

    Neither variation selectors (public or private) nor any other mechanism
    within Unicode is intended for automatic 1-to-1 transliteration. The
    only exception I can think of is a small number of Latin digraphs
    intended for transliteration with Cyrillic. These proved to be neither
    necessary nor sufficient, and their use is discouraged.

    > And there are absolutely no problems with a Korean character showing
    > up in the middle of their Web page -- as may currently occur with the
    > PUA.

    You have exactly the same issues with font dependency using this
    approach as you would with the PUA, except that your solution requires
    "smart fonts" and the PUA solution doesn't.

    > So we now see how a small block of codepoints, with almost zero impact
    > on processing, can vastly increase the usefulness of Unicode to real-
    > world people.

    1. Interspersing a variation selector after EVERY letter does not
    constitute "almost zero impact."

    2. Variation selectors are for making minor glyphic distinctions within
    a character, not for turning Latin into Klingon and vice versa.

    3. This mechanism does not "vastly increase the usefulness of Unicode"
    to anyone. Mark Shoulson already explained that Klingon-alphabet users
    get along just fine with a PUA-based solution.

    4. Adopting the style of a professor lecturing his students does not
    change any of points 1 through 3.
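
To put a rough number on point 1, here is a minimal sketch in Python. The proposed "Private Differentiation Selector 5" does not exist in Unicode, so VARIATION SELECTOR-5 (U+FE04) stands in for it purely as an assumption for measuring overhead. The helper names `tag_every_letter` and `sizes` and the sample string are illustrative only.

```python
# Hypothetical stand-in: the proposed "Private Differentiation Selector 5"
# is not a real Unicode character, so VARIATION SELECTOR-5 (U+FE04) is
# used here only to measure the storage overhead of the scheme.
SELECTOR = "\uFE04"


def tag_every_letter(text: str, selector: str = SELECTOR) -> str:
    """Append the selector after every alphabetic character in text."""
    return "".join(ch + selector if ch.isalpha() else ch for ch in text)


def sizes(text: str) -> dict:
    """Report the length of text in code points and in encoded bytes."""
    return {
        "code_points": len(text),
        "utf8_bytes": len(text.encode("utf-8")),
        "utf16_bytes": len(text.encode("utf-16-le")),
    }


plain = "tlhIngan Hol"            # Latin transliteration, 11 letters + 1 space
tagged = tag_every_letter(plain)  # same text with a selector after each letter

print(sizes(plain))
print(sizes(tagged))
```

For this sample the code point count roughly doubles (12 to 23), and the UTF-8 size nearly quadruples (12 bytes to 45), because each selector costs three bytes on top of a one-byte ASCII letter. Any process that searches, compares, or counts characters in such text also has to skip or strip the selectors.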

    > What we have done is turn Unicode from a "one dimensional array" into
    > a "two dimensional array". The primary (and defaultable) glyphs and
    > meanings get real codepoints along the main axis, and secondary (and
    > allowably ignorable) glyphs and/or meanings get "differentiators"
    > along the secondary axis.


    > It's an extremely useful and efficient system for dealing with things
    > -- glyphs or meanings -- that have an identity as a "subset" of a real
    > codepoint.

    Please read up on the Unicode Standard. Klingon letters are not
    "subsets" of Latin letters.

    > I'm going to be elaborating on Diaeresis vs. Umlaut further in an
    > upcoming post.

    You do know this problem has already been solved, right?

    -Doug Ewell
     Fullerton, California
