Re: Codepoint Differentiation

From: Asmus Freytag
Date: Tue Feb 22 2005 - 02:07:26 CST


    At 04:44 PM 2/21/2005, wrote:
    >1. Then it sounds like:
    > - Serbian Cyrillic Small "t"

    This should be handled by language dependent glyph selection.
    That's a standard feature in OpenType and there's no need to
    duplicate that facility in the encoding.

    (Unless I misunderstand this example).
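To make the idea of language-dependent glyph selection concrete, here is a toy sketch in Python. It is not a real font API. It only models the principle that one code point (U+0442, CYRILLIC SMALL LETTER TE) can map to different glyphs depending on the language tag of the text, roughly what an OpenType font does with localized forms. The glyph names and the `select_glyphs` helper are hypothetical, invented for illustration.

```python
# Toy model of language-dependent glyph selection (not a real font API).
# All glyph names below are hypothetical placeholders.
DEFAULT_GLYPHS = {"\u0442": "te.default"}   # CYRILLIC SMALL LETTER TE

# Per-language overrides, keyed by a language tag carried outside the text.
LOCALIZED_GLYPHS = {
    "sr": {"\u0442": "te.serbian"},         # Serbian-preferred letterform
}


def select_glyphs(text, lang):
    """Return one glyph name per code point, preferring localized forms."""
    overrides = LOCALIZED_GLYPHS.get(lang, {})
    return [
        overrides.get(ch, DEFAULT_GLYPHS.get(ch, f"uni{ord(ch):04X}"))
        for ch in text
    ]


# The encoded text is identical; only the language tag differs.
russian = select_glyphs("\u0442", "ru")
serbian = select_glyphs("\u0442", "sr")
```

The point of the sketch is that the distinction lives in the language tag and the font, so the character data itself stays the same.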

    > - Coptic letterforms for Greek letter codepoints

    Already being encoded in Unicode 4.1 in a new "Coptic" block. The unification of
    these has been considered a mistake - it took a while to rectify as we
    needed to research what precisely the Coptic repertoire should be.

    >- complete Archaic Greek and Asia Minor scripts aligned to Greek letter

    Rather than messing with variation selectors, this is best handled by using
    fonts that are specific to archaic use.

    Where it's a question of a different script - be patient, it's probably
    slated to be encoded.

    It's a common problem that archaic scripts use different shapes at
    different times for the same characters. Sometimes, the answer may be that
    it's really two different scripts, in which case the precursor can be coded
    separately. Sometimes, it's reasonable to ask users to use a different font
    for a given period. Sometimes, a specific higher level protocol should be
    developed to handle specific problems of scholarly representation of text.

    As a last resort, variation selectors might be used in some instances - but
    not as a blanket approach.

    >are exactly what the Variation Selectors were designed for. There are no
    >issues other than a smart font substituting an alternate glyph. They can
    >default in a "low-fidelity" rendering to the primary codepoint glyphs. I would
    >welcome individual codepoints for them, but Unicode has already decided
    >otherwise. There is a clear need to be able to access the glyphs somehow, on a
    >device which has only one general Unicode font installed.

    As stated above (and as others have pointed out) your premise is incorrect
    for many of your examples. Not everything that requires glyph substitution
    should be encoded via variation selectors.
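As a rough illustration of why variation selectors can only carry glyph distinctions: a process that does not render text (search, comparison) is expected to ignore them. The Python sketch below assumes the two main variation selector ranges (U+FE00..U+FE0F and U+E0100..U+E01EF) and leaves out the Mongolian free variation selectors. The helper names are made up, and the sample sequence is chosen only for illustration; the sketch does not check whether it is a standardized sequence.

```python
def is_variation_selector(ch):
    """True for VS1..VS16 and VS17..VS256 (Mongolian FVS not covered)."""
    cp = ord(ch)
    return 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF


def strip_variation_selectors(text):
    """Drop variation selectors, as a non-rendering process might do."""
    return "".join(ch for ch in text if not is_variation_selector(ch))


plain = "\u2268"            # base character alone
with_vs = "\u2268\ufe00"    # same base character followed by VS1

# After stripping, both strings compare equal: the selector carried
# nothing that a search or comparison step is supposed to see.
same_for_search = strip_variation_selectors(with_vs) == plain
```

Any distinction that has to survive such a step cannot be expressed with a variation selector.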


    >2. I don't know much about:
    > - alternate CJK ideographs and syllabographs
    >But I imagine some or all of them would fit the criteria as well.
    >3. German Sharp S which I mention, is probably too complicated under the
    >present Variation Selector definition, and will have to go down to my second
    >category with combining marks.
    >4. So, that leaves me a little mystified why a variation selector isn't
    >already in use for the notorious Serbian "t". Seems a lot more practical than
    >switching language identifiers every word in an HTML Russian-Serbian
    >5. As to functions with combining marks, much of my original post discusses
    >the likely need for a new class of differentiating codepoints, other than
    >Variation Selectors, to handle that. In some cases the CGJ (or ZWJ) might be
    >usable, though I am already finding an essential problem with that for umlaut
    >vs. diaeresis (which you are all just dying to hear about -- and which
    >urgently needs to be solved).
    >Asmus Freytag wrote:
    > >
    > > > > Is there actually any problem with using Variation Selectors as-is to
    > > > > differentiate ...
    > >
    > > Doug already answered about the fact that only standardized sequences
    > > are valid and the only standardizer for sequences is the Unicode
    > > Consortium.
    > >
    > > Beyond that, variation selectors have another limitation: their only
    > > function is to identify variants - and that means variants with different
    > > GLYPHS, not variants with different *behavior*.
    > >
    > > Variation selectors are designed to be ignorable for all processes that
    > > don't deal in rendering, and, they are also ignorable for low-fidelity
    > > rendering, i.e. rendering that does not support them (yes, I know, that's a
    > > bit circular).
    > >
    > > For distinctions in *sorting behavior*, a Combining Grapheme Joiner can
    > > often be used - but it is not intended to result in differences in display.
    > >
    > > The use of all of these special encoding crutches needs to be kept to a
    > > minimum. We all know cases where using a variation selector is preferable
    > > over adding a new character, since the differentiation is minute, not
    > > universally applicable or both. However, most text processes have to be
    > > designed to actively ignore them - and you have to be able to know, in
    > > advance, for which process they can (and must) be ignored.
    > >
    > > That means, you cannot arbitrarily use existing mechanisms to make
    > > distinctions that matter to algorithms that were designed to ignore these
    > > mechanisms. Therefore, for variation selectors, any non-glyphic
    > > distinctions are completely out of the picture.
    > >
    > > A./
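
On the Combining Grapheme Joiner mentioned in the quoted message, and the umlaut vs. diaeresis question raised earlier: a small Python check, using only the standard `unicodedata` module, shows that a CGJ between a base letter and a combining diaeresis survives NFC normalization, so the marked sequence stays distinct as data. This is only a sketch of that one property. Actual differences in sorting would need a collation implementation with suitable tailoring, which is not shown here.

```python
import unicodedata

CGJ = "\u034f"          # COMBINING GRAPHEME JOINER
DIAERESIS = "\u0308"    # COMBINING DIAERESIS

# Without CGJ, NFC composes the pair into the precomposed letter.
plain = unicodedata.normalize("NFC", "u" + DIAERESIS)

# With CGJ in between, composition is blocked and all three code
# points are kept, so the two spellings remain distinguishable.
marked = unicodedata.normalize("NFC", "u" + CGJ + DIAERESIS)

# CGJ has canonical combining class 0, which is why it blocks composition.
cgj_class = unicodedata.combining(CGJ)
```

Since CGJ is not intended to change display, both sequences should look the same when rendered; only processes that choose to pay attention to CGJ see a difference.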
