Re: Codepoint Differentiation

Date: Mon Feb 21 2005 - 18:44:40 CST

    1. Then it sounds like:

     - Serbian Cyrillic Small "t"
     - Coptic letterforms for Greek letter codepoints
     - complete Archaic Greek and Asia Minor scripts aligned to Greek letter codepoints

    are exactly what the Variation Selectors were designed for. There are no
    issues other than a smart font substituting an alternate glyph. They can
    default in a "low-fidelity" rendering to the primary codepoint glyphs. I would
    welcome individual codepoints for them, but Unicode has already decided
    otherwise. There is a clear need to be able to access the glyphs somehow, on a
    device which has only one general Unicode font installed.
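The following Python sketch illustrates the fallback behavior described in point 1. Everything specific in it is an assumption made for illustration: the font table, the glyph name `te.serbian_alt`, and the pairing of Cyrillic т (U+0442) with VS1 (U+FE00), which is not a standardized variation sequence. The renderer honors a selector only when the font supports that exact sequence. Otherwise it shows the base character's default glyph and drops the selector.

```python
VS1 = "\uFE00"

# Hypothetical font coverage: (base, selector) -> alternate glyph name.
# U+0442 + U+FE00 is NOT a standardized variation sequence; illustration only.
HYPOTHETICAL_FONT = {
    ("\u0442", VS1): "te.serbian_alt",
}


def is_variation_selector(ch: str) -> bool:
    """True for VS1-VS16 and VS17-VS256 (Mongolian selectors not covered)."""
    cp = ord(ch)
    return 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF


def choose_glyphs(text: str, font: dict) -> list:
    """Return one glyph name per base character.

    A selector is honored only if the font has that exact sequence;
    otherwise the default glyph is used and the selector is dropped.
    """
    glyphs = []
    i = 0
    while i < len(text):
        base = text[i]
        if is_variation_selector(base):
            # Orphaned selector with no base character: it renders nothing.
            i += 1
            continue
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if nxt and is_variation_selector(nxt):
            glyphs.append(font.get((base, nxt), "default:" + base))
            i += 2
        else:
            glyphs.append("default:" + base)
            i += 1
    return glyphs


print(choose_glyphs("\u0442\uFE00", HYPOTHETICAL_FONT))  # ['te.serbian_alt']
print(choose_glyphs("\u0442\uFE00", {}))                 # ['default:т']
```

The second call, with an empty font table, is the "low-fidelity" case. The same text still renders, just with the primary glyph for т.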

    2. I don't know much about:

     - alternate CJK ideographs and syllabographs

    But I imagine some or all of them would fit the criteria as well.

    3. German Sharp S, which I mentioned, is probably too complicated under the
    present Variation Selector definition, and will have to move down to my
    second category, the one involving combining marks.

    4. So that leaves me a little mystified as to why a variation selector isn't
    already in use for the notorious Serbian "t". It seems a lot more practical
    than switching language identifiers for every word in an HTML Russian-Serbian
    dictionary.

    5. As to functions with combining marks, much of my original post discusses
    the likely need for a new class of differentiating codepoints, other than
    Variation Selectors, to handle that. In some cases the CGJ (or ZWJ) might be
    usable, though I am already finding an essential problem with that for umlaut
    vs. diaeresis (which you are all just dying to hear about -- and which
    urgently needs to be solved).
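This Python snippet covers only the normalization side of the CGJ idea in point 5. The assignment of one spelling to umlaut and the other to diaeresis is a hypothetical convention chosen for the example. Under NFC, u + U+0308 composes to the precomposed ü (U+00FC). A CGJ (U+034F) placed between the two characters blocks that composition, so the two spellings remain distinct after normalization. The snippet does not address display or the problem alluded to above.

```python
import unicodedata

CGJ = "\u034F"        # COMBINING GRAPHEME JOINER
DIAERESIS = "\u0308"  # COMBINING DIAERESIS

# Hypothetical convention for this example only:
# plain sequence = umlaut, CGJ-separated sequence = diaeresis.
umlaut = "u" + DIAERESIS
diaeresis = "u" + CGJ + DIAERESIS

umlaut_nfc = unicodedata.normalize("NFC", umlaut)
diaeresis_nfc = unicodedata.normalize("NFC", diaeresis)

print(umlaut_nfc == "\u00FC")        # True: composed to precomposed u-umlaut
print(diaeresis_nfc == diaeresis)    # True: CGJ blocks composition
print(umlaut_nfc == diaeresis_nfc)   # False: still distinct after NFC
```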


    Asmus Freytag wrote:
    > > > Is there actually any problem with using Variation Selectors as-is to
    > > > differentiate ...
    > Doug already answered about the fact that only standardized sequences are valid
    > and the only standardizer for sequences is the Unicode Consortium.
    > Beyond that, variation selectors have another limitation: their only
    > function is to identify variants - and that means variants with different
    > GLYPH, not variants with different *behavior*.
    > Variation selectors are designed to be ignorable for all processes that
    > don't deal in rendering, and, they are also ignorable for low-fidelity
    > rendering, i.e. rendering that does not support them (yes, I know, that's a
    > bit circular).
    > For distinctions in *sorting behavior*, a Combining Grapheme Joiner can
    > often be used - but it is not intended to result in differences in display.
    > The use of all of these special encoding crutches needs to be kept to a
    > minimum. We all know cases where using a variation selector is preferable
    > over adding a new character, since the differentiation is minute, not
    > universally applicable or both. However, most text processes have to be
    > designed to actively ignore them - and you have to be able to know, in
    > advance, for which process they can (and must) be ignored.
    > That means, you cannot arbitrarily use existing mechanisms to make
    > distinctions that matter to algorithms that were designed to ignore these
    > mechanisms. Therefore, for variation selectors, any non-glyphic
    > distinctions are completely out of the picture.
    > A./
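Here is a rough Python sketch of the ignorability Asmus describes. It builds a search or comparison key that discards variation selectors before normalizing, so that text with a selector matches the same text without one. The ranges covered are assumptions of this sketch: VS1-VS16 (U+FE00..U+FE0F) and VS17-VS256 (U+E0100..U+E01EF). The Mongolian free variation selectors are left out. A real system would rely on the relevant Unicode algorithm, such as collation, rather than an ad hoc filter. Normalization by itself does not remove the selectors.

```python
import re
import unicodedata

# Assumed ranges: VS1-VS16 (U+FE00..U+FE0F) and VS17-VS256 (U+E0100..U+E01EF).
# Mongolian free variation selectors are deliberately not included.
_VARIATION_SELECTORS = re.compile("[\uFE00-\uFE0F\U000E0100-\U000E01EF]")


def strip_variation_selectors(text: str) -> str:
    """Remove every variation selector, keeping the base characters."""
    return _VARIATION_SELECTORS.sub("", text)


def search_key(text: str) -> str:
    """Comparison key for a non-rendering process: selectors ignored, then NFC."""
    return unicodedata.normalize("NFC", strip_variation_selectors(text))


te_plain = "\u0442"          # CYRILLIC SMALL LETTER TE
te_marked = "\u0442\uFE00"   # same letter + VS1 (hypothetical, not standardized)

# NFC leaves the selector in place, so a naive comparison fails...
print(unicodedata.normalize("NFC", te_marked) == te_plain)   # False
# ...while the selector-blind key treats the two as the same text.
print(search_key(te_marked) == search_key(te_plain))         # True
```

This also shows why only glyphic distinctions fit here. Any difference carried solely by the selector disappears as soon as a process builds a key like this.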

    This archive was generated by hypermail 2.1.5 : Mon Feb 21 2005 - 18:31:16 CST