Re: ZWJ, ZWNJ and VS in Latin and other Greek-derived scripts

From: Asmus Freytag
Date: Fri Jan 26 2007 - 12:16:24 CST

    On 1/25/2007 1:13 PM, Ruszlan Gaszanov wrote:
    > There's one thing I don't quite understand. Why do we keep encoding variations and combinations (ligatures) of the same base letters from Latin, Greek, Cyrillic and similar scripts as separate code points, when mechanisms already exist to compose them from base characters?
    > Consider φ <U+03C6> (GREEK SMALL LETTER PHI) and ϕ <U+03D5> (GREEK PHI SYMBOL), for instance. <U+03D5> only exists to enforce the "straight" glyph in mathematical context. Wouldn't it be more sensible to apply, let's say, VS1 <U+FE00> to <U+03C6> to enforce the "loopy" glyph and VS2 <U+FE01> to enforce the "straight" glyph where the distinction is important, while leaving it to the font designer to choose the glyph for plain "VS-less" <U+03C6>?
    I think you first of all have to consider that whether you use a single
    character code or a sequence, it's an encoding either way. By using
    variation selectors, you've only made the encoding more complicated.

    The use of VS is appropriate when the distinction between two shapes
    isn't needed by all users, and when treating the character as if the VS
    isn't there is appropriate for most processes (in fact, all but display).

    This is largely true for the Mongolian FVS, where text processes other
    than display just act on the base character. It is true for those math
    symbols for which we designated VS sequences - as far as we've
    established, they mean the same thing, and ignoring the VS in
    non-display processing is correct.
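
    As a minimal sketch of what "ignoring the VS in non-display processing"
    could look like in practice, assuming Python: the helper names
    strip_variation_selectors and find_ignoring_vs are hypothetical, and
    the code point ranges are an assumption covering the Mongolian FVS and
    the generic variation selectors only.

```python
import re

# Assumed set of selectors to ignore:
#   U+180B..U+180D    Mongolian free variation selectors (FVS1-3)
#   U+FE00..U+FE0F    variation selectors VS1-VS16
#   U+E0100..U+E01EF  variation selectors VS17-VS256
_VS = re.compile("[\u180B-\u180D\uFE00-\uFE0F\U000E0100-\U000E01EF]")


def strip_variation_selectors(text: str) -> str:
    """Return text with all variation selectors removed (hypothetical helper)."""
    return _VS.sub("", text)


def find_ignoring_vs(haystack: str, needle: str) -> int:
    """Search as if no variation selectors were present.

    The returned index refers to the stripped haystack, not the original.
    """
    return strip_variation_selectors(haystack).find(
        strip_variation_selectors(needle)
    )


# U+2229 INTERSECTION followed by VS1 is one of the math sequences
# (intersection with serifs); both strings mean the same thing.
plain = "A \u2229 B"
with_vs = "A \u2229\uFE00 B"

print(with_vs == plain)                             # raw comparison differs
print(strip_variation_selectors(with_vs) == plain)  # equal once VS is ignored
print(find_ignoring_vs(with_vs, "\u2229"))          # found in the stripped text
```

    The point of the sketch is that nothing is lost for search or
    comparison when the selector is dropped; only display cares about it.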

    The Greek letterforms used in math are different. They do mean different
    things, and treating them in text processes (search) as if they are the
    same is definitely not correct. In Greek text, the character codes for
    letter forms should not be used, and fonts should supply whatever glyph
    they desire for the standard character code.
    (That makes some fonts unusable for math, but that's OK).
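
    To illustrate the search argument, here is a small sketch, assuming
    Python and its standard unicodedata module; the sample sentence is made
    up. Because U+03C6 and U+03D5 are separate code points, a plain search
    keeps them apart. Folding them together takes a deliberate step such as
    NFKC normalization, which discards exactly the distinction a math
    search needs.

```python
import unicodedata

PHI = "\u03C6"         # GREEK SMALL LETTER PHI
PHI_SYMBOL = "\u03D5"  # GREEK PHI SYMBOL

# Made-up sample text using both characters with different meanings.
text = "flux \u03D5 through the loop at angle \u03C6"

# A plain code point search distinguishes the two.
print(text.find(PHI_SYMBOL))
print(text.find(PHI))

# NFKC applies the compatibility mapping U+03D5 -> U+03C6, so after
# folding the two can no longer be told apart.
folded = unicodedata.normalize("NFKC", text)
print(folded.count(PHI))
```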
    > Or, let's take all those spacing/combining subscript/superscript forms and so-called "mathematical alphabets" - couldn't the same thing have been accomplished by specific VS?
    No, because then the VS would represent meaning. It's just the same
    reason why we don't have a VS for uppercase letters, but code both upper
    and lower case separately. I do want to be able to search for vector 'a'
    (bold) as distinct from factor a (italic) in a mathematical text. I
    don't care that both are based on the first letter of the Latin
    alphabet, so I don't want the search to find the word 'a'.
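
    The same point can be sketched for the mathematical alphabets, again
    assuming Python and a made-up sample sentence. Bold a and italic a are
    their own code points, so a search for one finds neither the other nor
    the ordinary letter.

```python
import unicodedata

BOLD_A = "\U0001D41A"    # MATHEMATICAL BOLD SMALL A (the vector)
ITALIC_A = "\U0001D44E"  # MATHEMATICAL ITALIC SMALL A (the factor)

# Made-up sample text that also contains the ordinary word 'a'.
text = "Let \U0001D41A be a vector and \U0001D44E a factor."

# Each search hits only its own character.
print(text.count(BOLD_A))
print(text.count(ITALIC_A))

# The ordinary letter is matched separately from both math characters.
plain_hits = text.count("a")

# Only an explicit compatibility folding (NFKC) merges all three,
# which is exactly what a math search does not want.
folded = unicodedata.normalize("NFKC", text)
print(folded.count("a") - plain_hits)
```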

    Using VS is the wrong choice here, because it implies that situations
    where I want to ignore the distinction, and where I need to relate to
    the base character, are common.

    The rules for using characters and character shapes in written languages
    on the one hand and technical and scholarly notations on the other are
    different. Unicode correctly reflects that distinction.

