Re: Draft Proposal to add Variation Sequences for Latin and Cyrillic letters

From: Asmus Freytag
Date: Wed Aug 04 2010 - 12:56:32 CDT

    On 8/2/2010 5:04 PM, Karl Pentzlin wrote:
    > I have compiled a draft proposal:
    > Proposal to add Variation Sequences for Latin and Cyrillic letters
    > The draft can be downloaded at:
    > (4.3 MB).
    > The final proposal is intended to be submitted for the next UTC
    > starting next Monday (August 9).
    > Any comments are welcome.
    > - Karl Pentzlin
    This is an interesting proposal to deal with the glyph selection problem
    caused by the unification process inherent in character encoding.

    When Unicode was first contemplated, the web did not exist and the
    expectation was that it would nearly always be possible to specify the
    font to be used for a given text and that selecting a font would give
    the correct glyph.

    As the proposal noted, universal fonts and viewing documents on other
    platforms and systems across the web have made this solution
    unattractive for general texts.

    We are left, then, with these five scenarios:

    1) Free variation
    2) Orthographic variation of isolated characters (by language, e.g.
    different capitals)
    3) Orthographic variation of entire texts (e.g. italic Cyrillic forms,
    by language)
    4) Orthographic variation by type style (e.g. Fraktur conventions)
    5) Notational conventions (e.g. IPA)

    For free variation of a glyph, the only possible solutions are either
    font selection or use of a variation sequence. I concur with Karl that
    in this case, where notable variations have been unified, adding
    variation sequences is a much more viable means of preserving authorial
    intent than font selection.
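
    To illustrate the mechanism under discussion: a variation sequence is a
    base character followed immediately by a variation selector (VS1..VS16
    at U+FE00..U+FE0F, VS17..VS256 at U+E0100..U+E01EF). The Python sketch
    below is only an illustration. The helper names are made up here, and
    the sequence <0061, FE00> is a hypothetical example, not a registered
    sequence. It shows that stripping the selectors recovers the unmarked
    text, which a font selection cannot guarantee once the text travels.

```python
# Variation selectors: VS1..VS16 are U+FE00..U+FE0F,
# VS17..VS256 are U+E0100..U+E01EF.
VS1 = "\uFE00"


def is_variation_selector(ch: str) -> bool:
    """True if ch is one of the 256 Unicode variation selectors."""
    cp = ord(ch)
    return 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF


def with_variant(base: str, selector: str = VS1) -> str:
    """Build a variation sequence: one base character plus one selector."""
    if len(base) != 1 or is_variation_selector(base):
        raise ValueError("base must be a single non-selector character")
    if len(selector) != 1 or not is_variation_selector(selector):
        raise ValueError("selector must be a single variation selector")
    return base + selector


def strip_variation_selectors(text: str) -> str:
    """Drop all selectors, leaving the ambiguous (unmarked) characters."""
    return "".join(ch for ch in text if not is_variation_selector(ch))


# Hypothetical sequence <U+0061, U+FE00>, used for illustration only.
marked = "c" + with_variant("a") + "t"
plain = strip_variation_selectors(marked)
```

    A process that does not know a given sequence can ignore the selector,
    so the marked text degrades to the ambiguous form rather than breaking.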

    If text is language-tagged, then OpenType mechanisms exist in principle
    to handle scenarios 2 and 3. For full texts in a certain language, using
    variation selectors throughout is unappealing as a solution.

    However, it may be a viable way to embed correctly rendered citations
    in other text, given that language tagging can become separated from
    the document and that automatic language tagging may detect large
    chunks of text, but not short runs.

    The Fraktur problem is one where one typestyle requires additional
    information (e.g. when to select long s) that is not required for
    rendering the same text in another typestyle. If it is indeed desirable
    (and possible) to create a correctly encoded string that can be rendered
    without further change automatically in both typestyles, then adding any
    necessary variation sequences to ensure that ability might be useful.
    However, that needs to be addressed in the context of a precise
    specification of how to encode texts so that they are dual renderable.
    Only addressing some isolated variation sequences makes no sense.

    Notational conventions are addressed in Unicode by duplicate encoding
    (IPA) or by variation sequences. The scheme has holes: in a few cases
    it is not possible to select one of the variants explicitly. Instead,
    the ambiguous form has to be used, in the hope that the font in use
    will have the proper variant in place for the ambiguous form.

    Adding a few variation sequences (like one to allow the "a" at 0061
    to be the two-story one needed for IPA) would fill the gap for cases
    where the precise display font cannot be controlled.
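
    As a rough illustration of that gap: in IPA the single-story form is
    separately encoded (U+0251 LATIN SMALL LETTER ALPHA), while the
    two-story form relies on the ambiguous U+0061. The Python sketch below
    lists where a transcription still depends on the font for that choice.
    The function name and the use of VS1 (U+FE00) with "a" are assumptions
    made for this example; no such sequence is implied to exist.

```python
import unicodedata

# Assumption for illustration: VS1 after U+0061 would pin the two-story form.
HYPOTHETICAL_SELECTOR = "\uFE00"


def unpinned_positions(text, letter="a", selector=HYPOTHETICAL_SELECTOR):
    """Indices where `letter` is not immediately followed by `selector`."""
    return [
        i
        for i, ch in enumerate(text)
        if ch == letter and text[i + 1:i + 2] != selector
    ]


# "a" is the ambiguous letter; U+0251 is the separately encoded alpha.
name_a = unicodedata.name("a")
name_alpha = unicodedata.name("\u0251")

plain = "pat\u0251"                                 # glyph of "a" left to the font
pinned = "pa" + HYPOTHETICAL_SELECTOR + "t\u0251"   # glyph of "a" stated in the text

needs_font = unpinned_positions(plain)
needs_nothing = unpinned_positions(pinned)
```

    Here needs_font holds the index of the unmarked "a", and needs_nothing
    is empty because the only "a" carries the (hypothetical) selector.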

    However, there's no need to add variation sequences to select an
    *ambiguous* form. Those sequences should be removed from the proposal.

    Overall a valuable starting point for a necessary discussion.

