Re: Understanding PUA - (was) Re: Re: Ligatures For Indic languages

From: Richard Wordingham (richard.wordingham@ntlworld.com)
Date: Sun Jun 01 2008 - 17:17:12 CDT

    Mahesh T. Pai wrote on Friday, May 30, 2008 9:20 AM

    > How can one ensure that text created on one rendering system
    > (rendering/layout engine + font) which uses one method of using glyphs
    > from the PUA renders the same on another system which may or may not
    > use a different method of using the PUA?

    You can't even do it for the assigned codepoints! (You can, of course,
    preserve an image definition, which is a major argument for using PDF.)

    > Somebody told me something on IRC, and I have extrapolated that
    > information to understand that (1) the truetype fonts have a method of
    > naming glyphs, which can be used uniformly, irrespective of the
    > position of the glyph in the PUA (2) the layout engine has to have a
    > mapping from a sequence to a named glyph (3) once the layout engine
    > encounters a code sequence which has a predefined mapping to a named
    > glyph, the glyph is substituted, irrespective of position of the glyph
    > in the PUA. (4) The OpenType specs take the sequence <> glyph mapping
    > out of the rendering engine's realm and place the onus on the font
    > file itself.

    What you may be thinking of is Adobe's scheme for naming glyphs. Apart from
    a large 'basic' set of named glyphs for non-Indic characters, glyphs may be
    named in the form <Unicode character sequence specifier>.<variant ID>. Then,
    given the *glyph names* *and* the 'logical' order of the glyphs, you can
    recover the character sequence. The 'logical' order of the glyphs may not be
    straightforward to obtain: problems may arise with preposed vowels,
    ligatures of discontiguous characters (e.g. Arabic consonants), graphically
    decomposed characters (e.g. Thai SARA AM, especially after a tone mark, and
    graphically composite Khmer vowels), and possibly the use of the same glyph
    for different characters, e.g. for U+0032 DIGIT TWO, U+00B2 SUPERSCRIPT TWO
    and U+2082 SUBSCRIPT TWO.
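
    As a rough illustration of the recovery step, here is a minimal Python
    sketch of mapping glyph names back to characters, loosely following the
    rules in Adobe's glyph naming convention (drop the variant suffix after the
    first period, split ligature components on underscores, then decode
    'uniXXXX' and 'uXXXX' forms). The function names and the three-entry
    AGL_EXCERPT table are my own stand-ins, not part of any library; the real
    glyph list has thousands of entries. It assumes the glyph names are already
    in logical order, which is exactly the hard part.

```python
import re

# Hypothetical three-entry excerpt; the real Adobe Glyph List is far larger.
AGL_EXCERPT = {"f": "\u0066", "i": "\u0069", "two": "\u0032"}

_UNI = re.compile(r"uni((?:[0-9A-F]{4})+)")   # uni + groups of 4 hex digits
_U = re.compile(r"u([0-9A-F]{4,6})")          # u + 4 to 6 hex digits


def _is_scalar(cp):
    """True for a valid Unicode scalar value (surrogates excluded)."""
    return 0 <= cp <= 0x10FFFF and not (0xD800 <= cp <= 0xDFFF)


def component_to_text(component):
    """Map one underscore-separated name component to a string."""
    if component in AGL_EXCERPT:
        return AGL_EXCERPT[component]
    m = _UNI.fullmatch(component)
    if m:
        digits = m.group(1)
        cps = [int(digits[i:i + 4], 16) for i in range(0, len(digits), 4)]
        if all(_is_scalar(cp) for cp in cps):
            return "".join(chr(cp) for cp in cps)
        return ""
    m = _U.fullmatch(component)
    if m:
        cp = int(m.group(1), 16)
        return chr(cp) if _is_scalar(cp) else ""
    return ""  # unknown name: contributes no characters


def glyph_name_to_text(name):
    """Strip the variant suffix, then decode each ligature component."""
    base = name.split(".", 1)[0]
    return "".join(component_to_text(c) for c in base.split("_"))


def glyphs_to_text(names_in_logical_order):
    """Concatenate the characters for glyph names given in logical order."""
    return "".join(glyph_name_to_text(n) for n in names_in_logical_order)
```

    Note that a name such as 'two.sups' decodes to U+0032 whatever character
    the glyph was actually used to display, so the same-glyph ambiguity
    mentioned above is not resolved by the naming scheme.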

    The PUA comes into the above only if the PUA is used to give each glyph a
    character code, and a sequence of these character codes is the text data
    that is transferred.

    For the PUA, the only general way to preserve meaning is to preserve the
    font and adequate documentation of the rendering system.
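
    A practical first step is simply to find out whether a piece of text
    depends on the PUA at all. The following is a small Python sketch that
    lists the private-use characters in a string using the standard
    unicodedata module (general category 'Co'). The function name is my own
    choice for illustration.

```python
import unicodedata


def private_use_characters(text):
    """Return (index, 'U+XXXX') for each private-use character in text."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Co"  # Co = private use
    ]
```

    Any character it reports has no meaning without the font, and the
    documentation, that it was created for.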

    Richard.


