RE: Transcoding Tamil in the presence of markup

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Sun Dec 07 2003 - 14:45:06 EST

    > As an example, the vowel pairs a/ya, o/yo, u/yu, and so on
    > are distinguished by changing from one small stroke to two
    > small strokes. A Web page for children or foreigners may
    > want to color these strokes separately. With the current
    > encoding(s) in Unicode this is not possible, but I'm sure
    > somebody has designed an encoding where this would be possible.

    For these vowel pairs, this is not impossible to do.
    But one must remember that ya, yo, yu are in fact compound
    letters (even though they are encoded precomposed in the johab set
    of jamos that was used in Unicode) and are safely decomposable in
    Hangul as separate vowels, even if they are not canonically
    decomposable in Unicode.

    So you could safely decompose, when creating the document,
    these compound vowels, so that they can be each assigned a
    distinct style for instructing renderers.
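
    A minimal sketch of that authoring-time step, in Python. NFD
    already splits a precomposed syllable into its L, V and T jamos,
    but it leaves compound vowel jamos whole, so the second step needs
    a private table. COMPOUND_VOWELS and split_for_styling are
    hypothetical names, and the table is an assumption of this sketch:
    Unicode defines no such decompositions, and the resulting sequence
    is not canonically equivalent to the original text. The table only
    covers vowels that are visibly built from two other vowel jamos;
    how ya, yo, yu would be split is left open here.

```python
import unicodedata

# Illustrative table only: these splits are NOT Unicode decompositions.
# Each compound vowel jamo maps to the two simpler vowel jamos it is drawn from.
COMPOUND_VOWELS = {
    "\u116A": "\u1169\u1161",  # WA  -> O + A
    "\u116B": "\u1169\u1162",  # WAE -> O + AE
    "\u116C": "\u1169\u1175",  # OE  -> O + I
    "\u116F": "\u116E\u1165",  # WEO -> U + EO
    "\u1170": "\u116E\u1166",  # WE  -> U + E
    "\u1171": "\u116E\u1175",  # WI  -> U + I
    "\u1174": "\u1173\u1175",  # YI  -> EU + I
}


def split_for_styling(text):
    """Return a list of jamos, one entry per individually stylable unit."""
    units = []
    # Step 1: canonical decomposition turns each syllable into L, V, (T) jamos.
    for ch in unicodedata.normalize("NFD", text):
        # Step 2: apply the private table to compound vowels; keep the rest as-is.
        units.extend(COMPOUND_VOWELS.get(ch, ch))
    return units


if __name__ == "__main__":
    # U+ACFC is the syllable made of CHOSEONG KIYEOK and JUNGSEONG WA.
    for unit in split_for_styling("\uACFC"):
        print("U+%04X %s" % (ord(unit), unicodedata.name(unit)))
```

    Each returned unit could then be wrapped in its own styled element
    by whatever markup the document uses.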

    It's just a shame that these compound letters were not given
    explicit canonical decompositions in Unicode (so that they would
    not occur in documents in NFD and NFKD forms, but could still
    be recomposed into johab compound jamos and then into LV and LVT
    syllables in NFC and NFKC forms).

    As a rendering process such as a browser does not need to output
    characters when rendering Hangul text, I think it can safely
    apply these decompositions internally, and recompose to NFC form
    to optimize the final rendering with fonts when all the letters in
    the same syllabic cluster share the same style. If this is not the
    case, then it's up to the browser to split the syllables and render
    them using more basic Hangul jamos. But then the browser needs
    to know how multiple jamos are arranged:

      - <L*V*> goes above <T*>;
      - <L*> goes to the left of <V*>, unless <V*> is horizontal,
        in which case <L*> goes above <V*>;
      - the letters within <L*> are aligned horizontally if they are
        vertical, such as the SSANG* CHOSEONGs;
      - the same applies within <V*> (this includes the vowel pair YE,
        which is composed horizontally, as Y is vertical).

    The Hangul script is so logical that even complex clusters are
    easy to compose and read, and even to transcode to ASCII or to sort.
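
    The splitting step can be sketched with the arithmetic syllable
    decomposition from the Unicode Standard, here in Python. The
    constants are the standard ones; the placement classes
    (HORIZONTAL, MIXED) and the function names are assumptions of this
    sketch, giving only a rough idea of where the vowel sits relative
    to the leading consonant. A real renderer would need far more
    detail than this.

```python
# Constants of the Unicode Hangul syllable decomposition algorithm.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28

# Vowel indices (offsets from V_BASE), grouped by shape. Illustrative only.
HORIZONTAL = {8, 12, 13, 17, 18}      # O, YO, U, YU, EU
MIXED = {9, 10, 11, 14, 15, 16, 19}   # WA, WAE, OE, WEO, WE, WI, YI


def decompose(syllable):
    """Split one precomposed syllable into (L, V, T) code points; T may be None."""
    index = ord(syllable) - S_BASE
    if not 0 <= index < L_COUNT * V_COUNT * T_COUNT:
        raise ValueError("not a precomposed Hangul syllable")
    l, rest = divmod(index, V_COUNT * T_COUNT)
    v, t = divmod(rest, T_COUNT)
    # Trailing index 0 means "no trailing consonant".
    return L_BASE + l, V_BASE + v, (T_BASE + t if t else None)


def vowel_placement(syllable):
    """Say where the vowel sits relative to the leading consonant."""
    v = decompose(syllable)[1] - V_BASE
    if v in HORIZONTAL:
        return "below"
    if v in MIXED:
        return "below-and-right"
    return "right"


if __name__ == "__main__":
    for s in "\uD55C\uAD6D\uACFC":
        l, v, t = decompose(s)
        print(s, hex(l), hex(v), hex(t) if t else None, vowel_placement(s))
```

    The trailing consonant, when present, always goes under the
    leading consonant and vowel, so it needs no classification here.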

    This archive was generated by hypermail 2.1.5 : Sun Dec 07 2003 - 15:32:43 EST