Re: Level of Unicode support required for various languages

From: Vinod Kumar (
Date: Thu Oct 25 2007 - 04:10:24 CDT

    Let me rephrase the question from the software architecture viewpoint. What
    level or kind of support must a Unicode text rendering software
    architecture provide?

    Some assumptions are:
    1) The Unicode standard has encoded the script as well as its rendering.
    2) The support sought is for rendering, not for sorting.

    The answer to the rephrased question, from example (ICU) and experience, is:
    1) There should be a character-level operation to find the appropriate
    boundaries in the text stream. At the lowest level, we have to identify
    the sequence of characters that must be treated together to obtain their
    rendering. For Latin, this is mostly a single character, but for South Asian
    scripts it is a 'logical syllable'. International Components for Unicode
    (ICU, from IBM) provides a character iterator that returns the next sequence
    of characters to be treated as a unit for shaping purposes.

    2) There should be a function that takes a syllable of characters
    (obtained from 1) and a font, and returns the sequence of glyphs from the
    font. Laid out in order, these glyphs form the rendering of the
    syllable. The concrete method used by ICU is of the form
    XXXLayoutEngine->layoutChars(text, ...) followed by
    XXXLayoutEngine->getGlyphs(glyphs, ..);
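
    The boundary-finding operation of step 1 can be sketched roughly as
    follows. This is only an illustration under simplifying assumptions, not
    ICU's algorithm or API: the function name logical_units is hypothetical,
    only the Devanagari virama is handled, and a real implementation needs the
    full per-script rules (special joiners, reph, and so on).

```python
import unicodedata

# Assumption: only the Devanagari virama (halant) is treated as a conjunct
# former. Other scripts have their own virama code points.
VIRAMA = "\u094D"


def logical_units(text):
    """Split text into units that should be shaped together.

    Simplified rule (an assumption for illustration):
      * a combining mark (general category M*) stays with the preceding unit;
      * a letter directly after a virama stays with the preceding unit;
      * anything else starts a new unit.
    """
    units = []
    for ch in text:
        category = unicodedata.category(ch)
        is_mark = category.startswith("M")
        joins_after_virama = (
            bool(units) and units[-1].endswith(VIRAMA) and category == "Lo"
        )
        if units and (is_mark or joins_after_virama):
            units[-1] += ch
        else:
            units.append(ch)
    return units


if __name__ == "__main__":
    # Latin: each character is its own unit.
    print(logical_units("abc"))
    # Devanagari KA + VIRAMA + SSA + vowel sign I: a single logical syllable.
    print(logical_units("\u0915\u094D\u0937\u093F"))
```

    With this rule, Latin text falls apart into single characters, while a
    consonant cluster joined by a virama, together with its vowel sign, comes
    back as one unit.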

    Thus, for any script XXX, Unicode would arrive at an encoding for the
    characters and a description of how the characters are shaped. It is then
    for the software architects to implement the rendering in a concrete form.
    For (2) especially, this involves using an intelligent font (ISO/IEC 15285)
    such as OpenType or AAT, able to convert the n characters of the syllable
    from (1) to m glyphs. Thus, if the software can implement these two
    functions (breaking text into syllables, and getting the glyphs
    corresponding to each syllable), we can say that the software supports
    Unicode text rendering of script XXX.
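
    The n-characters-to-m-glyphs conversion can be illustrated with a toy
    model. Everything here is hypothetical: the "font" is a plain dictionary
    with a character-to-glyph map and a ligature table, and the glyph names
    are made up. It is not the OpenType or AAT data model and not the ICU
    LayoutEngine interface, only a sketch of the idea.

```python
# A toy "intelligent font" (assumed structure, for illustration only):
#   cmap      : character -> default glyph name
#   ligatures : tuple of glyph names -> single replacement glyph name
TOY_FONT = {
    "cmap": {
        "f": "f",
        "i": "i",
        "\u0915": "ka",       # DEVANAGARI LETTER KA
        "\u094D": "virama",   # DEVANAGARI SIGN VIRAMA
        "\u0937": "ssa",      # DEVANAGARI LETTER SSA
    },
    "ligatures": {
        ("f", "i"): "f_i",
        ("ka", "virama", "ssa"): "k_ssa",
    },
}


def shape(unit, font):
    """Map the n characters of one unit to m glyph names.

    First map each character to its default glyph, then greedily replace
    the longest matching glyph run with its ligature glyph.
    """
    glyphs = [font["cmap"].get(ch, ".notdef") for ch in unit]
    longest = max((len(key) for key in font["ligatures"]), default=1)
    shaped = []
    i = 0
    while i < len(glyphs):
        # Try the longest candidate run first, down to runs of two glyphs.
        for n in range(min(longest, len(glyphs) - i), 1, -1):
            key = tuple(glyphs[i:i + n])
            if key in font["ligatures"]:
                shaped.append(font["ligatures"][key])
                i += n
                break
        else:
            # No ligature starts here: keep the default glyph.
            shaped.append(glyphs[i])
            i += 1
    return shaped


if __name__ == "__main__":
    print(shape("fi", TOY_FONT))                  # 2 characters in
    print(shape("\u0915\u094D\u0937", TOY_FONT))  # 3 characters in
```

    In this toy font, the three characters KA, VIRAMA, SSA come out as a
    single conjunct glyph, which is the kind of n-to-m mapping a real
    intelligent font performs with far richer rules.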

    Vinod Kumar
    Project IndiX
    PS: While I was wording my answer, Timothy Armes thanked everyone for their
    answers and put his questions into context. Maybe this answer can address
    the issue now.
