Re: List of ligatures for languages of the Indian subcontinent. (from Re: per-character "stories" in a database)

From: John Hudson
Date: Mon Mar 17 2003 - 16:37:49 EST

    A few observations, so that William will understand the scope and some of
    the issues of what he is proposing.

    1. For some Indic scripts, including Devanagari, there is no fixed set of
    'ligatures' that would be normative for every typeface, or for every
    language using the script. So even for a single script you would be looking
    at multiple lists, with the same combination of characters likely
    represented in different ways for different languages.

    2. The idea of a 'ligature', as it exists in the Latin script, is not
    really found in Indic scripts. This terminology derives from the
    application of particular typecasting and typesetting technologies to Indic
    scripts. So while some aspects of some Indic scripts may, with relative
    accuracy, be spoken of as ligatures in some font formats (e.g. the 'akhn'
    (Akhand) feature of OpenType, which forms obligatory 'ligatures'), Indic
    scripts do not inherently require mapping of multiple characters to single
    glyphs. This is simply one model for rendering one aspect of Indic scripts. [As a
    parallel, consider Tom Milo's ligature-free approach to Arabic, another
    script widely and erroneously assumed to involve ligatures.]

    3. As Rick has already alluded to re. Tibetan, it is far from necessary for
    all the *graphemes* of a script to be represented by individual ligature
    glyphs. A grapheme may be composed of single glyphs and/or ligatures
    combined with dynamically positioned mark glyphs. Building or even
    cataloguing every possible grapheme -- every combination of base glyph,
    ligature and mark(s) in a script -- is an incredibly inefficient approach
    to Indic rendering.

    4. Cataloguing and publishing known consonant conjunct forms for Indic
    scripts is a good idea and a worthwhile goal, which would indeed be a
    valuable resource for font developers. Michael Everson has indicated that
    he has what he considers a comprehensive list for Devanagari, and I
    probably have something close to comprehensive in my own files and books.
    However, William should not delude himself that such a catalogue would
    represent all that is necessary for rendering Indic scripts in the
    technologies that interest him. Once you have the conjuncts catalogued, and
    have identified subsets of conjuncts that are appropriate to the languages
    that you intend to support, you still need to implement shaping and
    positioning for matras relative to every base glyph and every conjunct.
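
    To make points 3 and 4 concrete, here is a rough Python sketch. It shows
    that a Devanagari conjunct such as 'kssa' is encoded as a sequence of
    characters rather than as a single code point, and it compares the number
    of glyphs needed if every base-plus-mark combination were catalogued as
    its own glyph against the number needed when marks are positioned
    dynamically. The function names and the figures of 500 base forms, 15
    vowel signs and 3 other marks are illustrative assumptions, not counts
    taken from any real font or from the lists mentioned above.

```python
import unicodedata

# The conjunct 'kssa' is a three-character sequence: KA + VIRAMA + SSA.
KSSA = "\u0915\u094D\u0937"


def describe(text):
    """Return the Unicode character names that make up `text`."""
    return [unicodedata.name(ch) for ch in text]


def precomposed_count(bases, marks_per_slot):
    """Glyphs needed if every base/mark combination is a separate glyph.

    `marks_per_slot` lists how many marks can fill each attachment slot;
    each slot may also be empty, hence the `n + 1`.
    """
    total = bases
    for n in marks_per_slot:
        total *= n + 1
    return total


def component_count(bases, marks_per_slot):
    """Glyphs needed if marks are separate glyphs positioned on the bases."""
    return bases + sum(marks_per_slot)


if __name__ == "__main__":
    print(describe(KSSA))
    # Hypothetical figures: 500 base forms (consonants plus conjuncts),
    # one slot with 15 vowel signs and one slot with 3 other marks.
    print(precomposed_count(500, [15, 3]))
    print(component_count(500, [15, 3]))
```

    Under these assumed figures the precomposed approach needs 32000 glyphs
    and the component approach 518. The sketch ignores contextual variants of
    the marks themselves, so the second figure understates the real work: the
    positioning of each mark relative to each base and conjunct still has to
    be implemented.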

    William writes: '...I do not have any skills at Indian languages.' While
    some may find his enthusiasm admirable, it would be a good idea for him to
    develop such skills before he starts writing papers on implementing such
    languages for digital interactive broadcasting or any other technology.

    John Hudson

