Displaying the languages of the Indian subcontinent. (derives from Re: Please see my latest proposal)

From: William Overington (WOverington@ngo.globalnet.co.uk)
Date: Mon Mar 03 2003 - 17:11:30 EST


    Michael Everson wrote as follows.



    Your BENGALI LETTER OPEN O can be encoded already with the sequence
    U+0985 U+09CD U+09AF.

    Your BENGALI LETTER CENTRAL E can be encoded already with the
    sequence U+098F U+09CD U+09AF.

    There is no need to "bring the Bengali code block in line with the
    Devanagari block".

    end quote

    Firstly, I mention that I am not a linguist and do not write to make a
    linguistic comment at all.

    As some readers of this mailing list may know, I am very interested in
    interactive television, in particular the DVB-MHP (Digital Video
    Broadcasting - Multimedia Home Platform) system, which uses Unicode.

    From the specification for the DVB-MHP system, which can be downloaded
    from the http://www.mhp.org website, it appears that fonts for the DVB-MHP
    system, which can be broadcast, are to be in the PFR0 system, Portable Font
    Resource version 0. Some time ago I obtained some details of that system
    and looked through them, though I did not follow all of the details. As
    the system seemed to date from the early 1990s, it seems entirely possible
    that the PFR0 system does not support the mechanism which allows a font to
    substitute a particular glyph for a sequence such as the U+0985 U+09CD
    U+09AF which Michael mentioned in his reply to Andy, quoted above.

    It would therefore seem that the DVB-MHP interactive television system,
    which is a system for worldwide use, may come up against considerable
    rendering problems when it comes to making broadcasts using the languages of
    the Indian subcontinent. I am seeking to resolve that problem by devising
    an infrastructural tool to program round the problem by preprocessing
    received Unicode text in the television receiver before it is passed to the
    font, so that facilities for quality typography for the languages of the
    Indian subcontinent exist with the DVB-MHP platform.
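
    As a rough illustration of the kind of preprocessing meant here, the
    following Python sketch replaces known Unicode sequences with single code
    points before the text reaches the font. The two sequences are the ones
    Michael quoted; the Private Use Area targets U+EC01 and U+EC02, and the
    names SUBSTITUTIONS and preprocess, are placeholders chosen only for this
    example and are not taken from any DVB-MHP or PFR0 specification.

```python
# Hypothetical substitution table. The keys are the sequences quoted above;
# the Private Use Area values are placeholders for this sketch only.
SUBSTITUTIONS = {
    "\u0985\u09CD\u09AF": "\uEC01",
    "\u098F\u09CD\u09AF": "\uEC02",
}


def preprocess(text, table=SUBSTITUTIONS):
    """Replace each known sequence with its single substitute code point.

    At every position the longest matching sequence wins; characters that
    start no known sequence are copied through unchanged.
    """
    longest = max(map(len, table), default=0)
    out = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shorter ones.
        for length in range(min(longest, len(text) - i), 0, -1):
            replacement = table.get(text[i:i + length])
            if replacement is not None:
                out.append(replacement)
                i += length
                break
        else:
            # No sequence starts here: pass the character through.
            out.append(text[i])
            i += 1
    return "".join(out)
```

    The substituted code points would exist only inside the receiver, between
    the preprocessor and the font, and would not be used for interchange.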

    Is this a problem particular just to interactive television or is it a wider

    I made a suggestion for a eutocode typography file in the following web


    Whether that use of some of the code points of the Private Use Area by a
    user community were adopted in some scenarios (for example with PFR0 fonts
    in interactive broadcasting), or whether the glyphs were numbered in some
    other sequence within a font, I am putting forward for discussion the
    question of whether it might be useful to produce a list of ligatures for
    the languages of the Indian subcontinent in which each ligature has an
    index number in an ordered sequence from 1 upwards, so that those numbers
    can be a standard way of accessing glyphs within fonts or within systems
    such as a eutocode typography file.
    It may be that any particular application of such a list would add an offset
    constant to the list number during processing, for example hexadecimal EC00
    for a eutocode typography file, or maybe 500 for an advanced format font.
    The idea would be that the glyph for a particular ligature of, say, Tamil
    would always be at position XYZ relative to the start of the list. This
    would mean that substitution tables for rendering from a Unicode sequence
    to a displayable glyph could become portable rather than font specific, so
    there might, in time, be a great saving of duplicated effort in having such
    a numbered list of ligature glyphs.
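
    The arithmetic can be sketched in Python as follows. The list numbers 1
    and 2 are invented for illustration, since no such list exists yet, and
    the sequences are simply the two quoted at the top of this message; the
    names LIGATURE_INDEX, EUTOCODE_OFFSET, ADVANCED_FONT_OFFSET and
    glyph_position are likewise assumptions of this sketch. Only the two
    offsets, hexadecimal EC00 and 500, come from the text above.

```python
# Hypothetical shared list: Unicode sequence -> index number, from 1 upwards.
# The index values are invented for this sketch.
LIGATURE_INDEX = {
    "\u0985\u09CD\u09AF": 1,
    "\u098F\u09CD\u09AF": 2,
}

# Offsets suggested in the text; each application would choose its own.
EUTOCODE_OFFSET = 0xEC00      # eutocode typography file
ADVANCED_FONT_OFFSET = 500    # advanced format font


def glyph_position(sequence, offset):
    """Return where an application finds the glyph: list number plus offset."""
    return LIGATURE_INDEX[sequence] + offset
```

    The same table then serves both applications: only the offset differs, so
    the mapping from sequence to list number need be drawn up once.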

    I emphasise that I am not in any way suggesting using Private Use Area codes
    for (italics) interchange (/italics) of text in these languages, I am simply
    suggesting that there seems to be the possibility that the process of
    producing fonts and other software systems for the carrying out of the task
    of glyph substitution for particular Unicode sequences could be made a more
    portable process if such a list were to exist.

    Is there interest in such a numbered list of ligatures being produced? As
    I say, I am not a linguist so I could not carry out the task myself, yet
    perhaps the task might be fairly straightforward, though necessarily
    requiring a substantial amount of effort, for some of the readers of this
    mailing list.
    Once done, the list would have long term usefulness. Spaces for the
    numbering could perhaps be allocated in the same order as the various
    languages of the Indian subcontinent are encoded within the Unicode
    Standard. Clearly expert guidance is needed as to how many ligatures exist
    for any particular language.

    The list would also be a useful index for glyphs in a "glyph library" of

    I was interested to read in a recent thread in this forum of the founding of
    the International Font Technology Association (IFTA) and wonder whether that
    organization would be an appropriate body to produce such a list, if there
    should be interest in the production of such a list.

    I would be pleased to know the views of people within this group as to
    whether such a list would be of advantage to typographers and others
    involved in computerized typography.

    William Overington

    3 March 2003
