Displaying the languages of the Indian subcontinent. (derives from Re: Please see my latest proposal)

From: William Overington (WOverington@ngo.globalnet.co.uk)
Date: Mon Mar 03 2003 - 17:11:30 EST

Michael Everson wrote as follows.

quote

Andy,

Your BENGALI LETTER OPEN O can be encoded already with the sequence
U+0985 U+09CD U+09AF.

Your BENGALI LETTER CENTRAL E can be encoded already with the
sequence U+098F U+09CD U+09AF.

There is no need to "bring the Bengali code block in line with the
Devanagari block".

end quote

First, I should say that I am not a linguist, and I am not writing to make
a linguistic comment at all.

As some readers of this mailing list may know, I am very interested in
interactive television, in particular the DVB-MHP (Digital Video
Broadcasting - Multimedia Home Platform) system, which uses Unicode.

From the specification for the DVB-MHP system, which can be downloaded from
the http://www.mhp.org website, it appears that fonts for the DVB-MHP
system, which can be broadcast, are to be in the PFR0 format, Portable Font
Resource version 0. Some time ago I obtained some details of that format
and looked through them, though I did not follow all of the details. As the
format seems to date from the early 1990s, it seems entirely possible that
PFR0 does not support the mechanism which allows a font to substitute a
particular glyph for a sequence such as the U+0985 U+09CD U+09AF which
Michael mentioned in his reply to Andy, quoted above.

It would therefore seem that the DVB-MHP interactive television system,
which is a system for worldwide use, may come up against considerable
rendering problems when it comes to making broadcasts using the languages of
the Indian subcontinent. I am seeking to resolve that problem by devising
an infrastructural tool to program round the problem by preprocessing
received Unicode text in the television receiver before it is passed to the
font, so that facilities for quality typography for the languages of the
Indian subcontinent exist with the DVB-MHP platform.
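
A minimal sketch of such a preprocessing step might look as follows. It is
written in Python only for brevity and is not taken from the DVB-MHP
specification. The table contents, the name preprocess and the Private Use
Area values U+EC01 and U+EC02 are assumptions made up for illustration; the
two sequences are the ones quoted above.

```python
# Hypothetical table: Unicode sequence -> one Private Use Area code point.
# The PUA values are illustrative assumptions, not an agreed allocation.
SUBSTITUTIONS = {
    "\u0985\u09CD\u09AF": "\uEC01",  # sequence quoted above for OPEN O
    "\u098F\u09CD\u09AF": "\uEC02",  # sequence quoted above for CENTRAL E
}


def preprocess(text, table=SUBSTITUTIONS):
    """Replace each known sequence with its single code, longest match first."""
    max_len = max((len(key) for key in table), default=0)
    out = []
    i = 0
    while i < len(text):
        # Try the longest candidate first so that a longer sequence wins
        # over any shorter sequence that is a prefix of it.
        for n in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + n]
            if chunk in table:
                out.append(table[chunk])
                i += n
                break
        else:
            # No sequence starts here: pass the character through unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)
```

The output of preprocess would be what is passed to the font, which then
needs only one glyph per code point and no substitution mechanism of its
own. The original Unicode text is left unchanged for interchange.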

Is this a problem particular just to interactive television or is it a wider
problem?

I made a suggestion for a eutocode typography file on the following web
page.

http://www.users.globalnet.co.uk/~ngo/ast03300.htm

Those code points of the Private Use Area might be used by a user community
in some scenarios (for example with PFR0 fonts in interactive broadcasting),
or the glyphs might be numbered in some other sequence within a font.
Either way, I am putting forward for discussion the question of whether it
might be useful to produce a list of ligatures for the languages of the
Indian subcontinent in which each ligature has an index number in an
ordered sequence from 1 upwards, so that those numbers can be a standard
way of accessing glyphs within fonts or within systems such as a eutocode
typography file. Any particular application of such a list might add an
offset constant to the list number during processing, for example
hexadecimal EC00 for a eutocode typography file, or maybe 500 for an
advanced format font. The idea is that the glyph for a particular
ligature, say for Tamil, would always be at position XYZ relative to the
start of the list. Substitution tables for rendering from a Unicode
sequence to a displayable glyph could then become portable rather than font
specific, so such a numbered list of ligature glyphs might, in time, save a
great deal of duplicated effort.
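
The arithmetic of the index number and the offset constant can be sketched
as follows, in Python. The list entries and their index numbers are
assumptions for illustration, since no such list exists yet, and the names
LIGATURE_LIST, glyph_code and substitution_table are hypothetical. The two
offsets are the examples given above.

```python
# Hypothetical numbered list: index (from 1 upwards) -> Unicode sequence.
# Both the entries and their positions are illustrative assumptions.
LIGATURE_LIST = {
    1: "\u0985\u09CD\u09AF",
    2: "\u098F\u09CD\u09AF",
}

EUTOCODE_OFFSET = 0xEC00  # example offset for a eutocode typography file
FONT_OFFSET = 500         # example offset for an advanced format font


def glyph_code(index, offset):
    """Return the application-specific code for a list index."""
    if index not in LIGATURE_LIST:
        raise KeyError(f"no ligature with index {index}")
    return offset + index


def substitution_table(offset):
    """Build a sequence -> code table from the shared list and one offset."""
    return {seq: offset + index for index, seq in LIGATURE_LIST.items()}
```

For example, glyph_code(1, EUTOCODE_OFFSET) gives hexadecimal EC01 and
glyph_code(1, FONT_OFFSET) gives 501. Only the offset differs between the
two applications, so both tables derive from the one shared list.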

I emphasise that I am not in any way suggesting using Private Use Area codes
for (italics) interchange (/italics) of text in these languages. I am simply
suggesting that producing fonts and other software systems which carry out
glyph substitution for particular Unicode sequences could become a more
portable process if such a list were to exist.

Is there interest in such a numbered list of ligatures being produced? As I
say, I am not a linguist, so I could not carry out the task myself. For
some of the readers of this mailing list, though, the task might be fairly
straightforward, even if it would necessarily take a substantial amount of
effort. Once done, the list would have long term usefulness. Ranges of
numbers could perhaps be allocated in the same order as the scripts for the
various languages of the Indian subcontinent are encoded within the Unicode
Standard. Clearly expert guidance is needed as to how many ligatures exist
for any particular language.

The list would also be a useful index for glyphs in a "glyph library" of
designs.

I was interested to read in a recent thread in this forum of the founding of
the International Font Technology Association (IFTA) and wonder whether that
organization would be an appropriate body to produce such a list, if there
is interest in it.

I would be pleased to know the views of people within this group as to
whether such a list would be of advantage to typographers and others
involved in computerized typography.

William Overington

3 March 2003

This archive was generated by hypermail 2.1.5 : Mon Mar 03 2003 - 18:10:35 EST