Re: Arial Unicode MS and Code2000

From: James Kass (jameskass@worldnet.att.net)
Date: Sat Jul 07 2001 - 11:01:20 EDT


Rajesh Chandrakar wrote:

> ...
> but I can say about the Devanagari which is my mother
> language. Some "matras" such as "EE" and "U" of this language
> is not being represented in actual place, somewhere it is
> coming first and somewhere after than its actual place in
> words. Look at the first line itself just down the "Pallavi:"
> given in bracket).

There are three possibilities that come to mind for the cause
of the problem you are reporting.

This could be a font problem, an encoding problem on the
page in question, or a problem with the operating system
support.

Here is the line in question reproduced in Unicode (UTF-8):

वेंकटाचलपते (निन्‍नु नम्‍मिति वेगमे नन्‍नु)

As far as I can tell, the encoding is correct. Here's a link to
a graphic. The top of the graphic shows this Devanagari line
of text as it appears on the system here, and the bottom
part has been touched up to show (more or less) how the
text appears in the book "South Indian Music Book III":
http://home.att.net/~jameskass/dev003.gif

The substitution of the half-letter forms is performed
by the operating system based on information contained
within the font. In this case I'd say that the problem is
probably with the font; the operating system here should
be handling this properly.

Note that the matras are supposed to be re-ordered by
the operating system (where appropriate), and here this
seems to be happening as expected, except for one place
where a half-letter form substitution is supposed to
take place. Once again, this is probably a problem with
the font. If the half-letter form substitution were
happening correctly, the matra would be re-ordered
to appear before the half-letter form.

Example:

In displayed Devanagari text, a string appears as
"matra" + "half-letter form" + "consonant".

When encoding in Unicode, it should be entered as
"consonant which will be made into a half-letter form" +
"virama (which is supposed to make the previous letter either
a half-letter form or part of a conjunct form)" + "consonant"
+ "matra (vowel sign)".
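
To make the storage order concrete, here is a minimal Python sketch
that lists the code points of one syllable entered in logical order.
The syllable (SA + virama + THA + vowel sign I) and the names
SYLLABLE and describe are chosen here only for illustration; they
do not come from the page under discussion.

```python
import unicodedata

# Logical (storage) order for an illustrative syllable:
# consonant + virama + consonant + matra.
SYLLABLE = "\u0938\u094D\u0925\u093F"


def describe(text):
    """Return (code point, Unicode name) pairs in storage order."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


for code, name in describe(SYLLABLE):
    print(code, name)
```

The matra is stored last even though it is drawn first on screen.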

The operating system is supposed to re-order certain matras
so they appear at the beginning of the syllable. As you know,
not all matras need to be re-ordered.
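
As a rough illustration of that re-ordering, the toy Python function
below moves a pre-base matra to the front of a single syllable. It is
only a sketch under simplifying assumptions: it handles one syllable
at a time, it treats only the vowel sign I (U+093F) as pre-base, and
it works on code points, whereas a real shaping engine re-orders
glyphs. The names PRE_BASE_MATRAS and visual_order are made up for
this example.

```python
# Assumption: only DEVANAGARI VOWEL SIGN I is treated as pre-base here.
PRE_BASE_MATRAS = {"\u093F"}


def visual_order(syllable):
    """Move a pre-base matra to the front of one logical-order syllable.

    Syllables without a pre-base matra are returned unchanged.
    """
    for i, ch in enumerate(syllable):
        if ch in PRE_BASE_MATRAS:
            return ch + syllable[:i] + syllable[i + 1:]
    return syllable


# SA + virama + THA + vowel sign I: the matra moves to the front.
logical = "\u0938\u094D\u0925\u093F"
print([f"U+{ord(c):04X}" for c in visual_order(logical)])

# NA + vowel sign U: this matra is not re-ordered.
print(visual_order("\u0928\u0941") == "\u0928\u0941")
```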

If the Private Use Area is used, the author might have to resort
to many different levels of "work-arounds" in order to get the
display to appear as expected on different operating systems.

For example, if the text is entered "visually", it might be
necessary to include some of the special Unicode characters like
the ZWSP or ZWNJ (zero-width space or zero-width non-joiner) just
to prevent Unicode-aware applications from re-ordering a matra
which the author has already re-ordered. And, on newer systems,
a matra that does not appear where expected in the text string
is displayed on a dotted circle by default. This can be handy
for spotting typos, but annoying for anyone wishing to construct
a page "visually".
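
A simplified check for text that was entered "visually" can be
sketched in Python: flag any dependent vowel sign that does not
directly follow a consonant (or a nukta). This is only an
approximation of what a renderer does; the code point ranges below
cover the basic Devanagari consonants and vowel signs, and the
function name misplaced_matras is hypothetical.

```python
CONSONANTS = range(0x0915, 0x093A)   # KA .. HA
NUKTA = 0x093C
VOWEL_SIGNS = range(0x093E, 0x094D)  # AA .. AU (virama excluded)


def follows_base(text, i):
    """True if the character before index i is a consonant or a nukta."""
    if i == 0:
        return False
    prev = ord(text[i - 1])
    return prev in CONSONANTS or prev == NUKTA


def misplaced_matras(text):
    """Return the indexes of vowel signs that lack a preceding base."""
    return [
        i
        for i, ch in enumerate(text)
        if ord(ch) in VOWEL_SIGNS and not follows_base(text, i)
    ]


print(misplaced_matras("\u093F\u0915"))  # matra typed before KA
print(misplaced_matras("\u0915\u093F"))  # KA followed by the matra
```

A matra reported by this check is the kind that would typically be
shown on a dotted circle.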

This is all fairly complicated, and I'm afraid that I'm not doing
a very good job of explaining this.

People who would like a better, more complete explanation of how
this all works should read the Unicode Standard 3.x Chapter 9,
"South and Southeast Asian Scripts", particularly the first part
which covers Devanagari. Also, information about the OpenType format
and Uniscribe can be found in the excellent article by John
Hudson at:
(all on one line...)
http://www.microsoft.com/typography/developers/opentype/default.htm

If the Unicode Standard 3.0 information is not available, try finding
on-line specifications for the ISCII standard, because Indic Unicode
is based upon ISCII.

Best regards,

James Kass.


