From: James Kass (email@example.com)
Date: Sun Apr 02 2006 - 02:00:44 CST
Kent Karlsson wrote,
>> Reproduce Table 9-11 on page 248 of TUS4.0 in plain text. The table
>> illustrates Malayalam Orthographic Reform.
> Note the table heading, which says *ORTHOGRAPHIC* (spelling) reform.
Spelling rules are only a subset of orthography.
> What is not said is how the difference in orthography is encoded in
> the character stream.
It's not. By design. By the script's main users. To the best of my
> Since it is an orthographic reform, there must
> be some difference in the character stream. One plausible way is to
> use ZWJ/ZWNJ to mark the spelling difference.
If Table 9-11 were reproduced in plain text using ZWJ/ZWNJ, it
should display fine... as long as a font supporting the traditional
orthography was used. The chart could not be displayed using a
font supporting the reformed orthography because such a font
would not include the ligatures needed for the traditional column.
(An OpenType font supporting the reformed orthography could
probably be made to include ligature glyphs referenced with
ZWJ look-ups. Some font developer in the user community would
have to consider the effort worthwhile, though. So, until then...)
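To make the ZWJ/ZWNJ idea concrete, here is a minimal Python sketch of
what "marking the difference in the character stream" would look like.
The KA + VOWEL SIGN U sequence and the placement of ZWNJ between the
two are assumptions chosen purely for illustration; neither the
standard nor this thread specifies such a sequence, and the helper
names are hypothetical.

```python
import unicodedata

ZWJ = "\u200D"   # ZERO WIDTH JOINER
ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER


def describe(text):
    """Return (code point, character name) pairs for each character."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
            for ch in text]


def strip_joiners(text):
    """Remove ZWJ/ZWNJ, e.g. before comparing strings for a search."""
    return text.replace(ZWJ, "").replace(ZWNJ, "")


# Plain sequence: the font decides whether a ligature is shown.
ka_u = "\u0D15\u0D41"                     # KA + VOWEL SIGN U

# Hypothetical marked sequence (an assumption, not a documented rule):
# ZWNJ inserted to ask for the non-ligated form.
ka_zwnj_u = "\u0D15" + ZWNJ + "\u0D41"

for code_point, name in describe(ka_zwnj_u):
    print(code_point, name)

# The two strings differ as encoded text ...
print(ka_u == ka_zwnj_u)
# ... and only match once the joiners are stripped.
print(strip_joiners(ka_zwnj_u) == ka_u)
```

The point of the sketch is only that joiners change the encoded
string, so anything that compares text (search, sort, databases) has
to account for them.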
> (Ideally, IMO there
> should have been OLD U/NEW U, OLD UU/NEW UU characters, rather
> than overloading U and UU with both old and new orthography.)
From a plain text computer encoding viewpoint, you may be right...
> This has NOTHING to do with font selection. Not at all! (Besides: that
> figure does not include AU.)
... but the user community insists that the same encoded binary
strings be displayed in either traditional or reformed style based
upon the user's font choice. The disadvantage of not being able
to display both forms of the script in plain text may have been
outweighed by the advantages: not having to transcode, not having
to maintain two sets of all web pages on a web site, searching,
sorting, and so forth being easier to implement, libraries not
having to maintain doubled databases, etc.
> When a new orthography was announced for German a few years ago,
> did you go and make two Latin fonts then, one for the old and one for
> the new orthography? I guess (and hope) not... When one for Finnish
> started to use ? and ? instead of sh and zh, did you go and make a
> font that displays sh as ? and zh as ?? I guess and hope not.
Of course not. I've always figured that if anybody wants to
represent the "sh" sound with a question mark, they should just
use the question mark character at U+003F.
(My browser settings munged your message.)
> How did you (and some others) manage to miss the rather clear
> statements (in several places) that 0D4C is a **TWO-PART** vowel??
Explanatory text about U+0D4C specifically should be added to the
standard. Since the standard currently offers no direction with
respect to U+0D4C and the orthographic reform, people speculate
and form divergent opinions as to proper implementation methods.
The user community apparently considers that U+0D4C is only a
two-part vowel sign in the traditional orthography. It is a one
part vowel sign in the reformed orthography. Try thinking of
it as a unification.
>> Quoting from:
>>> Its the responsibility of the unisribe to put the AU marker. font is
>>> not doing
>>> anything to put symbols on both sides, itd automatically done by
>>> let me see if i can check that behaviour of uniscribe.
> I'm not sure what this tries to say.
The font developer is responding to a bug report in which the user
does not find the expected behavior of the left side of U+0D4C getting
dropped. The font developer (correctly) identifies the problem as
being caused by the rendering engine rather than the font and offers
to look into the rendering engine's behavior.
>>> Also, it can potentially violate Uniqueness Rule when people
> "Uniqueness Rule"???
"Two different encodings should not render same,
irrespective of the font or joiners used."
>> The user community, far as I can tell, shuns the notion that U+0D4C
>> and U+0D57 are equivalent.
> They are NOT equivalent.
Good! They shouldn't be. The text Kenneth Whistler submitted from
5.0 could be construed to suggest that they will become equivalent in
Unicode 5.0, though. That's why I asked and what started this thread.
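For anyone who wants to check what the published character data says
about equivalence, here is a small Python sketch using the standard
unicodedata module. It reports the data bundled with whatever Python
interpreter runs it, so it says nothing about how a given font or
rendering engine behaves; the helper name is hypothetical.

```python
import unicodedata

AU_SIGN = "\u0D4C"    # MALAYALAM VOWEL SIGN AU
AU_LENGTH = "\u0D57"  # the AU length mark
E_SIGN = "\u0D46"     # MALAYALAM VOWEL SIGN E


def canonically_equivalent(a, b):
    """True if the two strings have the same canonical decomposition."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


def code_points(text):
    """Format a string as a list of U+XXXX labels."""
    return [f"U+{ord(ch):04X}" for ch in text]


# Canonical decomposition of the two-part vowel sign.
print(code_points(unicodedata.normalize("NFD", AU_SIGN)))

# U+0D4C versus U+0D57 alone.
print(canonically_equivalent(AU_SIGN, AU_LENGTH))

# U+0D4C versus the sequence <U+0D46, U+0D57>.
print(canonically_equivalent(AU_SIGN, E_SIGN + AU_LENGTH))
```

In the character data, U+0D4C decomposes canonically to U+0D46 U+0D57,
so it is canonically equivalent to that two-character sequence but not
to U+0D57 on its own.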
> They are DIFFERENT spellings of AU in Malayalam.
The Malayalam user community is better qualified to judge this than
you or I.
"Thoolika2005 have both Reformed Malayalam and
Traditional Malayalam Open Type Unicode fonts. In
Unicode the code points for Traditional Malayalam
script and Reformed Malayalam script is same. So,
the changing of script from Traditional to Reformed
and vice-versa can be achieve simply by selecting the
The government of India, in a special report on Indic scripts and
Unicode (relevant section: http://tdil.mit.gov.in/Malya-guj.pdf )
says right in their own version of the Malayalam code chart,
"0D57 ... MALAYALAM VOWEL SIGN AU LENGTH MARK
(new line, bullet) Not in modern use. (new line, bullet)
already given at 0D4C".
(In fairness please note that the tdil pages are a bit outdated
now and others have pointed out misconceptions in various
sections of those PDFs to this list and other lists.)
It's my impression that one of the reasons that the actual
users require a common encoding for either traditional or
reformed orthography text display is that, although the
script reform movement started some forty years ago,
not everyone has "bought into it" yet.
If it is Unicode's official position that traditional Malayalam
use U+0D4C and that reformed Malayalam must use U+0D57,
then Malayalam rendering engineers may recommend that
traditional Malayalam fonts be designed with traditional
AU glyphs at both code positions and reformed fonts with
reformed glyphs there. Then they'd lobby operating system
marketers to support their requirements while implementing
same in OpenSource...
Apologies for length