Re: Unicode and transliteration

From: Jaap Pranger
Date: Tue Aug 24 1999 - 08:23:42 EDT

Mon, 23 Aug 1999, <>:

> My primary interests are in the transliteration and Unicode parts of
> Malayalam. But the questions I have are more generic....

For the transliteration (but not Unicode!) part of Malayalam:

Anthony P. Stone, Project Leader, ISO/TC46/SC2/WG12 Working Group
on Transliteration of Indic scripts: "Thinking aloud on transliteration":

Scripts to be considered in WG12 are: Assamese, Bengali, Devanagari,
Gujarati, Gurmukhi, Kannada, Oriya, _Malayalam_, Sinhala, Telugu, Tamil.

To subscribe to the WG12 mailing list "conv-dev" send an email to
<> with the following command in the body of your
message: subscribe conv-dev <your_email_address>

> What remains is assigning code numbers in the case of Unicode,
> or assigning the corresponding Latin character sequence
> in the case of transliteration. Both are reasonably trivial.

To my knowledge there is not a single 8-bit char set/font available
containing all of the diacritics (or precomposed chars with diacritics)
used in Roman transliteration of Indic languages, except maybe for the
CSX+ (Computer Sanskrit eXtended) fonts, which do have most of them
but lack a few chars necessary for the transliteration of Urdu words
within Hindi. The CSX+ charset also has, probably for historical reasons,
a really ugly encoding.

Chris Fynn, however, makes available a Unicode sub-font for Windows,
"IndicTimes", that may fit the bill for Malayalam; I'm not sure.


So much for my 0,02 U+20AC (or is it: U+20AC 0,02?)

Please allow for some more text ...

Christopher's font may or may not have (or compose) a <g-underbar>
for transliteration of Urdu words in Hindi, but I cannot use it because
I'm on a Mac, and waiting for things to come in MacOS 9, ...X?
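
To make the <g-underbar> problem concrete, here is a minimal Python sketch. It assumes the character is written as "g" followed by U+0331 COMBINING MACRON BELOW; that choice is an assumption for illustration, not a statement about how any particular font handles it. Under that assumption, normalization to composed form turns t + dot below into one code point, while the g-underbar stays a two-code-point sequence that the font has to compose.

```python
import unicodedata

# Assumption: <g-underbar> is encoded as "g" followed by
# U+0331 COMBINING MACRON BELOW (chosen for illustration).
G_UNDERBAR = "g\u0331"
# "t" followed by U+0323 COMBINING DOT BELOW.
T_DOT_BELOW = "t\u0323"


def compose(text):
    """Return text in Unicode Normalization Form C (composed form)."""
    return unicodedata.normalize("NFC", text)


def code_points(text):
    """List the code points of text in U+XXXX notation."""
    return ["U+%04X" % ord(ch) for ch in text]


if __name__ == "__main__":
    for label, seq in (("t-dotbelow", T_DOT_BELOW), ("g-underbar", G_UNDERBAR)):
        print(label, code_points(seq), "->", code_points(compose(seq)))
```

An 8-bit font has to spend one slot on every such combination, which is one reason a single code set covering all the diacritics is hard to fit.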

If I am not mistaken, the situation with Hindi, one of the most widely
spoken languages of the world, is that it cannot really be exchanged
in text files between platforms consistently, _not even in Devanagari
to Roman transliteration_ today (not considering 7-bit systems).

I have been a lurker here for some two years, and I'm impressed by the
complexity of, and the huge effort put into getting Unicode implemented.
At first I was surprised when some of the people here defended modern,
8-bit 'legacy' solutions. But when Michael Everson said: "the rest of the
world CANNOT sit by and wait for the gods to give us Unicode", I realised
that was exactly what I'd been doing for a long time! (before discovering
text converters ;-)

It's my (hopefully wrong) impression that it will take a few more years
before Windows and Mac users will be happily exchanging bilingual text in
Roman and Devanagari script.

Therefore I would hope indeed that it's "reasonably trivial" to put up a
sound and elegant 8-bit code set for Roman transliteration of Devanagari,
for MacOS as well as Windows, with a capacity for smooth migration to UCS,
_within_ that period of a few years.

For private use I did change some 20 chars in a MacRoman encoded font
several years ago; this custom font I can use quite conveniently for
English, French, Dutch and Devanagari transliteration.
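
As a rough illustration of what such a custom code set amounts to, and of how it could migrate to UCS, here is a minimal Python sketch. It starts from the standard MacRoman table and overrides a handful of byte values with transliteration characters. The slot assignments in OVERRIDES are invented for this example; they are not the ones in my font, nor any proposed standard.

```python
# Hypothetical overrides: byte value -> Unicode character.
# These slot assignments are invented for illustration only.
OVERRIDES = {
    0xA5: "\u0101",  # a with macron
    0xA6: "\u012B",  # i with macron
    0xB0: "\u016B",  # u with macron
    0xB1: "\u1E6D",  # t with dot below
    0xB2: "\u1E0D",  # d with dot below
}


def build_decode_table(base="mac_roman", overrides=OVERRIDES):
    """Return a 256-entry list mapping each byte value to one character."""
    table = [bytes([b]).decode(base, errors="replace") for b in range(256)]
    for byte, char in overrides.items():
        table[byte] = char  # the original MacRoman character in this slot is lost
    return table


def decode(data, table):
    """Convert bytes in the custom 8-bit code set to a Unicode string."""
    return "".join(table[b] for b in data)


def encode(text, table):
    """Convert a Unicode string back to the custom 8-bit code set.

    Raises KeyError for characters that have no slot in the table.
    """
    reverse = {char: byte for byte, char in enumerate(table)}
    return bytes(reverse[ch] for ch in text)


if __name__ == "__main__":
    table = build_decode_table()
    raw = bytes([0x52, 0xA5, 0x6D])          # "R", custom slot 0xA5, "m"
    text = decode(raw, table)
    print([hex(ord(ch)) for ch in text])      # code points after migration
    assert encode(text, table) == raw         # round trip is lossless
```

Because every slot maps to exactly one Unicode character, text in such a code set converts to UCS and back without loss. The cost is the MacRoman characters displaced from the overridden slots, so choosing which ones to give up is the real design question.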

Such a font, I think, could be very useful to many people if it had the
best possible encoding and an equivalent for at least Windows. Also, it
should actually _work_ under Windows, and it should have a transfer
encoding. But how to do this?

Re: off-topic emails
I am fully aware that this may be just one more off-topic
message; it is not about Unicode. But it's also not about Fontographer, and
not about Operating Systems. It touches all of them, and I will definitely
need some good advice ...

Should I post questions to comp.fonts? comp.std.internat?

BTW, in my opinion it would be very nice if Apple could make an easy
interface for custom tables/mappings in TEC modules. There is a
stand-alone text converter by Tomasz Kukielka (Cyclone) that uses the
TEC engine and tables, but it has no option for custom conversions.

Jaap P

