Re: Re: Continue: Glaring mistake in the code list for South Asian Script//Reply to Kent Karlsson

From: Naena Guru <>
Date: Thu, 3 Nov 2011 20:34:44 -0500

I have not read the entire thread of this conversation. It looks as if the
debate has reached a level of acrimony. We need to inspect the background
to this entire controversy to come to a rational understanding.

In the nineties, there was a tug-of-war between ISO and Unicode over how to
digitize languages such as the Indic ones. The ISO-8859 committee wanted to share the
code points from U+0080 to U+00FF, the Latin-1 Supplement, among the Western
Europeans and anyone else who wanted to define their codes. ISO-8859-12
(Devanagari) was abandoned because Indians could not come to a consensus.
As a result, the rest of the Indian languages, including Tamil and Sinhala, were
not even taken up for consideration.

Unicode had a different idea. The driving principle of Unicode was the
Plain Text idea. “Every letter of every language on earth would have its
own proud code point, uniquely its own. [Imagine a time capsule when we go
down in flames. Unicode will be there to tell about the great human
During a meeting with Europeans held in America, Unicode’s idea won.
(A juicier version of this story was on the Unicode website but has since
been removed.) Now, Unicode wanted ISO’s help to reach the authorities of
those countries where government bureaucracies, not businesses, mattered.

What motivations did Americans, Europeans and South Asians have in this
entire endeavor? Americans wanted to expand business to the outside world
(the polite word is globalizing). The Europeans had their own domineering
commercial control over the former colonies, yet with a sentimental
In South Asia, where the literacy rate was very low (except in Sri Lanka), only a
minority of the English-educated elite had any exposure to computers. The Indic
region was still struggling with the typewriter. There was no
public awareness of what was happening in America called Unicode. If anything,
it was a piece of news that some esoteric thing was going on, where
Americans with their great scientific prowess were going to give them
something grand, a white elephant, perhaps. At least in Sri Lanka, the
motivation was the World Bank’s offer of loans. They got a European who
could not read Sinhala to sign for the standard. Other than the promise of
money and jobs, they had no idea where they were going. If you know how
Third World governments work, you know what I mean.

After twelve years of making the Arial font and the like, we have arrived at the
conclusion that one default font with all the letters of the world is not
practical, and is somewhat ridiculous. We now have SBCS, DBCS, etc., to cover
several scripts, and, at least in Sri Lanka, bureaucrats acting as surrogates who
give excuses and reassurances to the public ("Computers have gone 16-bit";
"The only standard is Unicode, others are hack jobs"). They are even contemplating
laws to force Unicode Sinhala on people!

All mature software is written for SBCS. There is no incentive for those
whose programs work so well to risk rewriting them to
accommodate DBCS. Compiler makers are encouraging programmers to globalize
and write their programs for Unicode, but who is going to pay to overhaul
30- to 40-year-old programs that have stabilized and reached near perfection?
Besides, software piracy was the order of the day outside America.

Presently, the Indian and Lankan general public has arrived at a point where
they are able to use their languages on the computer. People want to
communicate using their computers in their native languages. Unicode Indic
is very hard to use. It is nothing like English. It requires many new
things, such as word processing programs, physical keyboards and so on. Unicode
came well before they were ready, and they have now become its victims. We are
long past the Korean debacle; Unicode is now set in concrete.

The problem is that the Unicode scheme has divided the world and
categorized scripts as either computer-friendly or barbarian.

I do not know about CJKV, but Indic scripts would have been much better off had
their standards been made within SBCS. I tested this for Sinhala, and it is
a great success. See it at the following link (please do not use Internet
Explorer, because it does not support full-font downloading or OpenType
font rendering):

Sinhala is one of the most complex Indic scripts, if not *the* most complex,
and has two major orthographies, Mixed Sinhala and Pali. I studied the part of
vyaakarana (grammar) that deals with Sinhala / Sanskrit writing and made a
comprehensive, lossless transliteration onto ISO-8859-1. Then I made a smart
font to dress the Latin encodings in the native script. People use this system
without knowing that the underlying code is ‘English’. It is a refinement of
Anglicizing: you type the way you speak, and the orthographic font shows it
magically in its full complexity. It has been in existence since 2005,
despite government bureaucrats disparaging it.
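To make the single-byte point concrete, here is a minimal Python sketch comparing storage for one syllable. It assumes a hypothetical romanization "kaa" for the syllable කා; the actual mapping used by the transliteration system described above is not given here. The Latin-1 form costs one byte per character, while the Unicode Sinhala form is two code points of three UTF-8 bytes each.

```python
# Hypothetical romanization of the syllable කා; NOT the author's actual scheme.
romanized = "kaa"
# Unicode Sinhala: U+0D9A (letter KA) followed by U+0DCF (vowel sign AA).
sinhala = "\u0D9A\u0DCF"

latin1_bytes = romanized.encode("latin-1")  # single-byte encoding (ISO-8859-1)
utf8_bytes = sinhala.encode("utf-8")        # each Sinhala code point takes 3 bytes

print(len(latin1_bytes))  # 3 bytes
print(len(sinhala))       # 2 code points
print(len(utf8_bytes))    # 6 bytes
```

The byte counts say nothing about rendering quality; they only show where each approach puts the complexity (in the font for the transliteration, in the encoding for Unicode).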

Sinhala includes Sanskrit at the core of its phoneme chart, or Hodiya. It
also covers Pali. You use this system just like English: backspace erases the
last character, search and replace work, etc. Such fundamental tasks are awkward
and do the unexpected in Unicode Sinhala. When you transliterate, by contrast,
you are magically transported to the wonderland of English and the Western
European languages.
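A small Python sketch can illustrate why editing behaves differently. It assumes an editor whose backspace removes exactly one code point; real editors differ, and many delete by grapheme cluster or by their own rules. The syllable ශ්‍රී is stored as five code points, so under that assumption one keystroke removes only the final vowel sign. The helpers `code_points` and `naive_backspace` are hypothetical.

```python
# The syllable ශ්‍රී ("shrii"), seen as one visible unit, is five code points:
# ශ U+0DC1, al-lakuna U+0DCA, ZERO WIDTH JOINER U+200D, ර U+0DBB, vowel sign U+0DD3.
syllable = "\u0DC1\u0DCA\u200D\u0DBB\u0DD3"

def code_points(text: str) -> list[str]:
    """List each code point of text in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]

def naive_backspace(text: str) -> str:
    """Hypothetical editor behaviour: delete exactly one code point."""
    return text[:-1]

print(len(syllable))                           # 5
print(code_points(syllable))
print(code_points(naive_backspace(syllable)))  # vowel sign gone; ශ්‍ර remains
```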

Sri Lanka took the advice of the World Bank and took an initial loan of $50
million to get people to transact business with government agencies over the
Internet, a ridiculous idea for a poor population living on a
300-by-150-mile island. They established a special agency called ICTA to
implement this, and the loans continue to come. They cannot make a single
standards-compliant website. Unicode Sinhala violates the Unicode
standard it belongs to! You cannot do simple word processing tasks we take
for granted with English, and there is no way to sort. Yet it has created
thousands of jobs and a lot of debt.

My only wish, though it is only a dream, is that font names be standardized
to carry a prefix indicating the script and the encoding method, for example:
sing-u-[name] – meaning this is a Sinhala font with the Unicode code page
sing-t-[name] – meaning this is a transliteration to SBCS of Sinhala to
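The proposed naming convention is simple enough to check mechanically. Below is a minimal Python sketch; the regular expression, the `FontTag` type and the `parse_font_name` helper are hypothetical, and it assumes only the two encoding letters shown above ("u" for Unicode, "t" for SBCS transliteration).

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern for the proposed convention: <script>-<encoding>-<name>,
# where the encoding letter is "u" (Unicode code page) or "t" (SBCS transliteration).
FONT_NAME_RE = re.compile(r"^(?P<script>[a-z]+)-(?P<enc>[ut])-(?P<name>.+)$")

ENCODINGS = {"u": "Unicode", "t": "SBCS transliteration"}

class FontTag(NamedTuple):
    script: str
    encoding: str
    name: str

def parse_font_name(font_name: str) -> Optional[FontTag]:
    """Split a font name that follows the proposed prefix convention.

    Returns None when the name carries no recognizable prefix.
    """
    m = FONT_NAME_RE.match(font_name)
    if m is None:
        return None
    return FontTag(m.group("script"), ENCODINGS[m.group("enc")], m.group("name"))

print(parse_font_name("sing-u-Example"))  # FontTag(script='sing', encoding='Unicode', name='Example')
print(parse_font_name("sing-t-Example"))  # FontTag(script='sing', encoding='SBCS transliteration', name='Example')
print(parse_font_name("Arial"))           # None: no prefix
```

A font menu could group fonts by the parsed `script` and `encoding` fields, so a user would see at a glance which fonts expect Unicode text and which expect a transliteration.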

Thank you.


On Thu, Nov 3, 2011 at 1:54 AM, delex r <> wrote:

> ----- Forwarded Message -----
> From: delex r <>
> To: Christopher Fynn <>
> Cc:
> Sent: Fri, 28 Oct 2011 13:49:24 +0530 (IST)
> Subject: Re: Continue: Glaring mistake in the code list for South Asian
> Script//Reply to Kent Karlsson
> ----- Original Message -----
> From: Christopher Fynn <>
> To: delex r <>, Unicode List <>
> Sent: Sun, 23 Oct 2011 01:33:29 +0530 (IST)
> Subject: Re: Continue: Glaring mistake in the code list for South Asian
> Script//Reply to Kent Karlsson
> Delex
> Nobody's saying Unicode is perfect, but it works.
> Please realize that whatever "mistakes" you find in the standard,
> Unicode is not going to change the way it has encoded Indic scripts,
> the names it has given these scripts / writing systems, or the names
> of individual characters. A Character Encoding Standard would hardly
> be a useable standard if these things changed over time.
> The time to have suggested things be done differently, or that
> different names be used, was many years ago when the Indic scripts
> were first being included in the UCS. Why did no authority from India
> complain at the time?
> If you have real problems with the way Unicode has encoded the
> characters in Indic scripts, and you think it can be done better, you
> are of course welcome to create your own character encoding where e.g.
> each of the letters in all of the 1652+ mother tongues of India is
> encoded separately and then try and get people to adopt your "better"
> system as standard.
> Good luck to you.
> - C
> Dear Fynn, your query
> >Why did no authority from India complain at the time?
> can definitely be answered if you or Unicode provide some concrete
> information on exactly which authority or department from India/Bangladesh
> suggested that the script and the entire set of letters be named BENGALI,
> rejecting even the need to give it a common name acceptable to both
> societies that would disclaim any pseudo-ownership. They showed an indifferent
> attitude towards the factual genesis of the script. If it was from India, I
> may be able to answer your query about what went wrong here in India before,
> during and after the 1990s.
> >you are of course welcome to create your own character encoding where
> >e.g.each of the letters in all of the 1652+ mother tongues of India is
> >encoded separately........
> I count that even Unicode has not encoded, or has not needed to encode, many
> of them, probably because (as you know better) few of them can actually
> boast of having their own script. You may realise (if you wish to) what
> harm Unicode is doing by publishing Assamese as a scriptless language, like
> English, French, Bodo, etc., which use a borrowed script.
> >and you think it can be done better.......then try and get people to
> >adopt your "better"system as standard.
> I don't think I am going to describe my "better" standard on this public
> mailing list and become a victim of plagiarists. There is always a
> "better" thing to happen or evolve. Even Einstein is being questioned
> nowadays in mankind's everlasting search for faster and truer things!
> Regards
> DR
Received on Thu Nov 03 2011 - 20:41:16 CDT
