Re: Offlist: complex rendering

From: Naena Guru <naenaguru_at_gmail.com>
Date: Mon, 18 Jun 2012 17:50:46 -0500

I see where this is going -- mischaracterization. I am used to it. I did
not get into debates with Lankan technocrats because I saw how they treated
their questioners. I can cite many instances right off the Internet. I saw
somewhere someone saying I am complaining against operating systems too.
Get another one if this is not good. That is anyone's attitude.

*Transliteration does not compete with Unicode Sinhala. It complements
it* by providing ways to do things that Unicode Sinhala cannot ever hope to
do, or can do only very awkwardly, such as collation, search, replace etc.
Checking the logs, I see a steady number of people using
http://www.lovatasinhala.com/liyanna.php to write their text in romanized
Singhala to get Unicode Sinhala -- heartening indeed.

ICTA, the official agency for IT in Sri Lanka, has 40 plus programmers
(learn, you Americans) and I am just one person devoting all my time,
funding myself down to poverty, and wasting time here replying to those who
have formed such hatred for anyone challenging what they revere. (Rational
and scientific?) ICTA spends tens of thousands of dollars to get people from
abroad to teach how to make fonts! Thank you World Bank. Sorry, poor Lankans.

I have to rehash everything I said earlier over and over again. *This is
only a matter of transliteration*. Give me time to make the official
proposal, XML and all. The font issue is not for Unicode. Unicode says that
it is all about codes and not shapes. It gave two examples, Fraktur and
Gaelic, as scripts allowed to reside on Latin-1 while having shapes not
expected of Latin-1. That makes me wonder if Singhala is frowned upon because
it is not European. There is no other excuse, because English was at one time
romanized from fuþorc.

For anything to have a hope of success, you have to be aware of the
standards. I did my transliteration and subsequent proof-of-concept font
fully aware of this. My motivation to make the font, of course, was the
pathetic state of Unicode Sinhala. People need something that is practical.

The easier and more rational path for Singhala (and Indic) is transliteration. For
instance, Harvard-Kyoto is a very successful transliteration for Sanskrit
that goes to ASCII. It is easy for everyone to learn *and type*. ISO-8859
committees attempted something like this. When complaints mounted, in 2008
Unicode accepted the principle of transliteration.

*UNICODE SINHALA -- Bad solution:*
The approach Unicode has taken is one that looks at letters in the same way
English does, just symbols. That is easy. Let's take the letters and assign
them codes. Signs for vowels? Let's give them their own codes as well.
Conjoint letters? Let them be defined in the font or by keyboard. Looks to
me more like obfuscation than solution. Andy Daniels and Michael Everson
tried to help by proposing the code page for Singhala. It was taken by
Lankans just the way it was proposed. When Michael wanted to add numerals
(Andy had them first), Lankans did not know there were any. The currently
published books have them. That shows the level of interest.

In Unicode Sinhala, there are two conjoint letters. Unicode gave an
ultimatum for specifying normalization of canonical letters. That day is
long past. Now Unicode Sinhala is not compatible with Sanskrit and Pali.
Another 'Korean debacle' won't happen and is not necessary. The
transliteration works! It round-trip converts between itself and Unicode
Sinhala. (Unicode guys, relax). I wrote the program, even fixing a
purposeful error ruled by SLS1134. (A drunkard's solution: How to write
'brandy' in Sinhala).
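
A round-trip claim of this kind can be checked mechanically: convert Unicode
Sinhala to the Latin-1 form, convert back, and compare with the input. The
Python sketch below does this with a four-entry toy table. The table, the
function names and the sample string are assumptions made purely for
illustration; they are not the actual transliteration scheme or program, and
a real scheme has to handle inherent vowels and conjoint letters, which a
one-to-one table like this ignores.

```python
# Hypothetical toy table for illustration only. It is NOT the real scheme;
# it only shows what a round-trip check looks like.
TOY_TABLE = {
    "\u0d9a": "k",        # SINHALA LETTER ALPAPRAANA KAYANNA
    "\u0db8": "m",        # SINHALA LETTER MAYANNA
    "\u0dbb": "r",        # SINHALA LETTER RAYANNA
    "\u0dcf": "\u00e1",   # SINHALA VOWEL SIGN AELA-PILLA -> a-acute (Latin-1)
}
REVERSE_TABLE = {latin: sinhala for sinhala, latin in TOY_TABLE.items()}


def to_latin(text):
    """Map each Sinhala code point to its Latin-1 counterpart."""
    return "".join(TOY_TABLE[ch] for ch in text)


def to_sinhala(text):
    """Map each Latin-1 character back to its Sinhala code point."""
    return "".join(REVERSE_TABLE[ch] for ch in text)


def round_trips(text):
    """True if converting to Latin-1 and back reproduces the input exactly."""
    return to_sinhala(to_latin(text)) == text


sample = "\u0d9a\u0dcf\u0db8\u0dbb"   # arbitrary test string, not a real word
print(round_trips(sample))
```

The check only holds when the table is one-to-one in both directions; any
two Sinhala code points that shared a Latin-1 character would break it.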

*An approach with vision for preservation of the language:*
Plainly, I have a notably successful transliteration for Sanskrit, Pali and
Singhala. A transliteration is Unicode compliant if it uses Unicode code
pages, right? Mine is a transliteration into the SBCS (ASCII plus Latin-1,
that is, ISO-8859-1). It makes the text look more like the text of a normal
language than HK does. It looks like Icelandic or Old English. It can be
typed using dead-key extended QWERTY key layouts. It is perfect for text to
artificial voice. Read more to understand.

Sanskrit *vyAkaraNa* (I used HK transcription), which Indic languages
share, starts with *akSaravinyAsa*. An *akSara* has two sides to
it: the sound and the shape (*vArNa* and *rUpa*). I studied A. M.
Gunasekera's 'A Comprehensive Grammar of the Sinhalese Language' and Rev.
Theodore Perera's 'The Sinhala Language' plus another recent one before
deciding that I had it all correct. Also, I use Monier-Williams to verify
and research Sanskrit -- online for a few years now:
http://www.sanskrit-lexicon.uni-koeln.de/scans/MWScan/tamil/index.html
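
For readers unfamiliar with Harvard-Kyoto, the Python sketch below expands
the HK words used above into the diacritic (IAST) spelling. It covers only
the single-letter HK conventions; the multi-letter vowels (RR, lR, lRR) are
left out to keep it short. The function and table names are illustrative
assumptions, not part of any existing tool.

```python
# Single-character Harvard-Kyoto -> IAST substitutions.
# Multi-letter HK vowels (RR, lR, lRR) are deliberately omitted in this sketch.
HK_TO_IAST = {
    "A": "\u0101",  # long a
    "I": "\u012b",  # long i
    "U": "\u016b",  # long u
    "R": "\u1e5b",  # vocalic r
    "M": "\u1e43",  # anusvara
    "H": "\u1e25",  # visarga
    "G": "\u1e45",  # velar nasal
    "J": "\u00f1",  # palatal nasal
    "T": "\u1e6d",  # retroflex t
    "D": "\u1e0d",  # retroflex d
    "N": "\u1e47",  # retroflex n
    "z": "\u015b",  # palatal sibilant
    "S": "\u1e63",  # retroflex sibilant
}
_TRANSLATION = str.maketrans(HK_TO_IAST)


def hk_to_iast(text):
    """Replace each HK capital or 'z' with its IAST letter; keep the rest."""
    return text.translate(_TRANSLATION)


for word in ["vyAkaraNa", "akSaravinyAsa", "rUpa"]:
    # The HK input is plain ASCII; the IAST output is not.
    print(word, word.isascii(), hk_to_iast(word))
```

The input side stays within ASCII, which is the property that makes HK easy
to type on an unmodified keyboard.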

*Orthographic Smartfont:*
The font is an OpenType font that complies with features prescribed for
'Simple' scripts. I was a member of the now closed VOLT group of Microsoft
and followed the development of Uniscribe. At the time I made the
font, only WorldPad by SIL.org supported it. People I showed it to
preferred it to Unicode Singhala because input was through
US-International and shapes were automatic.

There is a very important function of the font other than the visual help
for typing and reading. It also upholds and preserves dying orthographies.
It should be developed into a true orthographic font. (Help and time?).

Singhala has three orthographies. Sinhala words do not have joined letters.
Sanskrit and Pali have a set of double and treble conjoint letters. In
addition, Pali eliminates the hal sign within the words and allows only the
halant. Pali needs its own fonts.

Below are links to two files that show the first paragraph of this web page:
http://www.divaina.com/2012/06/17/scholast.html
Unicode Sinhala:
http://ahangama.com/sing/DBS.htm <http://ahangama.com/sing/DSS.htm> (4 kB)
Romanized Singhala:
http://ahangama.com/sing/DSS.htm (1 kB)

Compare the shape formation and the sizes of the files. How much bandwidth
is taken for the Unicode Sinhala file to go as UTF-8? 6 kB! Six times the
romanized file. Beyond that, imagine how that Unicode Singhala page was
made from scratch and how many more steps were needed to get there than for
the Latin-1 page. If you closely inspect the original page from Divaina, you
see that they did not input Unicode Sinhala directly but used an
intermediary step. (There are two stray English letters). These are things
that matter for ordinary citizens, not university dons paid and venerated
by those same poor citizens.
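
The per-character part of the size difference is easy to check. The Python
sketch below counts the bytes one Sinhala word takes in UTF-8 and the bytes
a romanized string takes in Latin-1. The romanized spelling is a made-up
example for illustration, not output of the transliteration discussed here,
and the sketch does not try to reproduce the file sizes quoted above, since
HTML markup is ASCII in both files.

```python
# Byte cost of Sinhala text in UTF-8 versus a romanized string in Latin-1.
sinhala = "අනුරාධපුරය"                # Sinhala word quoted later in this thread
romanized = "anur\u00e1dhapuraya"     # hypothetical romanization, illustration only

utf8_bytes = sinhala.encode("utf-8")
latin1_bytes = romanized.encode("latin-1")

# Code points in the Sinhala block (U+0D80..U+0DFF) take three bytes each in
# UTF-8; every Latin-1 character takes exactly one byte.
print(len(sinhala), "code points ->", len(utf8_bytes), "bytes in UTF-8")
print(len(romanized), "characters ->", len(latin1_bytes), "bytes in Latin-1")
```

How the totals compare for a whole page depends on how many Latin letters
the romanization needs per Sinhala code point and on how much of the page is
markup.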

I have only limited time. So please do not expect me to reply to the entire
barrage of attacks. (I think I am already banned as a spammer for repeating,
or so as not to hear the unpleasant truth?).

Thank you.

PS:
I added Donald Gaminitilake to the list as he had a third solution for
Singhala and was one I watched being pummeled by the crowd in Lanka.

On Mon, Jun 18, 2012 at 2:09 AM, Ruvan Weerasinghe <arw_at_ucsc.cmb.ac.lk> wrote:

> Not that it is of much consequence, but even the website cited for
> 'Anglicizing' Sinhala (http://www.sinhalaya.com/) now has a majority of
> content in UNICODE! I couldn't really find posts using images in this site
> - not that I tried hard to find them!
>
> 5 years ago, this argument may have had some attention by those not
> conversant with UNICODE, but today, with the multitude of blogs, wiki's
> (Sinhala wikipedia has 7k entries now in case you didn't know), google
> search in Sinhala, only the mischievous could insist we need to go back.
>
> Incidentally, please visit the following link to see the hit counts for
> common Sinhala words in a google search in August/September 2009. Clicking
> any of the words will give you a google hit count for that word today. This
> should give a good idea of the spread of UNICODE Sinhala:
>
> http://ucsc.cmb.ac.lk/wiki/index.php/LTRL:Web_Statistics_(Online_Sinhala)/August_September
>
>
> (e.g. the word අනුරාධපුරය which had 2500 - 4000 hits that time, has over
> 32,000 today; the word ඉදිරිපත් which had 60k+ hits then has over 600k hits
> now)
>
> Regards.
>
>
> Ruvan Weerasinghe
> University of Colombo School of Computing
> Colombo 00700,
> Sri Lanka.
>
> Web: http://www.ucsc.lk
> Phone: +94112158953; Fax: +94112587239
>
> ------------------------------
>
> *From: *"Harshula" <harshula_at_gmail.com>
> *To: *"Naena Guru" <naenaguru_at_gmail.com>, "jc" <ahangama_at_gmail.com>
> *Cc: *"Tom Gewecke" <tom_at_bluesky.org>, unicode_at_unicode.org, "Tissa
> Dharmagunaratne" <tissaus_at_aol.com>, "Ranjith Ruberu" <
> antonyruberu_at_hotmail.com>, "Kusum Perera" <kusum510_at_gmail.com>, "Bertie
> Fernando" <bertiefernando_at_hotmail.com>, "Ruvan Weerasinghe" <
> arw_at_ucsc.cmb.ac.lk>, "Gihan Dias" <gihan_at_cse.mrt.ac.lk>, "Wasantha
> Deshapriya" <wasantha_at_icta.lk>
> *Sent: *Monday, June 18, 2012 10:07:14 AM
> *Subject: *Re: Offlist: complex rendering
>
>
> Hi JC,
>
> You have been making the same allegations for more than half a decade.
> Now you have moved on to a new forum, the Unicode Consortium. The
> reality is that all the professionals and academics that work in
> computational linguistics, Sinhala localization, etc in Sri Lanka are
> on-board with Unicode Sinhala. We are seeing research and applications
> developed on top of Unicode Sinhala.
>
> IIRC, back then you were unable to demonstrate the shortcomings of
> Unicode Sinhala that your scheme solved. If you have complaints about
> operating systems that do not implement Unicode Sinhala correctly,
> please contact the specific company.
>
> cya,
> #
>
> On Fri, 2012-06-15 at 01:45 -0500, Naena Guru wrote:
> > Tom,
> >
> > Thank you for taking an interest in this matter.
> >
> > You said,
> >
> > Mapping multiple scripts to Latin-1 codepoints is contrary to the most
> > basic principles of Unicode and represents a backwards technology leap
> > of 20 years or more.
> >
> >
> > Well, do you otherwise agree that the transliteration is good? It can
> > be typed easily, and certainly not like the Unicode Indic
> > transliteration that is only good for Aliens to discover some day.
> >
> > Unicode has a principle about shapes assigned to characters. It is the
> > opposite of what you said. At the time I started this project Unicode
> > version 2 specifically said that it does not define shapes. That is
> > the reason I tried it.
> >
> > Think of it as a help for the person that types. I tested it on real
> > people. They are unaware that the underlying codes are that of Latin.
> > They are surprised and elated.
> >
> > So, if you are so averse to changing the shapes of Latin-1, what would
> > you say about Fraktur and Gaelic that the standard specifically said
> > are based on Latin-1 but have different shapes?
> >
> > You said,
> > It doesn't seem realistic to me that it could ever see acceptance, and
> > I'm a bit surprised that you continue to devote your talents to
> > promoting it. Is there some reason you consider it to be promising
> > nonetheless?
> >
> >
> > (Thank you for calling me talented. I am not).
> >
> > It depends on whose acceptance you are talking about. You'll
> > understand if you are a Singhalese, Tom. The leap 20 years back is
> > what we need. Unicode parked us in a cul de sac. BTW, I haven't even
> > started to promote it. I want the IT community to say this works, as
> > it really does.
> >
> > Think why people Anglicize in this very popular web site:
> > www.sinhalaya.com
> > There are many such. (try elakiri.com)
> >
> > You will see some Unicode Sinhala, but most posts are written using
> > hack fonts and made into graphics to post. The Lankan government is so
> > worried that they have launched a program to teach English to everyone
> > perhaps seeing the demise of Singhala due to digital creep. (Wisdom of
> > politicians!).
> >
> > Also look at the web site of the IT agency of the government:
> > http://www.icta.lk/
> > How much prominence did they give the language of the 70%?
> >
> > The bureaucrats are giving themselves medals. (See the pictures). They
> > are making laws forcing the government employees to use Unicode
> > Singhala, because they are reluctant. It's a Third World country. The
> > literacy rate is 90% plus, not a little India. But the people are
> > docile. They depend on the government to tell them what to do. The
> > bureaucracy in return depends on the West to tell them what is right.
> > The technocrats call themselves යුනිකේත (Love UNI!)
> >
> > Yes, Tom, I do have a very good reason. I know it because I am a
> > Singhalese. It is *practical* and being accepted and commended by
> > everyone that I showed it to. If English, German, Spanish, Icelandic,
> > Danish etc. use Latin-1, and if Singhala *can* perfectly map to
> > Latin-1, why shouldn't it? That is called transliteration. Recall that
> > English fully romanized about year 600.
> >
> > Singhala is a minority language that is scheduled to be executed, and
> > Unicode is unwittingly the reason.
> >
> > Brahmi probably is Old Singhala. The oldest Brahmi was found in Shree
> > Langkaa (Sri Lanka) 2-3 centuries before it was seen in India. Some
> > say Singhalese founded the Mayans. (What a chauvinist!). So, let's
> > give it a boost before World Ends.
> >
> > I need the support of Unicode, which is like World Government for
> > Laangkans. This is what I want Unicode to judge:
> >
> > * Is the transliteration practical?
> > * Do I have a round trip conversion with precious Unicode
> > Sinhala?
> > Help us, Tom.
> >
> > This message is getting too long. I can list pros and cons of
> > Dual-script Singhala and Unicode Sinhala to convince any techie why we
> > should forget Unicode Sinhala.
> >
> > Let me end with a quote from SICP
> > http://mitpress.mit.edu/sicp/full-text/book/book.html
> > Educators, generals, dieticians, psychologists, and parents program.
> > Armies, students, and some societies are programmed. An assault on
> > large problems employs a succession of programs, most of which spring
> > into existence en route. These programs are rife with issues that
> > appear to be particular to the problem at hand. To appreciate
> > programming as an intellectual activity in its own right you must turn
> > to computer programming; you must read and write computer programs --
> > many of them. It doesn't matter much what the programs are about or
> > what applications they serve. What does matter is how well they
> > perform and how smoothly they fit with other programs in the creation
> > of still greater programs. The programmer must seek both perfection of
> > part and adequacy of collection.
> >
> >
> > Do we want to be programmed or be programmers? Is the collection
> > adequate?
> >
> > Best regards,
> >
> > JC
> >
> >
> > On Thu, Jun 14, 2012 at 8:08 AM, Tom Gewecke <tom_at_bluesky.org> wrote:
> > naenaguru wrote:
> >
> >
> > > Map sounds to QWERTY extended key layouts adding non-English
> > > letters ->
> > > Result: strict, rule based alphabet extending from ASCII to
> > > Latin-1 ->
> >
> >
> > Mapping multiple scripts to Latin-1 codepoints is contrary to
> > the most basic principles of Unicode and represents a
> > backwards technology leap of 20 years or more. It doesn't
> > seem realistic to me that it could ever see acceptance, and
> > I'm a bit surprised that you continue to devote your talents
> > to promoting it. Is there some reason you consider it to be
> > promising nonetheless?
> >
>
>
>
>
Received on Mon Jun 18 2012 - 17:58:46 CDT
