RE: Romanized Singhala got great reception in Sri Lanka

From: Marc Durdin <marc_at_keyman.com>
Date: Mon, 17 Mar 2014 02:52:19 +0000

Naena,

If you have an encoding which is easy to type, that can be replicated with Keyman, or any number of other input systems, for Unicode Singhala. Input is not tied to encoding. I would be happy to assist you, off-list, to develop an input method for Unicode Singhala that works according to your requirements.

However, if you have examples of Singhala which cannot be represented in Unicode, please do bring these to the attention of this list. But differences in input method are not really relevant.
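To illustrate that input is not tied to encoding, here is a minimal sketch in Python, assuming a deliberately tiny and hypothetical key map (it is not Keyman's actual layout or rule syntax). Latin keystrokes go in, and standard Unicode Sinhala code points come out:

```python
# Hypothetical, deliberately tiny key map for illustration only.
# A real input method would cover the full hodiya and many more rules.
CONSONANTS = {
    "k": "\u0d9a",  # SINHALA LETTER ALPAPRAANA KAYANNA
    "m": "\u0db8",  # SINHALA LETTER MAYANNA
    "n": "\u0db1",  # SINHALA LETTER DANTAJA NAYANNA
    "r": "\u0dbb",  # SINHALA LETTER RAYANNA
    "l": "\u0dbd",  # SINHALA LETTER DANTAJA LAYANNA
}
INDEPENDENT_VOWELS = {"a": "\u0d85", "i": "\u0d89", "u": "\u0d8b"}
# After a consonant, "a" is inherent (no sign); "i" and "u" take vowel signs.
VOWEL_SIGNS = {"a": "", "i": "\u0dd2", "u": "\u0dd4"}
AL_LAKUNA = "\u0dca"  # suppresses the inherent vowel


def type_sinhala(keys):
    """Convert Latin keystrokes to Unicode Sinhala using the toy map above."""
    out = []
    i = 0
    while i < len(keys):
        ch = keys[i]
        if ch in CONSONANTS:
            out.append(CONSONANTS[ch])
            nxt = keys[i + 1] if i + 1 < len(keys) else ""
            if nxt in VOWEL_SIGNS:
                # Consonant + vowel key: emit the dependent vowel sign (if any).
                out.append(VOWEL_SIGNS[nxt])
                i += 2
                continue
            # Consonant with no vowel after it: mark it as a pure consonant.
            out.append(AL_LAKUNA)
        elif ch in INDEPENDENT_VOWELS:
            out.append(INDEPENDENT_VOWELS[ch])
        else:
            out.append(ch)  # pass through anything the map does not cover
        i += 1
    return "".join(out)
```

The keystrokes here are purely an input convention. The stored text is ordinary Unicode Sinhala, so a different keyboard layout would only need a different map.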

Marc

From: Unicode [mailto:unicode-bounces_at_unicode.org] On Behalf Of Naena Guru
Sent: Monday, 17 March 2014 1:44 PM
To: Philippe Verdy
Cc: jc_at_ahangama.com; Unicode List
Subject: Re: Romanized Singhala got great reception in Sri Lanka

Philippe:

All you said about ISCII is probably right. So, it has given you guys a lot of pain. I neither created it nor followed it.

As for Japanese (and also for Indic) I have read the warnings in RFC 1815:
http://tools.ietf.org/rfc/rfc1815.txt

I am not creating a transcoding table as you say. I assume you think I take Unicode Sinhala to be a legitimate encoding for Singhala that I am mapping to SBCS for the love of SBCS. No. And I don't know what concepts I am mixing. I am trained in Computer Science, I have taught it at college level, and have done years of consulting work and written project proposals for a pretty good size one for the Federal Government too.

I believe that you need to understand the problem at hand to find a solution for it. You cannot make solutions for Indic not knowing Indic. Starting blindly with ISCII was a mistake. It is useless at least for Singhala.

========= STORY OF UNICODE SINHALA ==========
The first draft for the Sinhala chart was handwritten by Andy Daniels. He mentioned some doubts about some letters in it. He had a good instinct on that. It sat there, with people wondering where he got his information. He said from Germany. Someone said that it came from a $300 book. I suspect that it is Rev. Fr. A. M. Gunasekara's book (1891).

Then came the Lion of Unicode, Michael Everson (down in this thread). He was making fonts by the dozen and took Daniels' draft and certified the letters side of it, not having a nicely printed set of the digits. This certificate was countersigned by a Mettavihari for users. I know Ven. Mettavihari. He is a Danish man who researched and put up the most comprehensive Tripitaka, the Buddhist canon. This irreproachable man denies that he endorsed the standard on behalf of the Singhalese, saying that obviously he is not Singhalese. (Actually, I think he is more Singhalese than me.) Who signed as him, a forgery?

When the code chart came to Lanka, the closest to a computer that they knew was the IBM Selectric typewriter. When they did not do anything about it, the World Bank offered a $83 scheme to bring Lanka to the computer age all the way so the village fellow could communicate with the government online. They set up the IT agency ICTA and got the academics gathered there doing 'projects'. They even paid a fellow to come over and read the OpenType specification for them. I understand that the kingpin of the operations there is one person that studied in the US. He is the adviser to the President, the top Colombo University and the ICTA itself. He is one consultant that does most projects.

When Everson wanted to add the digits, apparently having found Fr. Gunasekara's book, the Lankans denied that such digits existed. When he showed them, they said they were not necessary. Now this everybody's consultant announced at my presentation that they are going to add them.
============ END STORY OF UNICODE SINHALA ==========

BAD UNICODE SINHALA:
Unicode Singhala violates Singhala / Sanskrit grammar. Unicode Singhala is not compatible with Sanskrit, an integral part of the Singhala script. That also applies to Pali whose native script is Singhala. Unicode Sinhala further helps kill Singhala by making it very difficult to type and impossible to obtain the entire repertoire of letters and limiting the applications and OSs that it can be used in.

Typing Unicode Sinhala requires you to learn a key map that is entirely different from the familiar English keyboard, while losing some marks and signs too. There is a program called Helabasa by Keyman typing system that printers use to type it. There is a physical keyboard too. Then there is Google transliteration - very inadequate and another one by Colombo University found on a web page. These last two allow you to type phonetically but not entirely. The result is very few people type Unicode Singhala, only those that their job requires them to type Unicode Singhala.

PERFECT ROMANIZED SINGHALA
I did the same thing English and Western European languages did; very close. I mapped the well-known 58+2 Singhala-Sanskrit phonemes in the SBCS. The reason is that Singhala then gets to use all those applications, perfected over decades, that most Westerners here enjoy. That set covers all letters necessary for Singhala, Sanskrit and Pali, the three languages that use the Singhala script.

See it here displayed using the first orthographic smartfont:
http://lovatasinhala.com/

MORE READING:
Let's look at this as a lay person (whose interest is our ultimate goal) sees:

English was fully romanized from fuþark by about 600 AD. Romanizing is writing by using letters of the Latin alphabet plus many, many others added to it. All Europeans, when they became fully Christianized / literate, adopted Latin letters and extended them as they pleased. This set has branched off as Latin script and Cyrillic script. The printing industry standardized the greater part of the alphabets.

Singhala has a well-defined phoneme chart called hodiya. It is an extension of the Sanskrit hodiya. Rev. Fr. Theodore G. Perera's grammar book (1932) and Rev. Fr. A. M. Gunasekara's book (1891) that dug up sinking Singhala fully describe the writing system. Like most other languages, including English before printing arrived in England, it is written phonetically.

Singhala was first romanized in the 1860s by Rhys Davids, to print Pali (Magadhi) in the Latin script. This scheme, called PTS Pali, requires letters with bars (macrons) and dots not found in common fonts. It is similar to IAST Sanskrit. It is impossible to type these on the regular keyboard.

I freshly romanized Singhala by mapping its phonemes to the SAME area 13 Western European languages mapped their alphabetic letters within the following Unicode code charts:
http://www.unicode.org/charts/PDF/U0000.pdf
http://www.unicode.org/charts/PDF/U0080.pdf

So, if that is "creating a transcoding table" all Europeans did it and I do it too.

On Sun, Mar 16, 2014 at 12:36 AM, Philippe Verdy <verdy_p_at_wanadoo.fr> wrote:
Don't you realize that what you are trying to create is completely out of topic of Unicode, as it is simply another new 8-bit encoding similar to what ISCII does for supporting multiple Indic scripts with a common encoding/transcoding table?

The ISCII standard has shown its limitations, it cannot be enough to support all scripts correctly and completely, it has lots of unsolved ambiguities for tricky cases or historic orthographies, or newer orthographies, that the UCS encoding better supports due to its larger character set and more precise character properties and algorithms.

You are in fact creating a transcoding table... Except that you are mixing the concepts; and the Unicode and ISO technical committees working on the UCS don't need to handle new 8-bit encodings. And you'll soon experience the same problems as in ISCII and all other legacy 8-bit encodings: very poor INTEROPERABILITY due to version tracking or complex contextual rules...
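The interoperability problem with legacy 8-bit encodings can be sketched with Python's built-in codecs. This is only an illustration using three existing single-byte encodings, not the proposed one: the same byte value yields a different character under each, because the bytes carry no indication of which table to use.

```python
# One byte value, three existing single-byte encodings.
RAW = bytes([0xE9])
CANDIDATE_ENCODINGS = ("latin-1", "cp1251", "iso8859_7")


def decode_under(raw, encodings=CANDIDATE_ENCODINGS):
    """Decode the same bytes under each encoding and return the results."""
    return {name: raw.decode(name) for name in encodings}


def utf8_roundtrip(text):
    """Encode to UTF-8 and back; every UCS character has a single identity."""
    return text.encode("utf-8").decode("utf-8")


if __name__ == "__main__":
    for name, text in decode_under(RAW).items():
        print(f"{name}: U+{ord(text):04X} {text}")
```

Here 0xE9 reads as a Latin, a Cyrillic or a Greek letter depending on the assumed table. A new 8-bit encoding that reuses the Latin-1 byte range would add one more reading of the same bytes.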

You may still want to promote it at some government or education institution, in order to promote it as a national standard, except that there's little chance it will ever happen when all countries in ISO have stopped working on standardization of new 8-bit encodings (only a few are maintained, but these are the most complex ones, used in China and Japan).

Well in fact only Japan now seems to be actively updating its legacy JIS standard; but only with the focus of converging it to use the UCS and solve ambiguities or solve some technical problems (e.g. with emojis used by mobile phone operators). Even China stopped updating its national standard by publishing a final mapping table to/from the full UCS (including for characters still not encoded in the UCS): this simplified the work because only one standard needs to be maintained instead of 2.

Note that as long as there is no national standard supporting your proposed encoding, there is no chance that the font standards will adopt it. You may still want to register your encoding in the IANA registry, but you'll need to pass the RFC validation. And your proposal is missing many technical details needed for it to be supported with a standard mapping in fonts.

There is a better chance for you to promote it only as a transliteration scheme, or as an input method for a keyboard layout (both are also not in the scope of the Unicode and ISO/IEC 10646 standards though; they could be in the scope of the CLDR project, which is not by itself a standard but just a repository of data, supported by a few standards)... Think about it.

2014-03-16 5:12 GMT+01:00 Naena Guru <naenaguru_at_gmail.com>:
I made a presentation demonstrating Dual-script Singhala at National Science Foundation of Sri Lanka. Most of the attendees were government employees and media representatives; a few private citizens came too.

Dual-script Singhala means romanized Singhala that can be displayed either in the Latin script or in the Singhala script using an Orthographic Smart Font. It is easy to input (phonetically) using a keyboard layout slightly altered from QWERTY. The font uses Standard Ligature feature <liga> of OpenType / OpenFont standard to display glyphs of Sanskrit ligatures as well as many Singhala letters. The font is supported across all OSs: Windows, Macintosh, Linux, iOS and Android. Dual-script Singhala is the proper and complete solution on the computer for the Singhala script used to write Singhala, Sanskrit and Pali languages. The same solution can be applied for all Indic languages.
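As a rough illustration of how a ligature feature can make Latin-encoded text display as another script, here is a minimal Python sketch of greedy longest-match substitution. The table and glyph names are hypothetical (the actual font's rules are not given in this thread), and a real OpenType shaper works on glyph IDs rather than strings:

```python
# Hypothetical ligature table: Latin letter sequences -> Singhala glyph names.
# These names and rules are assumptions for illustration, not the font's data.
LIGATURES = {
    "ka": "sinh_ka",
    "ki": "sinh_ki",
    "ma": "sinh_ma",
    "la": "sinh_la",
}


def shape(text, ligatures=LIGATURES):
    """Return the glyph sequence for text, preferring the longest matching rule."""
    longest = max(map(len, ligatures))
    glyphs = []
    i = 0
    while i < len(text):
        for size in range(min(longest, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in ligatures:
                glyphs.append(ligatures[chunk])  # sequence replaced by one glyph
                i += size
                break
        else:
            glyphs.append(text[i])  # no rule applies: default glyph for the letter
            i += 1
    return glyphs
```

Only the displayed glyphs change. The stored characters remain Latin letters, so any software that inspects the text itself (search, sorting, spell-checking) still sees Latin.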

The government ministries, media and people welcomed it with enthusiasm and relief that there is something practical for Singhala. The response in the country was singularly positive, except for the person that filibustered the Q&A session of the presentation that spoke about the hard work done on Unicode Sinhala, clearly outside the subject matter of the presentation.

The result of the survey passed around was 100% as below (translated from Singhala):

  1. I believe that Dual-script Singhala is convenient to me as it is implemented similar to English - Yes
  2. Today everyone uses Unicode Sinhala. It is easy and has no problems - No
  3. The cost of Unicode Sinhala should be eliminated by switching to Dual-script Singhala - Yes
  4. We should amend Pali text in the Tripitaka according to rulings of SLS1134 - No
  5. Digitizing old books is a very important thing - Yes
  6. We should focus on making this easy-to-use Dual-script Singhala method a standard - Yes
Please comment or send questions.

_______________________________________________
Unicode mailing list
Unicode_at_unicode.org
http://unicode.org/mailman/listinfo/unicode

Received on Sun Mar 16 2014 - 21:53:24 CDT

This archive was generated by hypermail 2.2.0 : Sun Mar 16 2014 - 21:53:25 CDT