Re: Compiling a list of Semitic transliteration characters from Naena Guru on 2012-09-07 (Unicode Mail List Archive)

From: Naena Guru <naenaguru_at_gmail.com>
Date: Fri, 7 Sep 2012 11:43:59 -0500

Transliteration or Romanizing

My first advice is not to embark on making solutions for languages that you
do not know. Unicode ruined Indic and Singhala by making 'solutions' for
them without doing any meaningful research, ignoring well-known Sanskrit
grammar and previous solutions for Indic.

I romanized Singhala, probably the most complex script among all Indic
scripts, and made an orthographic font that in turn shows the
transliterated text in its native script.
http://www.lovatasinhala.com

Some reasons for romanizing:
1. The current solution is hard to use and incomplete
2. Native users get a user-friendly way to type their language on the
computer
3. The language becomes accessible to those who are not familiar with the
script
4. It helps in linguistic studies and lets you take advantage of
text-to-voice technologies, etc.

In order to romanize successfully, you need to select the best character
set. Unicode is a very bad choice because, in its common encodings, every
character beyond basic ASCII takes at least two bytes. Do not be fooled by
statements like 'use added bonus characters', which lure you to the
crippled double-byte area. Then some might try to scare you by saying you
are making rogue encodings. The only constructive suggestion I had was to
be aware of Latin semantics, which too was trivial.
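
To make the byte-size point concrete, here is a small Python sketch that
counts how many bytes a few letters occupy in ISO-8859-1, UTF-8 and UTF-16.
The sample letters and the function name are my own choices for
illustration. Note that plain ASCII letters stay at one byte in UTF-8,
while þ, ð and æ take two there and one in ISO-8859-1.

```python
# Illustrative only: the sample letters are arbitrary choices.
SAMPLE = ["a", "t", "þ", "ð", "æ"]

def encoded_sizes(ch):
    """Return the number of bytes ch occupies in each encoding."""
    return {
        "iso-8859-1": len(ch.encode("iso-8859-1")),
        "utf-8": len(ch.encode("utf-8")),
        "utf-16-le": len(ch.encode("utf-16-le")),
    }

for ch in SAMPLE:
    print(ch, encoded_sizes(ch))
```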

In my opinion, the best character set is ISO-8859-1, which Microsoft
modified to create Windows-1252. The table at the link below shows both of
them merged. The area with the yellow background is the part that was
modified by Microsoft. ISO-8859-1 had machine control codes in that area
(0x80 to 0x9F).
http://www.lovatasinhala.com/eds/charsets.htm
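
If you want to see the modified area without the table, the following
Python sketch lists the bytes where the two character sets disagree. It
relies on Python's built-in "iso-8859-1" and "cp1252" codecs; the function
name is hypothetical, and bytes that Python's cp1252 codec leaves undefined
are simply skipped.

```python
def cp1252_differences():
    """Map each byte in 0x80-0x9F to (ISO-8859-1 char, Windows-1252 char)."""
    diffs = {}
    for b in range(0x80, 0xA0):
        raw = bytes([b])
        latin1 = raw.decode("iso-8859-1")  # a control character in this range
        try:
            cp1252 = raw.decode("cp1252")
        except UnicodeDecodeError:
            continue  # byte is undefined in Python's cp1252 codec
        if cp1252 != latin1:
            diffs[b] = (latin1, cp1252)
    return diffs

for b, (l1, w) in cp1252_differences().items():
    print(f"0x{b:02X}: ISO-8859-1 U+{ord(l1):04X}, Windows-1252 {w!r}")
```

Outside that range (0xA0 to 0xFF) the two codecs decode every byte to the
same character, so the letters used for transliteration are unaffected.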

Notice that capital and small letters are separately encoded. Capitals can
be used to indicate sounds that are modified from, or closely related to,
the ones on the regular keys. The AltGr or Ctrl+Alt shifted state gives you
more options. Remember that there are letters that are not often seen, such
as þ, ð, æ, etc. Where a key sits in relation to the English sound is more
important than the fear of the letter's unfamiliarity. I used þ and ð in
the regular positions of t and d, and moved t and d to AltGr shifted
positions. The users hardly notice it because the t and d keys are what
they used in Anglicizing anyway, and they now give a more accurate
interpretation. By the way, starting with Anglicizing and refining it is
what would be easiest for the users to adapt to.
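
As a rough sketch of that key arrangement, the Python fragment below models
a layout as a table from (key, AltGr state) to the character produced. The
table holds only the four entries described above; the names LAYOUT,
type_keys and fits_latin1 are hypothetical, and a real keyboard definition
would be built with the tools of your OS rather than in Python.

```python
# Hypothetical layout fragment: (key, altgr_pressed) -> character produced.
LAYOUT = {
    ("t", False): "þ",  # plain t key gives thorn
    ("d", False): "ð",  # plain d key gives eth
    ("t", True): "t",   # AltGr+t gives the original t
    ("d", True): "d",   # AltGr+d gives the original d
}

def type_keys(strokes):
    """Turn (key, altgr) strokes into text; unmapped keys pass through."""
    return "".join(LAYOUT.get((key, altgr), key) for key, altgr in strokes)

def fits_latin1(text):
    """True if every character of text is a single ISO-8859-1 byte."""
    try:
        return len(text.encode("iso-8859-1")) == len(text)
    except UnicodeEncodeError:
        return False

typed = type_keys([("t", False), ("a", False), ("d", True), ("a", False)])
print(typed, fits_latin1(typed))
```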

The keyboards that have ISO-8859-1 characters are the US-International,
US-Extended and dead-key keyboards in Windows, Macintosh and Linux systems.
All three of these OSs also have easy ways to add your own customized
keyboard. In my experience, it is best to strip off all keys not used by a
transliteration scheme from the customized keyboards to avoid typos.
Consider whether ZWJ and/or ZWNJ and non-breaking space might be useful in
the new keyboard.

As for directionality, I'd keep it left-to-right, and if you make fonts
like I did, the transformation might incorporate the direction switch.

Good luck!
Received on Fri Sep 07 2012 - 11:48:56 CDT
