Re: Subtitles in Indic/Arabic/other scripts requiring CTL

From: Naena Guru <naenaguru_at_gmail.com>
Date: Mon, 14 Nov 2011 14:05:20 -0600

Shriramana,

The question you raise comes down to font rendering. According to the
OpenType standard, each script (Latin, Tamil, etc.) has its own rules for
how letters are constructed and displayed. For instance, when you write
'ke' in Tamil, the kavanna is preceded by the kombu. That is a
Tamil-specific rule. Comparable rules are defined separately for
Devanagari, Sinhala etc., because each script has its own code page.
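
To make the 'ke' example concrete, here is a minimal Python sketch using
only the standard unicodedata module. It is an illustration, not taken from
any particular renderer, and the helper name describe() is made up for this
example. It lists the code points stored for Tamil 'ke'. It shows the stored
order only; drawing the kombu to the left of the kavanna is left to the
renderer.

```python
import unicodedata

# Tamil 'ke' as stored in Unicode: the base letter first, then the vowel sign.
KE = "\u0b95\u0bc6"


def describe(text):
    """Return (code point, character name) pairs in stored order."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


for code, name in describe(KE):
    print(code, name)
```

The first pair printed is the kavanna (TAMIL LETTER KA) and the second is
the kombu (TAMIL VOWEL SIGN E), the reverse of the order in which they
appear on the page.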

As a result, each script has its own set of font rendering rules. If an
application does not know the Tamil font rendering rules, then you won't
get the result you want.

Getting back to the example I started with, the kombu is a sign not tied
directly to a vowel, and the kavanna is a base letter. Why on earth did the
kombu get a codepoint equal in status to a letter? Initially, the Indic
problem was interpreted taking the Latin alphabetic writing system as the
basis, where we simply define the alphabet and string the letters (shapes)
together to make the text.

When they tried to encode Indic, they were thrown completely off track by
the way vyaakarana books showed the hodiya in the native script. They
should have looked at transliteration schemes such as HK-Sanskrit.

Hodiya is the phoneme chart built around Sanskrit. In it, the hal letters
or consonants were displayed as shapes of ka, ga etc. Originally in Indic,
both the pure consonant and the letter with 'a' implicit had the same form.
The text in the vyaakarana book itself, if anyone cared to read, treated
hal as consonants and not as some mysterious thing that carried an 'a' that
had to be smoked out with a virama. Originally people did not use the
virama or halant (or pulli in Tamil) inside words. Halant and virama mean
"this mark indicates that the last letter of the word is a consonant".

In Singhala, the Tamil word 'uppu' would be written u+p-pu, where p and pu
*touched* indicating that the p is a pure consonant. When you write megam,
the ending m would have a hal kiriimee lakuna (pulli, the virama). If it
did not have the pulli, then it would be megama. I am not sure how Tamil
dealt with this, but if you search for 'Tamil palm leaf book', you can find
many pictures of old manuscripts to inspect and see how Tamil handled it.

In planning the Unicode code page, we look at Tamil and decide, okay, 'ke'
needs a kombu and a kavanna. Let's then give a codepoint to the kombu as
well. Now write k followed by e and we get kavanna followed by kombu, the
wrong order. Therefore, we write a special rule for Tamil saying that if
you follow k with e, replace the k already on screen with e and k, in that
transposed order; some more reasoning and complications later, we get the
result.
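
The transposition rule described above can be sketched in a few lines of
Python. This is a simplified illustration that works on code points; the
names PREBASE_SIGNS and visual_order are made up for this example. Actual
rendering engines reorder glyphs rather than characters and handle many
more cases (the two-part vowel signs O, OO and AU, for example, are not
handled here).

```python
# Tamil vowel signs that are drawn to the left of the consonant:
# E (U+0BC6), EE (U+0BC7) and AI (U+0BC8).
PREBASE_SIGNS = {"\u0bc6", "\u0bc7", "\u0bc8"}


def visual_order(text):
    """Return the characters of `text` in a simplified display order.

    A pre-base vowel sign is moved in front of the character that was
    stored just before it; everything else keeps its stored position.
    """
    out = []
    for ch in text:
        if ch in PREBASE_SIGNS and out:
            # Transpose: put the sign before the preceding consonant.
            out.insert(len(out) - 1, ch)
        else:
            out.append(ch)
    return out


stored = "\u0b95\u0bc6"  # kavanna followed by kombu, as typed
print([f"U+{ord(ch):04X}" for ch in visual_order(stored)])
```

For the stored sequence kavanna, kombu, the function returns kombu,
kavanna, which is the order seen on the page.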

Unicode is a disaster for our languages. You cannot sort, backspace-delete,
search and replace and so on as we do with English -- no telephone books,
at least for Singhala, my native language.

Every application knows, or is supposed to know, the Latin font rendering
rules. You test your program first with Latin, right? The latest rules for
font making are in the OpenType standard published in 2003 by Microsoft.
Nearly all new versions of common programs understand the Latin script
rules in the OpenType standard, except Internet Explorer.

The solution is to first transliterate Tamil into the Latin script and then
write a smartfont for it. Nearly all recent versions of programs will then
show Tamil correctly, except Internet Explorer. Internet Explorer does not
understand the OpenType standard. (Microsoft wrote the standard.)
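
As a rough sketch of the idea, the Python below maps a romanized string to
a sequence of display glyph names by greedy longest-match substitution,
loosely in the spirit of what a font's substitution rules do. Everything in
it is hypothetical: the table GLYPHS, the glyph names and the function
shape() are made up for illustration and are not taken from any actual
smartfont or romanization scheme.

```python
# Hypothetical romanization table: Latin key -> glyph names in display order.
GLYPHS = {
    "ke": ["kombu", "ka"],   # kombu is drawn before the kavanna
    "ka": ["ka"],            # consonant with the implicit 'a'
    "k": ["ka", "pulli"],    # pure consonant, marked with the pulli
}


def shape(roman):
    """Convert romanized text to glyph names, longest key first."""
    keys = sorted(GLYPHS, key=len, reverse=True)
    out = []
    i = 0
    while i < len(roman):
        for key in keys:
            if roman.startswith(key, i):
                out.extend(GLYPHS[key])
                i += len(key)
                break
        else:
            # No table entry matches at this position.
            raise ValueError(f"no mapping for {roman[i:]!r}")
    return out


print(shape("ke"))
print(shape("kak"))
```

Because the longest keys are tried first, "ke" is matched as one unit
rather than as "k" followed by an unmapped "e".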

People in the West and those who came to the West do not know the finer
points of our languages. The onus is on us to find our own solution, Sri
Ramana, and it is not hard.

I did this for Singhala. See the following web site. It is all romanized
Sinhala in the background and shown in the complex native script. Do not
use Internet Explorer. If you use Firefox, you may have to click on a
second page to see the native Singhala smartfont dress the Latin script.
This obviates the need for a separate code page for Singhala:
http://www.lovatasinhala.com/liyanna.php

On Fri, Nov 11, 2011 at 8:27 AM, Shriramana Sharma <samjnaa_at_gmail.com> wrote:

> My father (Dr T Vasudevan, cc-ed here) is working on a spoken (actually
> video) tutorial project using Indian languages as the instruction medium.
> He would like to add Indic language subtitles in (obviously) Indic scripts.
> For now, Tamil, both as the language he is working on (as it is our mother
> tongue) as well as the Indic script which is simplest in terms of CTL.
>
> However it seems current video players (at least two OSS ones -- MPlayer
> and VideoLAN that they are using in that project) do not support CTL.
>
> In speaking about this with another friend of mine he was wondering
> whether Arabic people have worked on getting Arabic subtitles, as it would
> also involve CTL.
>
> Can anyone shed light on CTL in subtitles? Perhaps Arabic or hopefully
> Indic?
>
> --
> Shriramana Sharma
>
>
Received on Mon Nov 14 2011 - 14:07:44 CST
