Re: Eyelash Ra/Variant Mark (Last from me)

From: Anupam Saurabh (anupam@cdac.ernet.in)
Date: Thu May 28 1998 - 07:18:29 EDT


Hello

On Wed, 27 May 1998, Jeroen Hellingman wrote:
> > 1. It is neither a distinct consonant or a vowel by itself in the Marathi
> > "alphabet". But marathi has some words where spellings are taught to be
> > used with the eyelash-ra. Since Marathi uses normal ra also, and there
> > are words which sound the same but are spelt differently in the two forms
> > (eyelash-ra and reph), having different meanings, it was found acceptable
> > to treat it as RRA. This served the purpose of spelling, display,
> > sorting, phonetic correctness and usage. Hence eyelash-ra was
> > implemented in ISCII as RRA-halant as per the chart.
>
> This is the same using a separate character.

If you agree to using a separate character, then using RRA is the best
alternative. We have re-checked 4-5 Marathi dictionaries as well: the
placement of eyelash-ra is just at the end of Ra in the dictionary. This
is well suited to using RRA-halant for eyelash-ra. And since no
separate character is listed as a full form or half form of this eyelash-ra,
there is a strong case against encoding it as a separate character other
than RRA. For the Devanagari chart, it is not essential to treat it
as Dravidian RRA, and a transliteration from Tamil etc. may still map to
RRA in DV if desired, since it would be the nearest match other than ra.
Likewise, when you transcribe kha from DV to Tamil, you can only map it to
ka, since there is no kha in Tamil, and ka is the nearest match that keeps
it readable.
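
The nearest-match idea can be sketched in Python as a small lookup table.
This is only an illustration under stated assumptions: the table names and
the function `transliterate` are hypothetical, the tables cover just the
letters mentioned above, and a real transliterator would need rules as well
as a table.

```python
# Hypothetical, deliberately tiny tables; only the code points are real.
DEVANAGARI_KA = "\u0915"   # DEVANAGARI LETTER KA
DEVANAGARI_KHA = "\u0916"  # DEVANAGARI LETTER KHA
DEVANAGARI_RRA = "\u0931"  # DEVANAGARI LETTER RRA
TAMIL_KA = "\u0B95"        # TAMIL LETTER KA
TAMIL_RRA = "\u0BB1"       # TAMIL LETTER RRA

# Devanagari to Tamil: Tamil has no kha, so kha falls back to ka.
DEVANAGARI_TO_TAMIL = {
    DEVANAGARI_KA: TAMIL_KA,
    DEVANAGARI_KHA: TAMIL_KA,
}

# Tamil to Devanagari: Tamil RRA may map to Devanagari RRA if desired.
TAMIL_TO_DEVANAGARI = {
    TAMIL_KA: DEVANAGARI_KA,
    TAMIL_RRA: DEVANAGARI_RRA,
}


def transliterate(text, table):
    """Map each character through the table; unmapped characters pass through."""
    return "".join(table.get(ch, ch) for ch in text)


if __name__ == "__main__":
    tamil = transliterate(DEVANAGARI_KA + DEVANAGARI_KHA, DEVANAGARI_TO_TAMIL)
    print([f"U+{ord(ch):04X}" for ch in tamil])
```

Note that the kha to ka mapping loses information: transliterating the
result back to Devanagari cannot recover the original kha.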

> From a implementation
> point of view, I wouldn't consider reph as a half-letter. Its behaviour
> is very much different from half-letters.

The behavior of reph is identical to that of any other half-letter from the
language angle. You must be referring to the display behavior, which is at
the glyph level and differs from that of other half consonants. The RAsup
mentioned on page 6-38 of Unicode 2.0 should clearly state that it is formed
by ra-halant, as also shown in R2, R3, etc.

> For a spell-checking application,
> using ZWJ may not be desirable, as such algorithms may be designed
> to ignore such characters, and this will have to be added as an
> exception. A separate eye-lash ra would be better in that case, even
> though its full form will look the same as the ordinary ra. I wouldn't

You seem to be making the same point we made. The full form of RRA in
Devanagari is denoted by ra with a dot below, and for input on the Inscript
keyboard it is placed on the shift-ra position. Introducing another ra would
not be correct from the Marathi language angle and would also make it
difficult to use from the keyboard.

> map it on the Dravidian RRA, as that is a different letter. It can
> be discussed whether a separate character has enough benefits above
> ra + zwj.

ra + zwj or a separate character will only complicate the usage of Marathi.
Just let me know: how would one represent or show an isolated reph (e.g.
in a text book) if ra-halant-zwj has been used up for eyelash-ra, as given
in R5 on page 6-39?
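
For readers following the sequences under discussion, here is a minimal
Python sketch that spells them out as code points. The constant names and
the helper `describe` are hypothetical, and writing the ISCII RRA-halant as
U+0931 U+094D is an assumption based on the U0931 assignment mentioned in
this thread.

```python
import unicodedata

RA = "\u0930"      # DEVANAGARI LETTER RA
RRA = "\u0931"     # DEVANAGARI LETTER RRA
HALANT = "\u094D"  # DEVANAGARI SIGN VIRAMA (halant)
ZWJ = "\u200D"     # ZERO WIDTH JOINER

# Eyelash-ra as proposed in this mail (assumed mapping of ISCII RRA-halant).
EYELASH_AS_RRA_HALANT = RRA + HALANT
# Eyelash-ra as described by rule R5 of Unicode 2.0.
EYELASH_AS_RA_HALANT_ZWJ = RA + HALANT + ZWJ
# Plain ra-halant, which renders as reph before a following consonant.
RA_HALANT = RA + HALANT


def describe(sequence):
    """Return one 'U+XXXX NAME' string per code point in the sequence."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in sequence]


if __name__ == "__main__":
    for label, seq in [
        ("RRA-halant", EYELASH_AS_RRA_HALANT),
        ("ra-halant-zwj", EYELASH_AS_RA_HALANT_ZWJ),
        ("ra-halant", RA_HALANT),
    ]:
        print(label, describe(seq))
```

The sketch only lists the sequences; it does not settle which one a renderer
should show as eyelash-ra or as an isolated reph.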

> > 2. It will cause difficulty in default sort order, if it is placed at the
> > end of current assignments.
>
> Recently, a default Unicode collating algorithm was proposed
> in a technical report. If that is going to be implemented (hopefully
> as part of a standard Unicode API) code-points will not be very
> relevant to sorting applications, so this is no objection to me.

Yes, the Unicode collating algorithm will bring about a correct sort order
for each language. But not all applications will implement it very soon, so
the default order is expected to be usable with as few exceptions as
possible.
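
The default-order concern can be illustrated with a short Python sketch that
sorts by raw code point, as an application without a collation algorithm
would. The code point U+097F is purely a hypothetical stand-in for "a new
character placed after the current assignments"; it is not a proposal, and
the labels and the function `codepoint_order` are assumptions made for this
example.

```python
RA = "\u0930"      # DEVANAGARI LETTER RA
RRA = "\u0931"     # DEVANAGARI LETTER RRA
LA = "\u0932"      # DEVANAGARI LETTER LA
HALANT = "\u094D"  # DEVANAGARI SIGN VIRAMA (halant)

# Hypothetical stand-in for a new eyelash-ra character at the end of the block.
HYPOTHETICAL_EYELASH = "\u097F"

# Each label maps to the leading characters of an imagined dictionary entry.
ENTRIES = {
    "la": LA,
    "eyelash as new character": HYPOTHETICAL_EYELASH,
    "eyelash as RRA-halant": RRA + HALANT,
    "ra": RA,
}


def codepoint_order(entries):
    """Return the labels sorted by the raw code points of their keys."""
    return [label for label, key in sorted(entries.items(), key=lambda kv: kv[1])]


if __name__ == "__main__":
    for label in codepoint_order(ENTRIES):
        print(label)
```

With RRA-halant the eyelash entry falls directly after ra and before la,
which matches the dictionary placement described earlier; with the stand-in
character it falls after every other letter.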

> > 3. It will merely treat it as a glyph variant for the Marathi script in the
> > chart whereas ISCII/Unicode are meant to address the alphabet/character
> > encoding mechanism. It will contradict the basic phonetic approach used
> > for the design of character coding. This way a lot of alternate display
> > forms or glyphs or conjuncts may need to be added in future and the code
> > chart may only look like patchwork.
>
> Agreed, but in this case, there is a non-contextual reason (spelling
> of words) to use one or the other. However, I still think that using the
> Unicode standard as it is, is sufficient for eye-lash ra.

A) Again I would like to know, how would one encode/represent or show an
isolated reph (e.g. for a text book) if you have used up ra-halant-zwj
for eyelash-ra, as given in R5 on page 6-39?

B) If you use the current Unicode description, and the zwj is ignored in
searching or spell-checking, how would one distinguish between two words
that have the same spelling except for the zwj but different meanings?
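
Question B can be made concrete with a minimal Python sketch. The three
strings are illustrative only, built from da, a ra form, ya and the vowel
sign aa so that they differ solely in how the ra is encoded; `strip_joiners`
is a hypothetical stand-in for a search or spell-check step that ignores
joiner characters.

```python
DA = "\u0926"      # DEVANAGARI LETTER DA
YA = "\u092F"      # DEVANAGARI LETTER YA
AA = "\u093E"      # DEVANAGARI VOWEL SIGN AA
RA = "\u0930"      # DEVANAGARI LETTER RA
RRA = "\u0931"     # DEVANAGARI LETTER RRA
HALANT = "\u094D"  # DEVANAGARI SIGN VIRAMA (halant)
ZWJ = "\u200D"     # ZERO WIDTH JOINER
ZWNJ = "\u200C"    # ZERO WIDTH NON-JOINER

WORD_WITH_REPH = DA + RA + HALANT + YA + AA
WORD_WITH_EYELASH_ZWJ = DA + RA + HALANT + ZWJ + YA + AA  # Unicode 2.0 R5 style
WORD_WITH_EYELASH_RRA = DA + RRA + HALANT + YA + AA       # RRA-halant style


def strip_joiners(text):
    """Drop ZWJ and ZWNJ, as a joiner-ignoring comparison might do."""
    return text.replace(ZWJ, "").replace(ZWNJ, "")


if __name__ == "__main__":
    print(strip_joiners(WORD_WITH_EYELASH_ZWJ) == WORD_WITH_REPH)
    print(strip_joiners(WORD_WITH_EYELASH_RRA) == WORD_WITH_REPH)
```

Once the joiners are dropped, the zwj-based spelling is indistinguishable
from the reph spelling, while the RRA-halant spelling stays distinct.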

> > 4. If you are thinking of adding a character for it, then what is wrong in
> > using the RRA U0931 which is already fitting neatly from all considerations.
> > (Refer to my previous mails today demonstrating its usage and
> > implementation in ISCII). It is not essential to treat all scripts of Unicode
> > identically for rendition. Although DV(09XX) in Unicode provides symbols
> > for transcribing tamil and other scripts, it is not a primary
> > consideration when Devanagari has a seperate code space in Unicode.
> > Priority must be given to Marathi and Hindi instead of the objective of
> > transcription. Perfect transcription from other languages can still be
> > achieved by other external software if need be.
>
> In Unicode, you'll have to use some kind of table solution to
> transliterate one script into another, and even then the transliteration
> will not be perfect, although acceptable for the purposes you
> describe, like rail-road reservation charts. I've been making such
> tables, but correct transliteration requires knowledge of the
> languages involved.

Yes, I agree with you over here. We have developed such transliteration
mechanisms with complete knowledge of the languages and usage, which are
rule-based and are more accurate than simple tables. In the near future, we
will demonstrate live applications of such transliteration on our Web
site for the benefit of the global community.

Also, we do not agree with either the current depiction of eyelash-ra
in Unicode 2.0 or the addition of one more character. Either of these
cases leads to problems, as discussed here.

**** Additional thoughts for all in love with Indian languages **********

Our opinion is that people who do not know these languages are likely to
misinterpret the rendition issues and cause inconsistencies for the end
users of these languages. I agree that Unicode was not meant to address
glyph composition. But since that is happening now, it is important to
exercise care in this area. Fortunately, since it is possible to correct
these anomalies in independent implementations of the code-to-glyph
mapping, we are sure that we will be able to correct any such problems.

I would like to add that we have experienced and addressed these issues
over a period of 10 years on GIST Technology, and we know that many of
these things raise doubts in any new implementation. I can only advise
implementers to download LEAP-Lite and study the implementation
IF YOU WISH TO BE CORRECT THE VERY FIRST TIME.

I wish I had a 48-hour day in which I could give 5-6 hours to the various
mailing lists of Unicode, mozilla, I18N, etc. on these issues, which are
now gaining interest. Since we have limited time, in which we are doing
various projects on development of Indian language software, tools, OCR,
etc., I may not be able to continue responding on this thread consistently.
But a lack of response does not imply agreement from our side.

Regards
Anupam
---------------------------------------------------------------------------
Anupam Saurabh Email: anupam@cdac.ernet.in
Group Co-ordinator GIST R&D, C-DAC Phone: +91-212-370034, 352461
Centre for Development of Advanced Computing URL : http://www.cdac.org.in
Pune University Campus, Pune-411007, India Fax : +91-212-357551
* Download Free LEAP-Lite from our Web Site or sendmail to free@cdac.ernet.in


