Re: Sanskrit nasalized L from Shriramana Sharma on 2011-08-14 (Unicode Mail List Archive)

From: Shriramana Sharma <samjnaa_at_gmail.com>
Date: Mon, 15 Aug 2011 07:21:20 +0530

On 08/15/2011 01:48 AM, Richard Wordingham wrote:
>>>> <la, virama, candrabindu, la>
>>>>
>>>> is strictly speaking *the* sequence recommended *across* Indic
>>>> scripts for representation of Sanskrit clusters involving a nasal
>>>> and non-nasal "semivowel".
>>>
>> However, people working with Indic rendering in a major operating
>> system support the concept (see
>> http://www.unicode.org/mail-arch/unicode-ml/y2011-m06/0153.html).
>
> Thanks, that's useful as a reference - it helps me find it later.

You should also see Peter Constable's mail to me (and the list):

http://www.unicode.org/mail-arch/unicode-ml/y2011-m06/0143.html

"""As you and I independently commented, it makes sense to see the
virama as being in a distribution class together with matras and,
therefore, for the virama to precede candrabindu."""

> The issue is the relative ordering of candrabindu and virama. For
> a C1-conjoining form (i.e. C2 relatively unmodified), <la virama
> candrabindu la> is easier to handle. For a C2-conjoining form, <la
> candrabindu virama la> is easier to work with.

Hmm -- perhaps you mean this is so because it would be possible to
easily map Virama + LA to the C2-conjoining form? That is true enough,
but it is advisable to have a single uniform representation across Indic
scripts, namely LA + Virama + Candrabindu + LA (for the reasons outlined
by Peter and me in the previous mails I have linked to from the
archives).
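
To make the "single uniform representation" concrete, here is a rough
sketch in Python. The table of code points and the helper names are my
own illustration (not taken from the mails linked above); the code points
themselves are the LA, VIRAMA and CANDRABINDU characters of each block.
It builds LA + Virama + Candrabindu + LA per script and checks that
normalization does not turn it into the other ordering, so the choice of
order really does matter for comparison and searching.

```python
import unicodedata

# Illustrative table (an assumption of this sketch): LA, VIRAMA and
# CANDRABINDU code points for a few Indic blocks.
SCRIPTS = {
    "Devanagari": {"la": "\u0932", "virama": "\u094D", "candrabindu": "\u0901"},
    "Bengali":    {"la": "\u09B2", "virama": "\u09CD", "candrabindu": "\u0981"},
    "Telugu":     {"la": "\u0C32", "virama": "\u0C4D", "candrabindu": "\u0C01"},
}

def nasalized_la_cluster(script):
    """Return LA + Virama + Candrabindu + LA for the given script."""
    s = SCRIPTS[script]
    return s["la"] + s["virama"] + s["candrabindu"] + s["la"]

def alternative_order(script):
    """Return the other ordering: LA + Candrabindu + Virama + LA."""
    s = SCRIPTS[script]
    return s["la"] + s["candrabindu"] + s["virama"] + s["la"]

for script in SCRIPTS:
    recommended = nasalized_la_cluster(script)
    alternative = alternative_order(script)
    # Compare the two orderings after normalization.
    same = (unicodedata.normalize("NFC", recommended)
            == unicodedata.normalize("NFC", alternative))
    code_points = " ".join(f"U+{ord(c):04X}" for c in recommended)
    print(f"{script}: {code_points} (same as other order after NFC: {same})")
```

Since candrabindu has combining class 0 in these blocks, normalization
leaves both orderings as they are, and the two spellings stay distinct.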

Further, the situation, especially with C2-conjoining scripts, is not
all that simple. See the attachment: the underlying phonetic consonant
cluster is nasal-V + V + LA. It would be represented in encoded form as:
VA + Virama + Candrabindu + VA + Virama + LA. As you say it would indeed
be *easier* (in some way) if the Candrabindu were to precede the virama
but I am not sure that is *advisable*.
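
As an illustration of the encoded form just given, here is a small
sketch. It assumes Devanagari code points purely for convenience (the
attachment may well show another script), and encode_cluster is a
hypothetical helper of mine, not an established API. Each non-final
consonant is followed by a virama, and a nasalized one additionally gets
a candrabindu after that virama.

```python
import unicodedata

# Devanagari is an assumption of this sketch.
VIRAMA = "\u094D"       # DEVANAGARI SIGN VIRAMA
CANDRABINDU = "\u0901"  # DEVANAGARI SIGN CANDRABINDU
VA = "\u0935"           # DEVANAGARI LETTER VA
LA = "\u0932"           # DEVANAGARI LETTER LA

def encode_cluster(consonants):
    """Encode a cluster given as (letter, is_nasalized) pairs in phonetic order.

    Every non-final consonant is followed by a virama; if it is nasalized,
    a candrabindu follows that virama. Nasalization of the final consonant
    is not handled by this sketch.
    """
    out = []
    last = len(consonants) - 1
    for i, (letter, nasalized) in enumerate(consonants):
        out.append(letter)
        if i < last:
            out.append(VIRAMA)
            if nasalized:
                out.append(CANDRABINDU)
    return "".join(out)

# nasal-V + V + LA
cluster = encode_cluster([(VA, True), (VA, False), (LA, False)])
for ch in cluster:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

The output lists VA, Virama, Candrabindu, VA, Virama, LA in that order,
i.e. the candrabindu sits inside the cluster rather than at its end.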

> Vowels and the like already occur within Tai Tham and Khmer consonant
> clusters with C2-conjoining forms.

I know that, and that is why I distinguish "Indian" Indic scripts from
"non-Indian" (i.e. South East Asian [SEA]) Indic (i.e. Brahmic) scripts,
especially in Unicode. It seems that at least in Khmer (I didn't check
the other charts/chapters) one vocalic R/L vowel is represented by the
independent vowel presented as a sub-base form (which you call C2, but as
it is not a consonant it is not actually a "C"-2). But haven't you
noticed that in all Indic scripts encoded so far the vowel sign for
vocalic L/LL is the same as the independent vowel placed (albeit in a
somewhat smaller size) below the base? It is simply a choice of encoding
model: in Indic (I'll stop calling it "Indian" Indic, as I feel "Brahmic"
is the cover-all term and Indic is a specific term) all these vowel signs
are encoded as separate characters, whereas in SEA scripts they are
handled, like sub-base consonants, by a virama-like character.

> Normally the virama equivalent (CCC 9) occurs immediately before C2,
> but that can already be displaced in normalised text, e.g. the Northern
> Thai loan word ᩈᩮᩥᩁ᩠᩺ᨷ (from English 'serve') normalised to
> ᩈᩮᩥᩁ᩠᩺ᨷ <U+1A48 TAI THAM LETTER HIGH SA, U+1A6E
> TAI THAM VOWEL SIGN E, U+1A65 TAI THAM VOWEL SIGN I, U+1A41 TAI THAM
> LETTER RA, U+1A60 TAI THAM SIGN SAKOT, U+1A7A TAI THAM SIGN RA HAAM,
> U+1A37 TAI THAM LETTER BA>. The rendering on p155 of Bunkhit
> Watcharasat's 'Northern Thai Teach-Yourself Book' (Siamese:
> ภาษาเมืองล้านนา ฉบับเรียนด้วยตนเอง) makes it clear that the ra haam
> (vowel killer, here acting as a consonant killer) acts on the letter ra.

Hmmm -- I'm not sure I entirely grok the SEA situation with Thai/Tai
Tham/Khmer etc, but I'm sure the handling of vowelless consonants and
conjoining forms in those scripts does deviate from the *Indic* model.
For example, see that stuff about the Balinese Surang and how it is
handled...
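
The displacement Richard describes for the Tai Tham example can be
checked with a short sketch using Python's unicodedata module. The
pre-normalization order below (RA HAAM typed before SAKOT) is my
assumption about what was meant; the code points are the ones he lists.
Canonical ordering sorts adjacent combining marks by combining class, so
the SAKOT (class 9) ends up before the RA HAAM, which has a higher class.

```python
import unicodedata

RA = "\u1A41"       # TAI THAM LETTER RA
SAKOT = "\u1A60"    # TAI THAM SIGN SAKOT (the virama equivalent)
RA_HAAM = "\u1A7A"  # TAI THAM SIGN RA HAAM
BA = "\u1A37"       # TAI THAM LETTER BA

# Canonical combining classes as reported by the Unicode data in Python.
print("SAKOT ccc:", unicodedata.combining(SAKOT))
print("RA HAAM ccc:", unicodedata.combining(RA_HAAM))

# Assumed input order: the ra haam is placed on RA first, then SAKOT + BA.
typed = RA + RA_HAAM + SAKOT + BA
normalised = unicodedata.normalize("NFC", typed)

for ch in normalised:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

After normalization the SAKOT directly follows RA and no longer stands
immediately before BA, which is the situation described in the quoted
text.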

> I've seen a claim that vowels within Tibetan consonant stacks can be
> handled sensibly within the confines of Unicode - I didn't investigate
> it.

I don't understand what you mean by "vowels within Tibetan consonant
stacks". I also don't know whether the Tibetan language written in the
Tibetan script requires conjoining forms of vowels, but I do know (to an
extent) that Sanskrit written in the Tibetan script doesn't require
"conjoining" forms of vowels per se generated by a virama-like character.

> I think an official ruling should cover all Indic scripts,
> ideally even those encoded in writing order, such as Thai. (I'm
> presuming the Thai script's subscript consonants will be supported one
> day, and Lao already has one unambiguously subscript consonant.)

I would prefer a ruling to focus first on the (Indian) Indic scripts,
with its extension to SEA scripts done on a script-by-script basis
after examination of the existing model for those scripts. It would not
be appropriate IMHO to hastily apply the (Indian) Indic model to SEA
scripts.

-- 
Shriramana Sharma

Received on Sun Aug 14 2011 - 20:54:15 CDT
