Re: Devanagri: - Vedic characters?

From: Kenneth Whistler (kenw@sybase.com)
Date: Thu Dec 09 1999 - 15:07:24 EST


> On Thu, 25 Nov 1999, Christopher John Fynn wrote:
>
> > In the Devanagri block does Unicode 3.0 now include "the Extended
> > Characters for Vedic" as listed in Annex G of Indian Standard
> > IS 13194 : 1991 (ISCII)?
> >

Jim Agenbroad answered:

> Thursday, December 9, 1999
> Chris,
> I suspect they are not in Unicode 3.0, others may have given you a
> more definitive answer on that. I don't know Sanskrit.
> Do you know if these or some analogus characters are used when Vedic
> Sanskrit is written in other Indic scripts?

The answer to Chris' question is that no additions intended specifically
to cover the "Extended Character Set for Vedic" in Annex G of
IS 13194:1991 (ISCII) are included in Unicode 3.0.

This is not simply due to oversight. That set of extensions has been
on the radar screen for a number of years. However, no one has come
forward with a sufficiently detailed analysis of the Vedic usage to
prepare a proposal that could pass technical scrutiny for these
additions.

Part of the problem is that Annex G of ISCII contains a grab-bag of
elements, to cover various Vedic manuscript traditions. It is not
clear what is intended to be a "character" and what is intended
to be a "glyph" -- except in the trivial sense that since numbers
have all been assigned in ISCII, all have become "encoded characters".

One approach would be to simply encode the bunch as more "compatibility
characters", to enable simple roundtrip mapping to an ISCII implementation
of these extensions. However, it seems more advisable to actually
analyze the set and come up with a more coherent proposal, fitting
with the Unicode text model for Devanagari.
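
To make the "simple roundtrip mapping" option concrete, here is a
minimal sketch in Python. It is only an illustration under stated
assumptions: the table and function names are invented for this
example, and the Private Use Area code point standing in for the
Filler is a placeholder, not a real or proposed assignment. The only
mapping taken from this message is 0xBF to U+0970.

```python
# Hypothetical round-trip table: ISCII Annex G value -> Unicode code point.
# Only 0xBF -> U+0970 is stated in this message. The entry for 0xA2 uses a
# Private Use Area code point purely as a placeholder (an assumption).
ANNEX_G_TO_UNICODE = {
    0xA2: 0xE0A2,  # Filler (puspika): placeholder, NOT a real assignment
    0xBF: 0x0970,  # Abbreviation sign -> DEVANAGARI ABBREVIATION SIGN
}

# A round trip needs the table to be one-to-one, so it can be inverted.
UNICODE_TO_ANNEX_G = {u: g for g, u in ANNEX_G_TO_UNICODE.items()}
assert len(UNICODE_TO_ANNEX_G) == len(ANNEX_G_TO_UNICODE)


def to_unicode(values):
    """Convert a list of Annex G values to a Unicode string."""
    return "".join(chr(ANNEX_G_TO_UNICODE[v]) for v in values)


def to_annex_g(text):
    """Convert a Unicode string back to a list of Annex G values."""
    return [UNICODE_TO_ANNEX_G[ord(ch)] for ch in text]


original = [0xBF, 0xA2, 0xBF]
round_tripped = to_annex_g(to_unicode(original))
print(round_tripped == original)
```

A table like this preserves data but says nothing about whether each
entry is a character or a glyph, which is why the analytical approach
is preferable.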

However, the ISCII Annex G text is full of problematical statements.
For example, regarding Puspiká: "This symbol is just a substitute for
the spaces between words, and hence is not needed." Yet the symbol
*is* encoded in the list (0xA2 Filler). There is also the practice,
noted for Samaveda, of marking svaras by placing the Devanagari
characters for 1, 2, 3, ka, ra, or u above other characters. No
coherent text model is supplied for that -- instead ISCII suggests
that they "can be placed in the corresponding positions of the
previous row [of text]", and does not encode them.

Some of the combining marks used for marking Vedic svaras are
already encoded (as of Unicode 1.0, no less): U+0951 ... UDATTA
and U+0952 ... ANUDATTA, although udatta is normally not marked,
and U+0951 varies in usage between udatta (Maitrayaniya) and
svarita (Rigveda). The long svarita could be indicated by the
existing character U+030E COMBINING DOUBLE VERTICAL LINE ABOVE.
The kampa could be indicated either by the UDATTA or by
the other similar existing character, U+030D COMBINING VERTICAL LINE
ABOVE. Sentence-ending udatta in Sukla Yajurveda texts would either
be indicated by a sequence of full stops (..) or by the existing
character U+0324 COMBINING DIAERESIS BELOW, if it were written
beneath a character.
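
As a rough illustration (not part of any proposal), the existing
characters named above can be inspected with Python's standard
unicodedata module. The notes attached to each code point simply
restate the suggestions above, and the helper names are made up for
this sketch.

```python
import unicodedata

# Code points mentioned above, with the possible Vedic use suggested for
# each. These notes restate the message; they are not normative.
SVARA_MARKS = {
    0x0951: "udatta (Maitrayaniya) or svarita (Rigveda)",
    0x0952: "anudatta",
    0x030E: "possible long svarita",
    0x030D: "possible kampa",
    0x0324: "possible sentence-ending udatta written below",
}


def describe(cp):
    """Return 'U+XXXX NAME (general category)' for a code point."""
    ch = chr(cp)
    return f"U+{cp:04X} {unicodedata.name(ch)} ({unicodedata.category(ch)})"


def with_mark(base, cp):
    """Attach a combining mark to a base letter by simple concatenation."""
    return base + chr(cp)


for cp, note in SVARA_MARKS.items():
    print(describe(cp), "-", note)

# Example: DEVANAGARI LETTER KA followed by the anudatta mark.
marked = with_mark("\u0915", 0x0952)
print([f"U+{ord(c):04X}" for c in marked])
```

All five are nonspacing combining marks, so each one follows its base
letter in the backing store.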

The Devanagari abbreviation sign (ISCII Annex G 0xBF Abbreviation
sign) is already encoded: U+0970 DEVANAGARI ABBREVIATION SIGN.

There are also a number of problems with variant forms of the same
abstract characters (5 different visargas, 7 different anusvaras).
How these would be encoded in any extension for Vedic
characters will likely depend on the outcome of the ongoing debate
in the UTC regarding how to deal with variant encoding for other
domains (Math symbols and CJK variants, most importantly). It is
likely that UTC will opt for a generic approach to indication
of variants -- in which case it may turn out that many of these
Vedic variants can be accommodated in terms of the generic
mechanism.

In any case, you can see that the issue is not straightforward --
and until someone champions the process through UTC and WG2,
complete with analysis and implications for the text model for
Devanagari, little progress can be expected.

I don't know anything about Vedic manuscript traditions in other
Indic scripts. I suspect, as for other manuscript traditions, including
those of the Latin script, any number of additional complications
will show up. And resolution of what is appropriate for encoding
and what remains outside the encoding context will depend on
analysis of the individual cases.

--Ken


