Re: Devanagari [was Re: Any web-published rebuttals to criticism?]

From: Sandeep Sibal (sibal@att.com)
Date: Thu Jan 09 1997 - 11:15:04 EST


As a reference for this discussion, especially for folks who are
unfamiliar with Devanagari, the following pages might help:
http://bach.ecse.rpi.edu/~sibal/unidnchars.html
http://bach.ecse.rpi.edu/~sibal/isciichars.html

Glenn Adams wrote:
>
> I take it you don't like ISCII either. The problem is you are
> thinking like a font designer and not a text processor. Think about
> the fact that there is no ideal coding that is globally optimal
> with respect to all types of processing. An encoding which favors
> display operations may be lousy for keyboarding or for string searching
> or for word or syllable segmenting. And vice versa.

I don't think I'm thinking like a font designer only. I'm
thinking in terms of what I believe Unicode stands for. And apart
from "still other problems" with ISCII, I think ISCII is a good
standard. The point is that the objectives of ISCII are NOT the
objectives of Unicode. ISCII was written several years ago, with
the idea of "encoding" all Brahmic languages in 7 bits. Indeed,
the very first line of the standard announces that. It appears
quite obvious that (not unlike Unicode) all characters could not
be captured in the range desired, and so some of the basic but
less frequently used vowels and some consonants are specified by
attaching a modifier to a base character. The standard also
specifies keyboard overlays. Indeed, judging from the fact that
less frequently used characters are relegated to the "modifier
zone", the objective of ISCII appears to be as much ease of
keyboard entry as anything else. In many ways it is a
"keyboard-inspired" or "keyboard-constrained" encoding
of Brahmic languages.
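
To make the modifier mechanism concrete, here is a minimal Python
sketch, assuming the commonly cited ISCII-91 byte values (KA = 0xB3,
nukta = 0xE9); check these against the published standard before
relying on them. The one-byte fallback is a placeholder, not a real
ISCII table:

    NUKTA = 0xE9  # assumed ISCII-91 nukta modifier byte

    # (base byte, NUKTA) -> extended character; one illustrative entry
    EXTENDED = {
        (0xB3, NUKTA): "\u0958",  # KA + nukta -> QA
    }

    def decode(data: bytes) -> str:
        out, i = [], 0
        while i < len(data):
            pair = (data[i], data[i + 1]) if i + 1 < len(data) else None
            if pair in EXTENDED:      # two positions encode one character
                out.append(EXTENDED[pair])
                i += 2
            else:                     # placeholder 1:1 mapping only
                out.append(chr(data[i]))
                i += 1
        return "".join(out)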

Unicode is very different. The 1-byte constraint does not exist.
Keyboard entry is not a goal. Characters are supposed to have a
distinct "cognitive" representation. Well then, you can't take
ISCII, dust it off a little, and make that the Unicode encoding of
Devanagari. That just isn't right. If characters exist in ISCII
that are encoded by a sequence of two or more positions,
they should be "opened up" in Unicode, because they are
cognitively, phonetically and visually distinct.

Compared to ISCII, Unicode does flatten out the encoding,
removing the use of most of the modifiers. Nonetheless, some
(as I pointed out) still remain. I don't think Unicode can afford
to be complacent just because it does a better job than ISCII.
ISCII did the best it could in 7 bits, which is one helluva
constraint for something like Devanagari, let alone Brahmic!
[Brahmic is a superset of Devanagari]
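
The contrast is easy to show. The nukta consonants DID get opened up
as stand-alone code points (the code points below are facts of the
Unicode tables), while the half-forms did not; the framing of this
Python sketch is mine:

    import unicodedata

    # QA exists as its own code point, canonically equivalent to
    # KA + NUKTA:
    print(unicodedata.decomposition("\u0958"))   # -> "0915 093C"

    # A half-consonant has no code point of its own; it can only be
    # written as full consonant + virama:
    half_ka = "\u0915\u094D"   # KA + VIRAMA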

For a moment, imagine a situation where English had to be
encoded in 6 bits. All uppercase letters were encoded as
"CapsLock"+lowercase-letter, and all symbols as
"SymbolModifier"+char. Now, imagine a world that was
creating a 12-bit Unicode. Would you simply carry over the
6-bit encoding with minor fixes?
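
A toy Python version of the thought experiment; every value here is
invented purely for illustration:

    CAPS = 0x3E   # hypothetical "CapsLock" modifier position
    SYM  = 0x3F   # hypothetical "SymbolModifier" position

    def encode6(text):
        out = []
        for ch in text:
            if ch.isupper():        # two positions per capital letter
                out += [CAPS, ord(ch.lower()) - ord("a")]
            elif ch.islower():      # one position per lowercase letter
                out.append(ord(ch) - ord("a"))
            else:                   # stand-in for a symbol table lookup
                out += [SYM, 0]
        return out

    print(encode6("Hi"))   # [62, 7, 8] -- 'H' costs two positions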

One of the problems with Unicode is that its precise objective
is unclear. Is it inspired by "display" or by "text processing"?
And what kind of "text processing"? Are you saying that
half-consonants that SOUND and LOOK different are better
encoded as "full-consonant"+"half-consonant-modifier"
than as stand-alone characters?
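
For readers following along, this is how the current model spells a
half-consonant: full consonant followed by U+094D VIRAMA. The code
points below are real; how the sequence renders is left entirely to
the display engine:

    KA, VIRAMA, SSA = "\u0915", "\u094D", "\u0937"

    # the conjunct "kSSa" takes three code points, none of which is
    # a half-consonant in its own right
    kssa = KA + VIRAMA + SSA
    print([hex(ord(c)) for c in kssa])   # ['0x915', '0x94d', '0x937']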

> If you want to critique the current encoding of Devanagari, start by
> saying what it gives up and contrast this to what it provides.

Why? Is that a requirement for pointing out problems in a standard?
What I am saying is that Unicode could do a much better job of
encoding Devanagari if a few more characters were present.

> What it provides is a logical encoding of the script based on phonetic
> ordering with implicit conjunct formation. What it provides is consistency
> with existing coding practices (ISCII). What it provides is compatibility
> with existing software systems. Why encode half-forms and conjuncts except
> to satisfy font designers who want a glyph registration authority?

The problem is not that it is consistent with ISCII; it's that
it is a bit "too consistent"! Anything that looks like a copy will
probably be consistent with the original by default. Compatibility
with existing software systems sounds more like an excuse to me.
Unicode breaks new ground. Compatibility should not become an
obstacle.

One of the "problems" with Devanagari is that it cannot be well
encoded in 1 byte, while 2 bytes appears like overkill. The script
sits somewhere between an alphabet (like that of English) and a
syllabary. Most current encodings of alphabetic scripts fit into an
ISO-8859-* representation in 94/95 codepoints. And languages like
Chinese, which are *impossible* to encode in 7 bits, have a 2-byte
encoding. Devanagari can be fitted into 7 bits (as ISCII shows),
albeit *very* tightly. Perhaps, if there existed a 2-byte encoding
of Devanagari - with maybe only 200 used codepoints - Unicode would
have something to model itself after.
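
Some back-of-the-envelope arithmetic behind that claim; the counts
are rough assumptions about the script, not an inventory:

    consonants, vowels = 33, 14              # approximate Devanagari counts

    alphabetic = consonants + vowels         # ~47 letters: 7 bits, easily
    with_marks = alphabetic + 14 + 10 + 10   # + matras, digits, signs: tight
    syllabary  = consonants * vowels         # ~462 CV units: past one byte

    print(alphabetic, with_marks, syllabary) # 47 81 462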

And, I am not a font designer. I am interested in the language and
the script. And I don't think the objective should be to satisfy
authorities; I don't belong to any authority either. I think the
goal should be to satisfy the stated objectives of the standard. If
they can't be, or aren't being, satisfied, then Unicode should admit
it and alter its objectives/claims. As it stands, there is no
consistent encoding philosophy. You still haven't explained why
ligatures are present for Latin and Arabic, while Devanagari's
half-consonants, which look and sound different (let alone its
visually distinct ligatures), are not. Why this biased treatment?

> If this is what you want, talk to AFII.

See above. I repeat, I am as innocent an observer as you can get:
no ties to any bodies except my employer, whom I do not represent
here. All my opinions are completely my own.

Sandeep.

-- 
Sandeep Sibal
Phone: (908) 949-6277
Email: sibal@att.com
WWW: http://weed.arch.com.inter.net/~sibal/


