Antoine Leca
Date: Mon May 22 2000

Mark Davis wrote:
> There is a new proposed technical report on the Unicode site.
> document:

Interesting stuff. I believe this is good work, but as always there is
certainly room for improvement (I think Unicode is a never-ending work).

As Mark doesn't give any address for discussion, I assume this is
the correct forum. Please tell me if I am wrong.

I am not completely comfortable with the assertion that only "letters",
but OTOH all "letters", have to be classified. Certainly this is an
easy-to-grasp boundary, but there are some borderline cases that look odd
to me:
"script", while U+02C0;MODIFIER LETTER GLOTTAL STOP and U+02C1;

- Indic vowel marks are discarded: certainly, they cannot occur at
the beginning of a piece of text (be it a word or a paragraph); but the
same can be said of some other codes. The first that struck me
are the Thai and Lao vowel marks that are *not* placed at the front of
a syllable (i.e., I mean sara a, sara aa, sara am, etc.)

- Indic (not Arabic-Indic) digits are not included, although they are
used only in the context of the relevant script.

- Devanagari OM is the only coded OM sign, while variations exist
in other scripts (Gurmukhi is clear in this respect). I was assuming
that U+0950 should be used for the latter as well, the distinction being
made by the surrounding information ("higher-level protocol"). It
appears from Mark's tables that I was wrong, because U+0950 seems to
be reserved for the Devanagari script; so I wonder if we do not need
some new characters, such as *U+0A50 as Gurmukhi OM?
(it now strikes me that U+0AD0 Gujarati OM is already included)

- the same problem, although certainly a much more minor one, occurs with
avagraha: it is encoded in the Devanagari block, and variants (which look
a bit different) are encoded in the Gujarati and Oriya blocks. Fine,
but what about when Sanskrit is to be written in Bengali, or in any of
the South Indian scripts? Do we need a bunch of new codes?

- this also reminds me of the status of Tamil aytam U+0B83 "TAMIL SIGN
VISARGA", which is tagged "Mc", while it appears it may be a real letter
instead (but it cannot begin a word)
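
For anyone who wants to check the general categories discussed above,
here is a minimal sketch, assuming Python's standard unicodedata module.
It only reports the name and general category (not the script property,
which that module does not expose), and the results depend on the Unicode
version bundled with the interpreter, so categories such as the "Mc" tag
on U+0B83 may differ from the data under discussion here. The list of
code points and the helper name describe() are my own choices for
illustration; *U+0A50 is only the hypothetical slot suggested above.

```python
import unicodedata

# Code points named in the message above. U+0A50 is the hypothetical
# "Gurmukhi OM" slot, so it may well be unassigned in the database.
CODE_POINTS = [0x02C0, 0x0950, 0x0A50, 0x0AD0, 0x0B83]


def describe(cp):
    """Return (label, general category, name) for one code point."""
    ch = chr(cp)
    # name() raises ValueError for unnamed code points unless a
    # default is supplied, so pass a placeholder string.
    name = unicodedata.name(ch, "<no name in this database>")
    return ("U+%04X" % cp, unicodedata.category(ch), name)


if __name__ == "__main__":
    print("Unicode database version:", unicodedata.unidata_version)
    for cp in CODE_POINTS:
        print("%s  %s  %s" % describe(cp))
```

A category beginning with "L" means the database treats the code point
as a letter, "M" as a mark, and "Cn" as unassigned.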


