From: Dheeraj Kumar ([email protected])
Date: Fri Nov 03 2006 - 02:10:54 CST
Thanks William,
I think a few Kashmiri characters are missing, and a few Hazaragi ones too. Given the diversity of languages and scripts in India, I humbly request that the Unicode Consortium move quickly to incorporate other minor Indian languages into the standard, lest non-standard approaches become predominant. Manipuri is another language that may have characters requiring standardization. Khasi and Garo also come to mind as I write this.
Best regards,
Dheeraj
=========================
----- Original Message ----
From: William J Poser <[email protected]>
To: [email protected]
Sent: Thursday, November 2, 2006 7:54:10 AM
Subject: Kashmiri in the Perso-Arabic script
Assuming that you know Kashmiri, get a copy of the Unicode 5.0 standard
or go to the web site and obtain the code charts for Arabic:
http://www.unicode.org/charts/PDF/U0600.pdf Arabic
http://www.unicode.org/charts/PDF/U0750.pdf Arabic Supplement
http://www.unicode.org/charts/PDF/UFB50.pdf Arabic Presentation Forms-A
http://www.unicode.org/charts/PDF/UFE70.pdf Arabic Presentation Forms-B
To the extent that the Unicode names are sufficient for
identifying characters, which they very likely won't be,
you can also work from the NamesList:
http://www.unicode.org/Public/UNIDATA/NamesList.txt
Then go through and see if you can find all of the characters needed
for Kashmiri. The existing standard does include several characters
exclusively for Kashmiri, e.g. U+06C4 ARABIC LETTER WAW WITH RING,
or for Kashmiri and one or two other languages, e.g.
U+0673 ARABIC LETTER ALEF WITH WAVY HAMZA BELOW,
listed as for Kashmiri and Baluchi, but it is possible that there are
still omissions. My somewhat superficial comparison of the Indian PASCII
standard with Unicode left me with the impression that Kashmiri
and/or Sindhi may require additions.
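
A minimal sketch of that check, assuming Python and its standard
unicodedata module. The helper name describe and the candidate list
are hypothetical, and the result depends on the Unicode version that
ships with your Python build, so a missing name here is only a hint
to go and look at the code charts:

```python
import unicodedata

def describe(chars):
    """Map each character to its Unicode name, or None if the
    local Unicode database has no name for that code point."""
    return {f"U+{ord(c):04X}": unicodedata.name(c, None) for c in chars}

# Hypothetical candidate list: replace with the letters Kashmiri needs.
candidates = ["\u06C4", "\u0673"]

for code, name in describe(candidates).items():
    print(code, name or "<no name in this Unicode database>")
```

A name only tells you that a code point exists, not that it is the
letter you need, so the glyphs in the charts still have to be compared
by eye.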
You also need to have some understanding of what
counts as a character. If, for example, a certain
ligature does not appear in Unicode, it may be that Unicode
considers it to be a rendering variant of a sequence of
two characters and has decided not to encode it separately.
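
One way to see this idea at work, again assuming Python's unicodedata
module: the lam-alef ligature does have a code point, but only in the
compatibility block Arabic Presentation Forms-B, and compatibility
normalization (NFKD) maps it back to the two ordinary letters. The
function name underlying_sequence is hypothetical. Most ligatures have
no code point at all and are produced by the font from the plain
letter sequence.

```python
import unicodedata

def underlying_sequence(ch):
    """Return the names of the characters that a presentation form
    maps to under compatibility decomposition (NFKD)."""
    decomposed = unicodedata.normalize("NFKD", ch)
    return [unicodedata.name(c) for c in decomposed]

# U+FEFB ARABIC LIGATURE LAM WITH ALEF ISOLATED FORM
print(underlying_sequence("\uFEFB"))
# prints ['ARABIC LETTER LAM', 'ARABIC LETTER ALEF']
```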
Bill