This is the home page for the Indic working group.
The main goal of the current activity is to make sure that Unicode 5.0 is as complete as possible for the major Indic scripts and languages.
Unicode 4.1 is our earliest opportunity to bridge the gap until the 5.0 release. Text added in 4.1 should be limited to critical points. If we decide that new characters are needed, 4.1 should document temporary solutions and point to the future ones.
| P1: Script-specific danda and double danda | ||
| AI 1.1 | everyone | Make sure that your point of view is reflected in the analysis |
| AI 1.2 | ? | Provide a rebuttal for argument F1 |
| AI 1.3 | ? | Provide a rebuttal for argument A1 |
| AI 1.4 | ? | Provide examples of the use of danda and double danda in the various languages in actual documents; these are needed to pursue separate encoding, and they will shed light on the variation across languages |
| P2: Script-specific Udatta and Anudatta | ||
| AI 2.1 | ? | Determine whether this is part of problem P30 (Vedic) |
| P3: Grave and acute | ||
| P4: Invisible letter | ||
| P5: Devanagari conjuncts | ||
| AI 5.1 | ? | Make sure that the sequences for Devanagari conjuncts in NamedCompositeEntities.txt are accurate and complete. |
| P6: Devanagari SHA and LA for Marathi | ||
| P7: Sindhi implosives | ||
| A7.1 | ? | Collect evidence of the Sindhi implosives in actual documents. |
| A7.2 | ? | Search for actual documents that contain both Sindhi and Sanskrit text |
| P8: Marathi eyelash RA | ||
| P9: Devanagari currency sign | ||
| P10: Devanagari signs for Sanskrit | ||
| P11: Assamese letters sort order | ||
| A11.1 | TDIL | Confirm that the motivation for the reencoding request is collation |
| P13: Bengali Ya-Phallaa | ||
| P14: Bengali KHYA | ||
| P15: Gurmukhi post-base/subjoined forms | ||
| A15.1 | TDIL | Clarify the problem |
| P16: Move of U+0A71 GURMUKHI ADDAK | ||
| A16.1 | TDIL | Clarify the problem |
| P17: Move of GURMUKHI EK ONKAR and ADI SHAKTI | ||
| A17.1 | TDIL | Clarify the problem |
| P18: Encoding of Gurmukhi nasalized vowels | ||
| A18.1 | TDIL | Clarify the problem |
| P19: Gujarati abbreviation sign | ||
| P20: Gujarati fractions | ||
| P21: Oriya vocalic RR | ||
| P22: Telugu nukta | ||
| P23: Telugu Avagraha | ||
| P24: Kannada vowel sign a | ||
| P25: Kannada reph | ||
| P26: Kannada Deergha Swaritha | ||
| P27: Support for Tulu and Kodava | ||
| P28: Malayalam Chillus | ||
| P29: Malayalam DIGIT ZERO and fractions | ||
| A29.1 | ? | Research the digits and fractions in Malayalam, gathering evidence |
| P30: Vedic | ||
| P31: Musical notations | ||
| P12: Bengali Khanda Ta | ||
| Accepted for Unicode 4.1 |
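The action item under P5 asks for the Devanagari conjunct sequences in NamedCompositeEntities.txt to be checked. As a minimal sketch, assuming the simplest case of a conjunct spelled as consonant + VIRAMA + consonant (optionally repeated), the standard Python `unicodedata` module can spell out each sequence by character name and flag entries that break that pattern. The `SAMPLE_CONJUNCTS` table and both helper functions are hypothetical illustrations; they are not read from NamedCompositeEntities.txt, whose file format is not described here.

```python
import unicodedata

VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA

# Hypothetical sample entries (label -> code point sequence), for illustration
# only; they are NOT copied from NamedCompositeEntities.txt.
SAMPLE_CONJUNCTS = {
    "KSSA": "\u0915\u094D\u0937",  # KA + VIRAMA + SSA
    "JNYA": "\u091C\u094D\u091E",  # JA + VIRAMA + NYA
    "BROKEN": "\u0915\u0937",      # KA + SSA, virama missing
}


def describe(seq: str) -> str:
    """Spell out a sequence as 'U+XXXX NAME' items joined by ' + '."""
    return " + ".join(f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in seq)


def is_consonant(ch: str) -> bool:
    """True for the Devanagari consonants KA..HA (U+0915..U+0939)."""
    return "\u0915" <= ch <= "\u0939"


def is_simple_conjunct(seq: str) -> bool:
    """True if seq has the shape C (VIRAMA C)+ using Devanagari consonants.

    Assumption: only this plain pattern is accepted, so nukta forms,
    ZWJ/ZWNJ sequences and other legitimate variants are reported as well.
    """
    if len(seq) < 3 or len(seq) % 2 == 0:
        return False
    for i, ch in enumerate(seq):
        if i % 2 == 1:
            if ch != VIRAMA:
                return False
        elif not is_consonant(ch):
            return False
    return True


if __name__ == "__main__":
    for label, seq in SAMPLE_CONJUNCTS.items():
        status = "ok" if is_simple_conjunct(seq) else "CHECK"
        print(f"{label:8} {status:6} {describe(seq)}")
```

Entries reported as `CHECK` are not necessarily wrong; they simply need a human look, since real conjunct data may legitimately include forms this simple pattern does not cover.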
Chapter 9: South Asian Scripts
Code charts: Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam, Sinhala.
Common Locale Data Repository project
The TDIL documents which describe the Indic scripts, including the proposed additions, can be found at http://tdil.mit.gov.in/news.htm. The January 2002 issue covers Devanagari and the following issues cover the other scripts.
The page http://tdil.mit.gov.in/pchangeuni.htm points to modified code charts as well.
The site of the Kerala IT Mission contains some documents related to character encoding at http://www.keralaitmission.org/malayalam/malayalam_keybo.htm.
We have a (very partial) bibliography of books and articles. Since many of these books are very hard to find, a source is identified where possible; we also have scans of the relevant pages.
The mailing list indic at unicode.org is used by this group to discuss the problems.
To subscribe, send a message to ecartis at unicode.org with “subscribe indic” as the subject. Be sure to send messages from the address you subscribed with.
The archive of the list is available as a raw mbox file at http://www.unicode.org/~ecartis/indic/. To access it, use the user ID “unicode-ml” and the password “unicode”.
Over the years, the UTC and WG2 have refined their method of work. At this point, proposals for new characters are rather sophisticated documents. Here is a representative example:
L2/04-025: Proposal to encode 5 new Arabic script characters, by Jonathan Kew [3.5 Mbytes]
(Thanks to Jonathan for providing this document.)
Note that the proposed characters are shown in “real life” examples. This gives UTC and WG2 members a chance to validate the arguments in favor of encoding, and sometimes leads to observations that escaped the proposer.
We have a number of tools available to resolve a problem:
a proposal to encode a character, with the proper evidence
a proposal to annotate an existing character in the names list
a proposal to add to or modify the text of the standard (presumably in Chapter 9)
a proposal to add sequences to the Unicode named sequences (or, while that list is still a draft, to remove sequences from it)
a proposal to modify or add to the Indic FAQ
the creation of one or more technical notes, to capture cases that are too detailed for the scope of the standard
a proposal to change the default ordering of the UCA, if the problem is one of sorting
the creation of UCA tailoring tables for specific languages
the inclusion of localization data in the CLDR
We have to take into account the timing of the Unicode versions, and the various meetings leading to them.
The next UTC meetings are:
The Editorial Committee, whose function is to implement the decisions of the UTC in the form of text for the standard and the actual UCD content, meets about once a month.
The next release, 4.1, is currently planned for the first half of 2005. Its character repertoire is now essentially frozen because of the synchronization with ISO 10646:2003 Amendment 1. The November meeting is the preferred target for technical proposals for 4.1 (other than new characters), as well as for draft text to be incorporated in the standard.
The release after that, 5.0, is planned for 6 to 12 months after 4.1 and will be synchronized with ISO 10646:2003 Amendment 2. New characters can be added to 5.0; in practice, it would be much simpler if solid character proposals were submitted by the November UTC meeting. Text for that version should be submitted to the Editorial Committee in the first half of 2005.
In both cases, text submitted to the Editorial Committee is likely to go through a number of revisions before becoming the published text, and this group will of course be involved in those revisions. The dates above are for solid first drafts.
| Revision | Date | Comments |
| September 11, 2004 | Added link to Kerala IT Mission |