L2/04-060

From: Gautam Sengupta
Date: 2004-02-01 10:48:22 -0800
Subject: Encoding Bangla Khanda-Ta With Ta+Virama

I would like to submit the following proposal for consideration by the UTC. Would any of you know how to send it across? I would also welcome comments from you.

Thanks for your help.

-Gautam

============================

PROPOSAL FOR ENCODING THE BANGLA KHANDA-TA WITH TA+VIRAMA

GRAPHOLOGICAL BACKGROUND

In Bangla (=Bengali) a dead consonant TA shows up as KHANDA-TA in all contexts except where it is immediately followed by one of the following consonants: TA, THA, NA, BA, MA, YA, RA. It is thus NOT a distinct abstract character, but a completely predictable display variant of the dead consonant TA.

KHANDA-TA cannot bear a vowel matra or combine with a following consonant to form a conjunct aksara. It can form a conjunct aksara only with a preceding dead consonant RA, with the latter showing up as a REPH placed on the KHANDA-TA.

RENDERING

(1) A TA+VIRAMA sequence should always be displayed as a KHANDA-TA except when it is immediately followed by a ZWNJ or by one of the consonants listed above.

(2) Any rendering engine should, by default, introduce an orthographic syllable break after a TA+VIRAMA sequence, except when the pair is immediately followed by one of the consonants listed above, or when the engine is explicitly forbidden to do so by an immediately following ZWJ.

This would take care of all normal, orthographically attested occurrences of KHANDA-TA.

ORTHOGRAPHICALLY UNATTESTED SEQUENCES

To encode a KHANDA-TA+C conjunct or a KHANDA-TA bearing a vowel matra (both unattested in normal orthography), a ZWJ should be inserted after the TA+VIRAMA sequence, e.g. TA+VIRAMA+ZWJ+RA to encode a KHANDA-TA with a subjoined RA (RA-PHOLA), and TA+VIRAMA+ZWJ+MATRA to encode a KHANDA-TA bearing a vowel matra.

To force a KHANDA-TA to show up in a context where a TA+VIRAMA would normally ligate with a following consonant, a ZWJ followed by a ZWNJ should be inserted after the TA+VIRAMA, e.g. TA+VIRAMA+ZWJ+ZWNJ+RA for a KHANDA-TA followed by the base form of RA.

NOTE THAT SUCH COMPLICATED SEQUENCES ARE NEEDED FOR ENCODING UNATTESTED FORMS ONLY.

MERITS OF THE PROPOSAL

(a) Consistent with the avowed Unicode policy of encoding only abstract characters and not display variants.
(b) Only TA+VIRAMA to be used, consistently, to encode all orthographically attested occurrences of KHANDA-TA.
(c) ZWJ and ZWNJ required only to encode orthographically unattested occurrences of "junk sequences".
(d) ZWJ and ZWNJ used in a manner consistent with their conventional semantics.
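As a rough illustration, the two RENDERING rules in this proposal reduce to a check on the character that immediately follows each TA+VIRAMA pair. The Python sketch below models only the rules as stated here, not the behaviour of any real shaping engine. The function name `classify_dead_ta` is hypothetical. The code points are the standard Unicode assignments for BENGALI LETTER TA (U+09A4), BENGALI SIGN VIRAMA (U+09CD), ZWNJ (U+200C), ZWJ (U+200D) and the seven listed consonants.

```python
# Hypothetical model of RENDERING rules (1) and (2); not a real shaping engine.
TA = "\u09A4"      # BENGALI LETTER TA
VIRAMA = "\u09CD"  # BENGALI SIGN VIRAMA
ZWNJ = "\u200C"    # ZERO WIDTH NON-JOINER
ZWJ = "\u200D"     # ZERO WIDTH JOINER

# Consonants before which a dead TA does not show up as KHANDA-TA:
# TA, THA, NA, BA, MA, YA, RA.
LIGATING = {"\u09A4", "\u09A5", "\u09A8", "\u09AC", "\u09AE", "\u09AF", "\u09B0"}


def classify_dead_ta(text: str) -> list[tuple[int, bool, bool]]:
    """Return (index, is_khanda_ta, breaks_after) for each TA+VIRAMA in text."""
    results = []
    for i in range(len(text) - 1):
        if text[i] != TA or text[i + 1] != VIRAMA:
            continue
        # Character right after the pair, or None at the end of the text.
        nxt = text[i + 2] if i + 2 < len(text) else None
        # Rule (1): KHANDA-TA unless followed by ZWNJ or a listed consonant.
        is_khanda_ta = not (nxt == ZWNJ or nxt in LIGATING)
        # Rule (2): syllable break unless followed by a listed consonant or ZWJ.
        breaks_after = not (nxt in LIGATING or nxt == ZWJ)
        results.append((i, is_khanda_ta, breaks_after))
    return results
```

For example, `classify_dead_ta("\u09A4\u09CD\u09A8")` (TA+VIRAMA+NA) returns `[(0, False, False)]`: no KHANDA-TA and no syllable break. A text-final TA+VIRAMA, `classify_dead_ta("\u09A4\u09CD")`, returns `[(0, True, True)]`. The two flags are kept separate because the rules differ on ZWNJ and ZWJ. ZWNJ suppresses the KHANDA-TA but keeps the break, while ZWJ keeps the KHANDA-TA but suppresses the break.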
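The encodings described under ORTHOGRAPHICALLY UNATTESTED SEQUENCES are plain concatenations, so they can be sketched as small builder functions. The helper names below are hypothetical, and BENGALI VOWEL SIGN I (U+09BF) is only a stand-in for an arbitrary matra. The sketch assembles the code point sequences the proposal describes. It does not check how any font or engine actually renders them.

```python
# Hypothetical helpers assembling the sequences proposed for KHANDA-TA.
TA = "\u09A4"      # BENGALI LETTER TA
VIRAMA = "\u09CD"  # BENGALI SIGN VIRAMA
ZWNJ = "\u200C"    # ZERO WIDTH NON-JOINER
ZWJ = "\u200D"     # ZERO WIDTH JOINER
RA = "\u09B0"      # BENGALI LETTER RA
SIGN_I = "\u09BF"  # BENGALI VOWEL SIGN I (sample matra)


def khanda_ta() -> str:
    """Attested KHANDA-TA: plain TA+VIRAMA (not before TA, THA, NA, BA, MA, YA, RA)."""
    return TA + VIRAMA


def khanda_ta_conjunct(consonant: str) -> str:
    """Unattested KHANDA-TA+C conjunct: TA+VIRAMA+ZWJ+C."""
    return TA + VIRAMA + ZWJ + consonant


def khanda_ta_with_matra(matra: str) -> str:
    """Unattested KHANDA-TA bearing a vowel matra: TA+VIRAMA+ZWJ+MATRA."""
    return TA + VIRAMA + ZWJ + matra


def khanda_ta_then_base(consonant: str) -> str:
    """KHANDA-TA forced before the base form of C: TA+VIRAMA+ZWJ+ZWNJ+C."""
    return TA + VIRAMA + ZWJ + ZWNJ + consonant


def codepoints(s: str) -> str:
    """Format a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in s)
```

With these, `codepoints(khanda_ta_conjunct(RA))` gives `U+09A4 U+09CD U+200D U+09B0` (KHANDA-TA with RA-PHOLA). `codepoints(khanda_ta_then_base(RA))` gives `U+09A4 U+09CD U+200D U+200C U+09B0` (KHANDA-TA followed by the base form of RA). Only `khanda_ta()` is needed for attested text, which mirrors merit (b).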