Re: ISO 10646, Unicode & The FAQ (Bengali Khanda Ta)

From: Kenneth Whistler (kenw@sybase.com)
Date: Thu Nov 21 2002 - 22:56:37 EST

    Rick investigated, and came up with:

    > In a specific case, Andy asked about Khanda Ta, and pointed to a WG2
    > resolution that contradicts the Unicode FAQ on the same topic. I looked up
    > a paper listing an action item as follows, taken from document
    > http://anubis.dkuug.dk/JTC1/SC2/WG2/docs/M40ActionItems.pdf which are the
    > action items from meeting #40 of WG2; the decision was from meeting #39 in
    > October 2000:
    >
    > Resolution M39.11 (Request from Bangladesh): In response to the
    > request from Bangladesh Standards and Testing Institution in
    > document N2261 for adding KHANDATA character to 10646, WG2 instructs
    > its convener to communicate to the BSTI: a. that the requested
    > character can be encoded in 10646 using the following combining
    > sequence: Bengali TA (U+09A4 ) + Bengali Virama (U+09CD) + ZWNJ
    > (U+200C) + Following Character(s), to be able to separate the
    > KHANDATA from forming a conjunct with the Following Character(s).
    > Therefore, their proposal is not accepted. b. our understanding
    > that BDS 1520: 2000 completely replaces the BDS 1520: 1997.
    >
    > That does indeed give a different answer than the Unicode FAQ.
    >
    > I wonder if anyone else knows whether the text of 10646 contains any
    > mention of Khanda Ta, and if so, what it says.

    It does not mention Khanda Ta.

    And I guess it's time to open that old CBS (character BS) mailbag
    to track this sucker down.

    Resolution M39.11 dates from the WG2 discussion of September 20, 2000
    (at the WG2 meeting in Vouliagmeni, Greece). It was agenda item 7.12
    at that meeting, "Proposal to synchronize Bengali standard with 10646",
    during which the question came up of what this "KHANDATA" thing in
    the Bengali standard BDS 1520:2000 is anyway, and whether it should
    be encoded as a separate character, as it was (at code point 0xBA)
    in BDS 1520:2000.
    For details of the discussion, see the WG2 meeting minutes, online
    in WG2 N2253.

    The upshot of the initial discussion was that Michael Everson was
    tasked with an action item, to wit:

    "Michael Everson to contact BSTI (email id, name etc. are in the cover
    letter) - a query was sent out to Unicode expert's list also."

    The response received to the query to the Unicode list on September 20
    from a Mr. Abdul Malik seemed to answer the question of what the
    KHANDATA was. Anyone who wants to can dig it out of the Unicode email
    archives: X-UML-Sequence: 16066 (2000-09-20 16:22:21 GMT). But the
    relevant portions of the email were:

    <quote>

    ----- Original Message -----
    From: "Michael Everson" <everson@egt.ie>
    To: "Unicode List" <unicode@unicode.org>
    Sent: Wednesday, September 20, 2000 10:30 AM
    Subject: Request about Bengali/Bangla

    > BDS 1520:2000 contains a BANGLA LETTER KHANDATA and it has been proposed
    > for addition to the UCS. I am at the WG2 meetings in Athens where the
    > character is being discussed, but we don't know how to evaluate it.

    A representative of the Bangladesh Standards and Testing Institution (the
    instigator of the proposal) should be better placed to answer these
    questions than me, anyway...

    > What is this character and how is it used?

    KhandoTa is a form of the letter Ta. It is the form Ta takes when it has no
    inherent vowel. It occurs when final and medial, but never the initial
    letter of a word. It is equivalent to Ta virama. Ta with a visible virama is
    only needed for illustrative purposes, KhandaTa being used in its place in
    all Bengali words, except when it forms a conjunct form.

    For example in a standard without KhandaTa, there are two different forms
    the sequence Ta Virama Ma need to take i.e. khandoTa_Ma or the
    Ta/Ma_conjunct_form. As BDS 1520:2000 does not include any ligation control
    characters other than Virama, it is necessary to include KhandaTa as a
    separate letter to make the two previously mentioned forms.

    > Another question: does BDS 1520:2000 completely replace BDS 1520:1997,
    > or is the old standard still valid (and being implemented)?

    BDS 1520:1997 is based on a font encoding. It is the standard currently used
    in the products of Proshika Computer Systems and AdarshaBangla Technologies
    Inc. It is also the encoding used in many web sites.
    BDS 1520:2000 is a complete replacement, being based on the ISO/IEC10646
    character encoding model. AFAIK it is yet to receive a real world
    implementation.
    BDS 1520:2000 seems immature as it does not include any encoding principles
    or rendering rules, for example, how is Bengali zophola to be formed? Is it
    formed from Ya or YYa?

    > What are the implications for interoperability between this standard and
    > ISCII standards?

    As BDS 1520 does not currently have an encoding model to refer to, one can
    not say. e.g. to form Ka_halant Ka:
    in Unicode :- Ka virama ZWNJ Ka
    In ISCII :- KA Virama Virama Ka
    In BDS :- ??

    Regards

    Abdul

    </quote>

    It was on the basis of *this* feedback from a Bengali expert on
    the Unicode list, reported back by Michael Everson to the WG2 meeting,
    that WG2 drafted a resolution responding to the request by BSTI
    expressed in WG2 N2261.

    The intent of resolution M39.11 is expressed in the last sentence
    of part a: "Therefore, their proposal is not accepted." In other words,
    WG2 went on record as claiming there is already a way to represent
    Khanda Ta unambiguously using the current characters, and that hence
    there was no reason to encode a separate character.

    Abdul's discussion above explains the reason why BDS 1520:2000 felt
    it necessary to have a separate character for Khanda Ta, since it
    contains no ZWNJ or rendering rules which could explain how it would
    otherwise be represented using that standard.

    What WG2 resolution M39.11 can *not* be interpreted as, however, is
    a definitive ISO statement about Bengali rendering rules in 10646. No
    such language was, in fact, added to ISO/IEC 10646, and in general
    such material is not a part of that standard. Rendering rules for
    Indic scripts are the kind of add-on one finds in the Unicode
    Standard, instead. The language in M39.11 was quickly drafted to
    sketch out the reason why encoding of Khanda Ta was not needed,
    but cannot be understood as establishing an ISO standard in the
    matter of rendering of Bengali ta's.

    Now the analysis of Khanda Ta presented in the Unicode FAQ resulted
    from further discussion of the issue which took place on the
    Unicode email list after the Greece WG2 meeting. I can't recall
    all the details of that right now (although I'm sure people could
    dig it out of the archives), but my reading of the FAQ suggests
    that the proposal that Abdul Malik had suggested for how to
    represent Khanda Ta was subjected to more analysis in the context
    of similar rendering processes for other Indic scripts.

    In particular, since the sequence C - virama - ZWNJ - C is
    generally used to display the *explicit* virama (blocking a
    conjunct), and since such forms with explicit virama also
    occur in Bengali, it seemed better to keep that sequence for
    explicit viramas in Bengali as well. The other sequence,
    C - virama - ZWJ - C, is used (in Devanagari, at least) for
    representing half-consonant forms. Now while the Bengali
    Khanda Ta is not actually a "half-consonant", but a full
    letter form, it still contrasts with TA in conjuncts and
    TA with explicit virama (halant). So the moral equivalent
    sequence for representing the Khanda Ta would then be:
    TA - virama - ZWJ - C.
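
    To make the three contrasting sequences concrete, here is a minimal
    Python sketch that spells them out as code points. The function and
    dictionary key names are purely illustrative, and MA is chosen as the
    following consonant only because it is the example used in Abdul's
    message. Nothing here says how a given font or rendering engine will
    actually display the result; it only shows the encoded sequences
    under discussion.

```python
import unicodedata

TA = "\u09A4"      # BENGALI LETTER TA
MA = "\u09AE"      # BENGALI LETTER MA (illustrative following consonant)
VIRAMA = "\u09CD"  # BENGALI SIGN VIRAMA
ZWNJ = "\u200C"    # ZERO WIDTH NON-JOINER
ZWJ = "\u200D"     # ZERO WIDTH JOINER


def ta_sequences(following=MA):
    """Return the three TA + virama sequences contrasted in this thread.

    The key names are hypothetical labels, not terms from any standard.
    """
    return {
        # Plain sequence: a conjunct may be formed with the following letter.
        "conjunct": TA + VIRAMA + following,
        # ZWNJ blocks the conjunct and asks for a visible (explicit) virama.
        # This is also the sequence named in WG2 resolution M39.11.
        "explicit_virama": TA + VIRAMA + ZWNJ + following,
        # ZWJ sequence suggested in the FAQ analysis for Khanda Ta.
        "khanda_ta_faq": TA + VIRAMA + ZWJ + following,
    }


def describe(text):
    """List each code point in text with its Unicode character name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


if __name__ == "__main__":
    for label, seq in ta_sequences().items():
        print(label)
        for line in describe(seq):
            print("   ", line)
```

    The only difference between the WG2 and FAQ suggestions is the third
    code point (U+200C versus U+200D), which is why the question matters
    for interchange: the two sequences are distinct in stored text even
    where a renderer happens to show them identically.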

    I have not digested all the argumentation in the last month about
    this topic, so cannot say what I feel the *right* answer, finally,
    is for this. But now, please, stop speculating about how things
    got to be the way they are, stop arguing about whose specification
    trumps whose (a statement in a WG2 resolution which is not reflected
    in the ISO 10646 standard or a statement in a Unicode website
    FAQ which is not reflected in the Unicode Standard), and focus
    on what is the technically best advice to give people about
    representing the Bengali Khanda Ta, given the context explained
    in the Unicode FAQ.

    --Ken


