Re: ISO 10646, Unicode & The FAQ (Bengali Khanda Ta)

From: Kenneth Whistler (kenw@sybase.com)
Date: Thu Nov 21 2002 - 22:56:37 EST

Next message: rajesh@inflibnet.ac.in: "Re: Anyone who can write Hindi on the Unicode List?"

Previous message: Rick McGowan: "Re: ISO 10646, Unicode & The FAQ"

Rick investigated, and came up with:

> In a specific case, Andy asked about Khanda Ta, and pointed to a WG2
> resolution that contradicts the Unicode FAQ on the same topic. I looked up
> a paper listing an action item as follows, taken from document
> http://anubis.dkuug.dk/JTC1/SC2/WG2/docs/M40ActionItems.pdf which are the
> action items from meeting #40 of WG2; the decision was from meeting #39 in
> October 2000:
>
> Resolution M39.11 (Request from Bangladesh): In response to the
> request from Bangladesh Standards and Testing Institution in
> document N2261 for adding KHANDATA character to 10646, WG2 instructs
> its convener to communicate to the BSTI: a. that the requested
> character can be encoded in 10646 using the following combining
> sequence: Bengali TA (U+09A4 ) + Bengali Virama (U+09CD) + ZWNJ
> (U+200C) + Following Character(s), to be able to separate the
> KHANDATA from forming a conjunct with the Following Character(s).
> Therefore, their proposal is not accepted. b. our understanding
> that BDS 1520: 2000 completely replaces the BDS 1520: 1997.
>
> That does indeed give a different answer than the Unicode FAQ.
>
> I wonder if anyone else knows whether the text of 10646 contains any
> mention of Khanda Ta, and if so, what it says.

It does not mention Khanda Ta.

And I guess it's time to open that old CBS (character BS) mailbag
to track this sucker down.

Resolution M39.11 dates from the WG2 discussion of September 20, 2000
(at the WG2 meeting in Vouliagmeni, Greece). It was agenda item 7.12
at that meeting, "Proposal to synchronize Bengali standard with 10646",
during which the question came up of what this "KHANDATA" thing in
the Bengali standard BDS 1520:2000 is anyway, and whether it should
be encoded as a separate character, as it was (at code point 0xBA)
in BDS 1520:2000. For details of the discussion, see the WG2 meeting
minutes, online in WG2 N2253.

The upshot of the initial discussion was that Michael Everson was
tasked with an action item, to wit:

"Michael Everson to contact BSTI (email id, name etc. are in the cover
letter) - a query was sent out to Unicode expert's list also."

The response received to the query to the Unicode list on September 20
from a Mr. Abdul Malik seemed to answer the question of what the
KHANDATA was. Anyone who wants to can dig it out of the Unicode email
archives: X-UML-Sequence: 16066 (2000-09-20 16:22:21 GMT). But the
relevant portions of the email were:

<quote>

----- Original Message -----
From: "Michael Everson" <everson@egt.ie>
To: "Unicode List" <unicode@unicode.org>
Sent: Wednesday, September 20, 2000 10:30 AM
Subject: Request about Bengali/Bangla

> BDS 1520:2000 contains a BANGLA LETTER KHANDATA and it has been proposed
> for addition to the UCS. I am at the WG2 meetings in Athens where the
> character is being discussed, but we don't know how to evaluate it.

A representative of the Bangladesh Standards and Testing Institution (the
instigator of the proposal) should be better placed to answering these
questions than me, anyway...

> What is this character and how is it used?

KhandoTa is a form of the letter Ta. It is the form Ta takes when it has no
inherent vowel. It occurs when final and medial, but never the initial
letter of a word. It is equivalent to Ta virama. Ta with a visible virama is
only needed for illustrative purposes, kandaTa being used in its place in
all Bengali words, except when it forms a conjunct form.

For example in a standard without KhandaTa, there are two different forms
the sequence Ta Virama Ma need to take i.e. khandoTa_Ma or the
Ta/Ma_conjunct_form. As BSD1520:2000 does not include any ligation control
characters other than Virama, it is necessary to include KhandaTa as a
separate letter to make the two previously mentioned forms.

> Another question, is does BDS 1520:2000 completely replace BDS 1520:1997,
> or is the old standard still valid (and being implemented)?

BDS 1520:1997 is based on a font encoding. It is the standard currently used
in the products of Proshika Computer Systems and AdarshaBangla Technologies
Inc. It is also the encoding used in many web sites.
BDS 1520:2000 is a complete replacement, being based on the ISO/IEC10646
character encoding model. AFAIK it is yet to receive a real world
implementation.
BDS 1520:2000 seems immature as it does not include any encoding principles
or rendering rules, for example, how is Bengali zophola to be formed? Is it
formed from Ya or YYa?

> What are the implications for interoperability between this standard and
> ISCII standards?

As BDS 1520 does not currently have an encoding model to refer to, one can
not say. e.g. to form Ka_halant Ka:
in Unicode :- Ka virama ZWNJ Ka
In ISCII :- KA Virama Virama Ka
In BDS :- ??

Regards

Abdul

</quote>

It was on the basis of *this* feedback from a Bengali expert on
the Unicode list, reported back by Michael Everson to the WG2 meeting,
that WG2 drafted a resolution responding to the request by BSTI
expressed in WG2 N2261.

The intent of resolution M39.11 is expressed in the last sentence
of part a: "Therefore, their proposal is not accepted." In other words,
WG2 went on record as claiming there is already a way to represent
Khanda Ta unambiguously using the current characters, and that hence
there was no reason to encode a separate character.
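
For concreteness, the sequence named in M39.11 can be spelled out
code point by code point. What follows is only a minimal Python
sketch: the function name and the choice of MA as the following
character are illustrative assumptions, not part of the resolution.

```python
import unicodedata

BENGALI_TA = "\u09A4"      # BENGALI LETTER TA
BENGALI_VIRAMA = "\u09CD"  # BENGALI SIGN VIRAMA
ZWNJ = "\u200C"            # ZERO WIDTH NON-JOINER


def m39_11_khanda_ta(following: str) -> str:
    """Build TA + virama + ZWNJ + following character(s), as in M39.11.

    The helper name is hypothetical; only the sequence comes from the
    resolution text.
    """
    return BENGALI_TA + BENGALI_VIRAMA + ZWNJ + following


# Assumption for illustration: the following character is MA (U+09AE).
seq = m39_11_khanda_ta("\u09AE")
code_points = [f"U+{ord(ch):04X}" for ch in seq]
names = [unicodedata.name(ch) for ch in seq]

for cp, name in zip(code_points, names):
    print(cp, name)
```

This shows only how the sequence is stored; how a given font or
rendering engine displays it is a separate matter.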

Abdul's discussion above explains the reason why BDS 1520:2000 felt
it necessary to have a separate character for Khanda Ta, since it
contains no ZWNJ or rendering rules which could explain how it would
otherwise be represented using that standard.

What WG2 resolution M39.11 can *not* be interpreted as, however, is
a definitive ISO statement about Bengali rendering rules in 10646. No
such language was, in fact, added to ISO/IEC 10646, and in general
such material is not a part of that standard. Rendering rules for
Indic scripts are the kind of add-on one finds in the Unicode
Standard, instead. The language in M39.11 was quickly drafted to
sketch out the reason why encoding of Khanda Ta was not needed,
but cannot be understood as establishing an ISO standard in the
matter of rendering of Bengali ta's.

Now the analysis of Khanda Ta presented in the Unicode FAQ resulted
from further discussion of the issue which took place on the
Unicode email list after the Greece WG2 meeting. I can't recall
all the details of that right now -- although I'm sure people could
dig it out of the archives -- but my reading of the FAQ suggests
that the proposal that Abdul Malik had suggested for how to
represent Khanda Ta was subjected to more analysis in the context
of similar rendering processes for other Indic scripts.

In particular, since the sequence C - virama - ZWNJ - C is
generally used to display the *explicit* virama (blocking a
conjunct), and since such forms with explicit virama also
occur in Bengali, it seemed better to keep that sequence for
explicit viramas in Bengali as well. The other sequence,
C - virama - ZWJ - C, is used (in Devanagari, at least) for
representing half-consonant forms. Now while the Bengali
Khanda Ta is not actually a "half-consonant", but a full
letter form, it still contrasts with TA in conjuncts and
TA with explicit virama (halant). So the moral equivalent
sequence for representing the Khanda Ta would then be:
TA - virama - ZWJ - C.
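
The three-way contrast described here can be sketched in Python.
This is an illustration of the FAQ's reading only, not a statement
of what either standard requires; the function name and the labels
it returns are hypothetical, and MA is an arbitrary choice for the
following consonant.

```python
TA = "\u09A4"      # BENGALI LETTER TA
VIRAMA = "\u09CD"  # BENGALI SIGN VIRAMA
MA = "\u09AE"      # BENGALI LETTER MA
ZWNJ = "\u200C"    # ZERO WIDTH NON-JOINER
ZWJ = "\u200D"     # ZERO WIDTH JOINER


def intended_form(text: str) -> str:
    """Label what a TA + virama sequence requests under the FAQ's analysis."""
    if text.startswith(TA + VIRAMA + ZWNJ):
        return "explicit virama"      # C - virama - ZWNJ - C
    if text.startswith(TA + VIRAMA + ZWJ):
        return "khanda ta"            # TA - virama - ZWJ - C
    if text.startswith(TA + VIRAMA):
        return "conjunct candidate"   # TA - virama - C, no joiner
    return "other"


forms = {
    "TA virama MA": intended_form(TA + VIRAMA + MA),
    "TA virama ZWNJ MA": intended_form(TA + VIRAMA + ZWNJ + MA),
    "TA virama ZWJ MA": intended_form(TA + VIRAMA + ZWJ + MA),
}

for label, form in forms.items():
    print(f"{label}: {form}")
```

Under the M39.11 wording, by contrast, the ZWNJ sequence would be
the one labeled as Khanda Ta, which is exactly the disagreement
under discussion.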

I have not digested all the argumentation in the last month about
this topic, so cannot say what I feel the *right* answer, finally,
is for this. But now, please, stop speculating about how things
got to be the way they are, stop arguing about whose specification
trumps whose (a statement in a WG2 resolution which is not reflected
in the ISO 10646 standard or a statement in a Unicode website
FAQ which is not reflected in the Unicode Standard), and focus
on what is the technically best advice to give people about
representing the Bengali Khanda Ta, given the context explained
in the Unicode FAQ.

--Ken

This archive was generated by hypermail 2.1.5 : Thu Nov 21 2002 - 23:45:06 EST