Re: Serious problems with Arabic

From: Kenneth Whistler (kenw@sybase.com)
Date: Fri Jan 19 2001 - 14:06:13 EST


Roozbeh asked:

> Dear Kenneth,
>
> Due to some problems with Unicode Arabic behaviour, which I posted on the
> mailing list in November, and using your guidance, I'm preparing a
> suggestion for UTC.
>
> I think I know what I should suggest for the shaping issues, but not for
> the following problem, which I am attaching below as a reminder.
>
> Do you think a proposal for changing the decomposition for U+06C0 to my
> suggestion but without the ZWNJ may have a chance?

Frankly, no. Even without the suggestion of the ZWNJ, changing decompositions
already established in Unicode 3.0 impacts too many things, including
normalization.
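
To illustrate the normalization impact, here is a minimal Python sketch.
It assumes a Python 3 interpreter whose unicodedata module carries the
published decomposition for U+06C0. The PROPOSED table and the
nfd_with_override helper are hypothetical, introduced only to mimic what
a changed mapping (without the ZWNJ) would do; they are not part of any
real library.

```python
import unicodedata

HEH_YEH_ABOVE = "\u06C0"  # ARABIC LETTER HEH WITH YEH ABOVE

# Data already normalized (NFD) under the published decomposition.
stored = unicodedata.normalize("NFD", HEH_YEH_ABOVE)

# Hypothetical changed mapping: U+0647 as the base letter, no ZWNJ.
PROPOSED = {"\u06C0": "\u0647\u0654"}

def nfd_with_override(text, overrides):
    """Apply the hypothetical mapping first, then ordinary NFD."""
    replaced = "".join(overrides.get(ch, ch) for ch in text)
    return unicodedata.normalize("NFD", replaced)

# Under the published data, new and stored text still match.
same_before = unicodedata.normalize("NFD", HEH_YEH_ABOVE) == stored

# Under the hypothetical mapping, the same character no longer
# normalizes to the sequence that is already stored.
same_after = nfd_with_override(HEH_YEH_ABOVE, PROPOSED) == stored

print(same_before, same_after)
```

Text normalized before such a change would stop comparing equal to text
normalized after it, which is the kind of breakage meant here.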

> The current
> decomposition is really a bug, even in semantics. The semantics is really
> a Heh plus a Hamza Above.

For better or for worse, the Arabic encoding has multiple Heh's in it,
differentiated more by their shaping behavior than by their semantics
per se. This looks to me like just another case of shaping issues
creeping in and making it difficult to determine just what the base
character should be.

> The current decomposition has possibly been done
> only because of the glyph shape in the charts...

It was based on input from the Arabic experts on the UTC's Arabic
ad hoc committee. I would have to defer to them for their reasoning
behind deciding on U+06D5 as the base letter for U+06C0, rather
than U+0647, but I suspect it was consideration of shaping.

Note that prior to Unicode 3.0, U+06C0 had no decomposition mapping,
but the introduction of U+0654 ARABIC HAMZA ABOVE in Unicode 3.0 made
it necessary to consider provision of decompositions for all of the
letters that showed a hamza above another base letter.
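
For reference, the decomposition mappings in question can be inspected
with Python's standard unicodedata module. This is only a sketch: the
list of code points is my own selection of letters with a hamza above,
and the output depends on the Unicode version bundled with the
interpreter.

```python
import unicodedata

# A hand-picked selection of letters shown with a hamza above.
LETTERS = ["\u0623", "\u0624", "\u0626", "\u06C0", "\u06C2", "\u06D3"]

def describe(ch):
    """Return (code point, name, decomposition field) for one character."""
    return (
        "U+%04X" % ord(ch),
        unicodedata.name(ch),
        unicodedata.decomposition(ch),
    )

table = [describe(ch) for ch in LETTERS]
for code, name, mapping in table:
    print(code, name, "->", mapping)

# The base letter chosen for U+06C0, as named in the character database.
nfd = unicodedata.normalize("NFD", "\u06C0")
base_name = unicodedata.name(nfd[0])
mark_name = unicodedata.name(nfd[1])
print(base_name, "+", mark_name)
```

For U+06C0 the mapping field reads 06D5 0654, that is, U+06D5 as the
base letter followed by U+0654 ARABIC HAMZA ABOVE.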

--Ken

>
> --roozbeh
>
> On Tue, 21 Nov 2000, Kenneth Whistler wrote:
>
> > > My suggestion would be decomposing U+06C0 to
> > >
> > > U+0647 U+0654 U+200C
> > > <ARABIC LETTER HEH> <ARABIC HAMZA ABOVE> <ZERO WIDTH NON-JOINER>
> > >
> > > which seems to be the only solution for this. I again insist that this
> > > case appears really frequently in Persian, where HEH WITH YEH ABOVE is
> > > very common.
> >
> > Changing decompositions like this -- particularly to include a ZWNJ --
> > is not going to be possible, because of the implications for
> > normalization.
> >
> > Instead, the feasible way forward here is to write explicit exceptions
> > for Arabic shaping rules, to account for instances such as this one.
> > The shaping rules, unlike the decompositions, are not bound by
> > ironclad guarantees of no further changes.
>
>


