Re: Digraphs as Distinct Logical Units

From: Kenneth Whistler (kenw@sybase.com)
Date: Thu Aug 08 2002 - 14:38:56 EDT


Roozbeh asked:

> On Thu, 8 Aug 2002, Michael Everson wrote:
>
> > [...] and it seemed prudent to WG2 and to the UTC to encode it.
>
> But has UTC decided on this? I guess you mean UTC people present in
> Dublin, and not UTC itself.

That is to be decided at the UTC meeting later this month in Seattle.
(See http://www.unicode.org/unicode/alloc/Pipeline.html)
It will be taken up along with a number of other characters
marked "N/A" in the Unicode pipeline document, as part of the
regular consideration by UTC for synchronizing character acceptances
with activity by WG2.

>
> I also can't understand why there won't be a compatibility decomposition
> for this. In soul and essence, it is just like U+FDFA ARABIC LIGATURE
> SALLALLAHOU ALAYHE WASALLAM and U+FDFB ARABIC LIGATURE JALLAJALALOUHOU.

That is true, but for those ligatures as well, the compatibility
decomposition is not actually useful in implementation. No one
expects people to actually type out the decomposition in order to
get the symbol as a "character". And as Michael Everson pointed out,
the expected calligraphic form of the complete symbol is not likely
to be supported by a standard Arabic font -- it is expected to have
a certain defined shape, rather than simply being formed from the
pieces of the Arabic font one happens to type it in.
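
For anyone who wants to see what those two existing decompositions
look like, here is a minimal sketch using Python's standard
unicodedata module. The helper name describe() is made up for
illustration, and the exact sequences returned depend on the version
of the Unicode Character Database shipped with the interpreter.

```python
import unicodedata

def describe(cp):
    """Return (name, decomposition tag, NFKD expansion) for a code point.

    Hypothetical helper for illustration; not part of any library.
    """
    ch = chr(cp)
    raw = unicodedata.decomposition(ch)  # e.g. "<isolated> 0635 0644 ..."
    # A compatibility mapping starts with a "<tag>"; a canonical one does not.
    tag = raw.split(" ", 1)[0] if raw.startswith("<") else ""
    expanded = unicodedata.normalize("NFKD", ch)
    return unicodedata.name(ch), tag, expanded

if __name__ == "__main__":
    for cp in (0xFDFA, 0xFDFB):
        name, tag, expanded = describe(cp)
        print(f"U+{cp:04X} {name}")
        print(f"  tag: {tag or '(none)'}, expands to {len(expanded)} code points")
        print("  " + " ".join(f"{ord(c):04X}" for c in expanded))
```

The output shows each single code point expanding to a long run of
ordinary Arabic letters and spaces, which is the sequence no one is
expected to type in order to get the symbol.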

Doug asked:

> I'm curious about this too. Is it for transcoding with a character set
> that was in existence in the early '90s, but not considered until now?
> I thought that was the criterion.

Yes, it is for transcoding -- with a Pakistani standard for Urdu.

In principle, the cutoff date for compatibility encoding was
the early 1990s, but in practice that ideal is not actually
maintainable. The Unicode Standard has encoded many characters
for compatibility with other important national standards -- the
mere existence of the Unicode Standard as an alternative doesn't
stop national standards bodies from continuing to standardize
on their own. Important examples: GBK (later GB18030) in China,
and JIS X 0213 in Japan. You'll find many characters in Unicode
that got there for compatibility with those two (recent) standards.

--Ken


