From: Doug Ewell (email@example.com)
Date: Tue Oct 29 2002 - 12:04:26 EST
William Overington <WOverington at ngo dot globalnet dot co dot uk> wrote:
> I do note however that review 3 refers to a document which is only
> available to Unicode Consortium members, which seems a strange thing
> if views of interested individuals are being sought.
> Also, it is a pity that this new era of Unicode glasnost (displayed
> with a ligature? :-) ) comes so shortly after the last Unicode
> Technical Committee meeting the minutes of which state the consensus
> about no more ligatures being added to the U+FBxx block. Surely the
> matter of ligatures would be a good topic upon which to conduct such
> a public review.
No, it wouldn't, and here's why:
There is a concept in Unicode called "normalization" in which certain
characters or sequences are considered to be equal to other characters
or sequences for comparison purposes. Using this concept, a capital A
plus a combining acute accent (U+0041, U+0301) can be considered
equivalent to a precomposed A-with-acute (U+00C1). See Unicode Standard
Annex #15 for more information.
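To make the equivalence concrete, here is a minimal sketch using Python's standard unicodedata module. The variable names are only illustrative, and the choice of NFC and NFD as the forms to demonstrate is mine, not something the Annex singles out.

```python
import unicodedata

decomposed = "\u0041\u0301"   # capital A + combining acute accent
precomposed = "\u00C1"        # precomposed A-with-acute

# Compared code point by code point, the two spellings differ.
raw_equal = decomposed == precomposed

# After normalizing to a common form, they compare equal.
nfc_equal = unicodedata.normalize("NFC", decomposed) == precomposed
nfd_equal = unicodedata.normalize("NFD", precomposed) == decomposed

print(raw_equal, nfc_equal, nfd_equal)  # False True True
```

Either form works for comparison, as long as both sides are normalized to the same one.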
It's important to realize that the *whole reason* this mechanism exists
is because of the precomposed ligatures and letters-with-diacritic and
compatibility characters in Unicode. If there were only one way to
express the concept of "A with acute" in Unicode, there would be no need
for normalization.
Industry standards, such as the forthcoming Internationalized Domain
Name Architecture, depend on normalization to ensure that users don't
get unexpected mismatches between "A plus combining acute" and
"precomposed A-with-acute." And because these standards and their
implementations are built to specific versions of the Unicode Standard,
they require stability in the normalization process.
If a new precomposed ligature "character" were added to Unicode, there
would now be two ways of "spelling" a sequence that supposedly only had
one spelling. Let's suppose, JUST FOR ILLUSTRATION, that Unicode added
a "ct" ligature at U+FB07. Now there would be two ways of writing the
sequence "ct": with the regular Latin letters (U+0063, U+0074) or with
the ligature (U+FB07). But none of the existing normalization tables
would equate these two, because the ct ligature did not exist in the
(previous) version of Unicode that was used to create the normalization
table. Thus normalization would work in some cases but not others,
which would make the whole concept unstable and unpredictable.
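The mismatch can be sketched with Python's unicodedata module, whose tables are built from one specific Unicode version. This is only an illustration under stated assumptions: the helper name matches() is hypothetical, U+FB07 is an unassigned code point standing in for the imaginary "ct" ligature, and NFKC is used because ligatures carry compatibility (not canonical) decompositions.

```python
import unicodedata

FI_LIGATURE = "\uFB01"       # existing LATIN SMALL LIGATURE FI
HYPOTHETICAL_CT = "\uFB07"   # unassigned; stand-in for the imaginary "ct" ligature

def matches(candidate, plain):
    """Return True if candidate equals plain after NFKC normalization."""
    return unicodedata.normalize("NFKC", candidate) == plain

# The table knows the "fi" ligature, so it folds to the plain letters.
fi_matches = matches(FI_LIGATURE, "fi")

# The table has no mapping for U+FB07, so it passes through unchanged
# and never matches the plain letters "ct".
ct_matches = matches(HYPOTHETICAL_CT, "ct")

print(unicodedata.unidata_version, fi_matches, ct_matches)
```

An implementation frozen at an older table would treat a newly added ligature exactly the way this one treats U+FB07: as something it cannot equate with its plain spelling.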
That is why Unicode and WG2 have a policy against adding new
precomposed ligatures and letters-with-diacritic, to the U+FBxx block or
anywhere else. They would break the stability of normalization, a
concept whose entire value lies in its stability. That is why the "ct"
ligature will not be added at U+FB07, and that is also why the National
Taitung Teachers College will not see their 42 precomposed Latin letters
added to Unicode. It is a good, sensible, well-thought-out policy that
will not benefit from public review.
Now, the Plane 14 language tag characters are a different matter
entirely. There the UTC proposes not to add something in violation of
its existing policy, but to formally discourage something that was
added only a couple of years ago. I am actually arguing for *greater*
stability in the Unicode Standard, by arguing against the process of
adding and then immediately deprecating features like language tags.
(That is not my only argument for Plane 14, but it is one.)