Error in definition of "compatibility character"?

From: David Hopwood (david.hopwood@zetnet.co.uk)
Date: Fri Oct 26 2001 - 05:12:34 EDT


-----BEGIN PGP SIGNED MESSAGE-----

Clauses D20 and D21 of the Unicode Standard (3.0 or 3.1) read:

# D20 Compatibility decomposition: the decomposition of a character that
# results from recursively applying /both/ the compatibility /and/ the
# canonical mappings found in the names list of /Section 14.1,
# Character Names List/, and those described in /Section 3.11,
# Conjoining Jamo Behavior/, until no characters can be further
# decomposed, and then reordering nonspacing marks according to
# /Section 3.10, Canonical Ordering Behavior/.
#
# - A compatibility decomposition may remove formatting information.
#
# D21 Compatibility character: a character that has a compatibility
# decomposition.
#
# - Compatibility characters are included in the Unicode Standard to
# represent distinctions in other base standards. They support
# transmission and processing of legacy data. Their use is discouraged
# other than for legacy data.
# - Replacing a compatibility character by its decomposition may lose
# round-trip convertibility with a base standard.

By definition D20, if a character has a canonical decomposition, then
it also has a compatibility decomposition. This is correct, because
NFKD includes all the decompositions that NFD does.
The problem is with D21: if all characters that have a canonical
decomposition also have a compatibility decomposition, then all of
these are compatibility characters. Clearly that wasn't what was
intended, and it is inconsistent with the following two bullet points.

I think the correct definition of a compatibility character is a
character with a compatibility decomposition that differs from its
canonical decomposition (i.e. NFKC(c) != NFC(c)). Am I right?
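
As a rough illustration only, the two readings can be compared using
Python's unicodedata module. This is a sketch under assumptions: the
function names are hypothetical, and the module reflects whatever
Unicode version ships with the interpreter, not necessarily 3.0 or 3.1.
It tests NFKD against NFD, which should give the same answer as
comparing NFKC with NFC, since both pairs differ only by the same
canonical composition step.

```python
import unicodedata


def has_decomposition_per_d20(ch: str) -> bool:
    """Literal reading of D21: applying D20's procedure changes the character."""
    return unicodedata.normalize("NFKD", ch) != ch


def is_compat_char_proposed(ch: str) -> bool:
    """Proposed reading: compatibility and canonical decompositions differ."""
    nfkd = unicodedata.normalize("NFKD", ch)
    nfd = unicodedata.normalize("NFD", ch)
    return nfkd != nfd


# U+00E9 has only a canonical decomposition; U+FB01 has a compatibility one;
# "A" has no decomposition at all.
for ch in ("\u00e9", "\ufb01", "A"):
    print(
        f"U+{ord(ch):04X}",
        "literal D21:", has_decomposition_per_d20(ch),
        "proposed:", is_compat_char_proposed(ch),
    )
```

Under the literal reading, U+00E9 (which decomposes canonically to
U+0065 U+0301) is classed as a compatibility character; under the
proposed reading it is not, while U+FB01 is a compatibility character
under both.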

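(Note that it wouldn't be correct to define a compatibility character
simply as a character that has a "<...> ..." entry in the decomposition
field of the UCD; a counterexample is U+03D3, whose entry is the
untagged canonical mapping 03D2 0301, but whose component U+03D2 itself
has a compatibility mapping.)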
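
The U+03D3 case can be checked along the following lines. Again this
is only a sketch, assuming Python's unicodedata module and the Unicode
data bundled with the interpreter; the helper name codepoints is
hypothetical.

```python
import unicodedata


def codepoints(s: str) -> str:
    """Render a string as space-separated hexadecimal code points."""
    return " ".join(f"{ord(c):04X}" for c in s)


ch = "\u03d3"  # GREEK UPSILON WITH ACUTE AND HOOK SYMBOL

# Raw decomposition field from the UCD; a compatibility mapping would
# begin with a "<...>" tag, a canonical one does not.
field = unicodedata.decomposition(ch)
has_tag = field.startswith("<")

nfd = unicodedata.normalize("NFD", ch)
nfkd = unicodedata.normalize("NFKD", ch)

print("decomposition field:", field)
print("has <...> tag:      ", has_tag)
print("NFD: ", codepoints(nfd))
print("NFKD:", codepoints(nfkd))
print("NFKD differs from NFD:", nfkd != nfd)
```

The field carries no tag, yet the recursive compatibility decomposition
goes one step further than the canonical one, because U+03D2 maps to
U+03A5 by a compatibility mapping.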

- --
David Hopwood <david.hopwood@zetnet.co.uk>

Home page & PGP public key: http://www.users.zetnet.co.uk/hopwood/
RSA 2048-bit; fingerprint 71 8E A6 23 0E D3 4C E5 0F 69 8C D4 FA 66 15 01
Nothing in this message is intended to be legally binding. If I revoke a
public key but refuse to specify why, it is because the private key has been
seized under the Regulation of Investigatory Powers Act; see www.fipr.org/rip

-----BEGIN PGP SIGNATURE-----
Version: 2.6.3i
Charset: noconv

iQEVAwUBO9kOszkCAxeYt5gVAQEvfAgAhPW+uauuxRArxCWPJgYBW54AvAdg3yxB
iATHjKED/4s+KkfMGP6kq3RzZpgD21MpeOacIG4+NWkgd8wHMRAvNWc2n+PEU+KJ
A3Ngf/vDV+JZxhDX09s6lSxagfkQDhxB/bzgGMzpyCUdJshgiBsnTd4C8/IXbzgR
KNi9XeZ+jEGYV+24S9stnMClmV/xMI9FR2QV2mA72Li5AgFR/DoRxSaeV4XiMw+3
RTJP5gVSQeUv1TsXD4X8J3z0YzxiFFzwPlIbG3o1BOcwjPrROmV0ULJQM1ufemGi
Q/VJrkvPPyxibcOAk8Vb6LtA+jyyoi9TAod3JcLWDsEiIq1bfbcBKw==
=tQg1
-----END PGP SIGNATURE-----


