Re: Error in definition of "compatibility character"?

From: Mark Davis (mark@macchiato.com)
Date: Fri Oct 26 2001 - 11:40:55 EDT


Yes, that is correct. A compatibility character (in the sense of D21) is a
character that has a different compatibility decomposition (different than
the character itself). We should fix the definition.

Note: there are two different senses of "compatibility character": a
"decomposable compatibility character" (D21) and a "legacy compatibility
character" (one that would not have been added to Unicode but for
compatibility with a pre-existing standard). These two sets of characters
are not the same.
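
To make the test proposed in the quoted message concrete, here is a minimal
sketch in Python using the standard unicodedata module. It compares the full
compatibility decomposition (NFKD) with the full canonical decomposition (NFD),
and separately checks whether the character's own decomposition field carries
a <tag>. The function names are illustrative only and do not come from the
standard.

```python
import unicodedata


def is_decomposable_compat_char(ch: str) -> bool:
    """True if the full compatibility decomposition (NFKD) of ch
    differs from its full canonical decomposition (NFD)."""
    return unicodedata.normalize("NFKD", ch) != unicodedata.normalize("NFD", ch)


def has_tagged_decomposition(ch: str) -> bool:
    """True if the character's own decomposition field starts with a
    <tag>, e.g. "<compat> 0066 0069". Only this one field is inspected;
    the mapping is not applied recursively."""
    return unicodedata.decomposition(ch).startswith("<")


if __name__ == "__main__":
    # U+00E9: canonical decomposition only.
    # U+FB01: tagged compatibility decomposition.
    # U+03D3: the counterexample cited in the quoted message.
    for ch in ("\u00e9", "\ufb01", "\u03d3"):
        print(
            f"U+{ord(ch):04X}",
            unicodedata.name(ch),
            is_decomposable_compat_char(ch),
            has_tagged_decomposition(ch),
        )
```

With the character data in current Python releases, U+03D3 should pass the
first check and fail the second: its own mapping is canonical (to U+03D2
U+0301), and the compatibility mapping only appears one level down, on
U+03D2. That is why looking at the tag alone is not sufficient.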

Mark
—————

Δός μοι ποῦ στῶ, καὶ κινῶ τὴν γῆν — Ἀρχιμήδης
[http://www.macchiato.com]

----- Original Message -----
From: "David Hopwood" <david.hopwood@zetnet.co.uk>
To: <unicode@unicode.org>
Sent: Friday, October 26, 2001 02:12
Subject: Error in definition of "compatibility character"?

> -----BEGIN PGP SIGNED MESSAGE-----
>
> Clauses D20 and D21 of the Unicode Standard (3.0 or 3.1) read:
>
> # D20 Compatibility decomposition: the decomposition of a character that
> # results from recursively applying /both/ the compatibility /and/ the
> # canonical mappings found in the names list of /Section 14.1,
> # Character Names List/, and those described in /Section 3.11,
> # Conjoining Jamo Behavior/, until no characters can be further
> # decomposed, and then reordering nonspacing marks according to
> # /Section 3.10, Canonical Ordering Behavior/.
> #
> # - A compatibility decomposition may remove formatting information.
> #
> # D21 Compatibility character: a character that has a compatibility
> # decomposition.
> #
> # - Compatibility characters are included in the Unicode Standard to
> # represent distinctions in other base standards. They support
> # transmission and processing of legacy data. Their use is discouraged
> # other than for legacy data.
> # - Replacing a compatibility character by its decomposition may lose
> # round-trip convertibility with a base standard.
>
> By definition D20, if a character has a canonical decomposition, then
> it also has a compatibility decomposition. This is correct, because
> NFKD includes all the decompositions that NFD does.
> The problem is with D21: if all characters that have a canonical
> decomposition also have a compatibility decomposition, then all of
> these are compatibility characters. Clearly that wasn't what was
> intended, and it is inconsistent with the following two bullet points.
>
> I think the correct definition of a compatibility character is a
> character with a compatibility decomposition that differs from its
> canonical decomposition (i.e. NFKC(c) != NFC(c)). Am I right?
>
>
> (Note that it wouldn't be correct to define a compatibility character
> simply as a character that has a "<...> ..." entry in the decomposition
> field of the UCD; a counterexample is U+03D3.)
>
> - --
> David Hopwood <david.hopwood@zetnet.co.uk>
>
> Home page & PGP public key: http://www.users.zetnet.co.uk/hopwood/
> RSA 2048-bit; fingerprint 71 8E A6 23 0E D3 4C E5 0F 69 8C D4 FA 66 15 01
> Nothing in this message is intended to be legally binding. If I revoke a
> public key but refuse to specify why, it is because the private key has been
> seized under the Regulation of Investigatory Powers Act; see www.fipr.org/rip
>
>
> -----BEGIN PGP SIGNATURE-----
> Version: 2.6.3i
> Charset: noconv
>
> iQEVAwUBO9kOszkCAxeYt5gVAQEvfAgAhPW+uauuxRArxCWPJgYBW54AvAdg3yxB
> iATHjKED/4s+KkfMGP6kq3RzZpgD21MpeOacIG4+NWkgd8wHMRAvNWc2n+PEU+KJ
> A3Ngf/vDV+JZxhDX09s6lSxagfkQDhxB/bzgGMzpyCUdJshgiBsnTd4C8/IXbzgR
> KNi9XeZ+jEGYV+24S9stnMClmV/xMI9FR2QV2mA72Li5AgFR/DoRxSaeV4XiMw+3
> RTJP5gVSQeUv1TsXD4X8J3z0YzxiFFzwPlIbG3o1BOcwjPrROmV0ULJQM1ufemGi
> Q/VJrkvPPyxibcOAk8Vb6LtA+jyyoi9TAod3JcLWDsEiIq1bfbcBKw==
> =tQg1
> -----END PGP SIGNATURE-----
>
>


