Re: casefold o NFC = NFC o casefold?

From: David Hopwood (david.hopwood@zetnet.co.uk)
Date: Tue Oct 30 2001 - 03:28:26 EST


-----BEGIN PGP SIGNED MESSAGE-----

Mark Davis wrote:
> Sadly, case mapping does not preserve any normalization formats under a
> character-by-character transformation. The simplest example is the string
> \u1FB2\u0300. A character-by-character titlecase conversion produces:
> \u1FBA\u0345\u0300.
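
Mark's per-character titlecase example can be reproduced with a short sketch. It assumes Python 3's `str.title()`, which applies the full titlecase mappings from SpecialCasing.txt (for U+1FB2 that mapping is U+1FBA U+0345). `is_nfc` is a hypothetical helper name. Results depend on the Unicode data bundled with the interpreter.

```python
import unicodedata

def is_nfc(s: str) -> bool:
    # A string is in NFC iff NFC normalisation leaves it unchanged.
    return unicodedata.normalize("NFC", s) == s

s = "\u1FB2\u0300"
t = s.title()  # full titlecase mapping of U+1FB2 is U+1FBA U+0345
print([f"U+{ord(c):04X}" for c in t])  # ['U+1FBA', 'U+0345', 'U+0300']
print(is_nfc(t))  # False: NFC would reorder U+0300 ahead of U+0345
```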

\u1FB2\u0300 isn't NFC-normalised; its NFC-normalised form is \u1F70\u0345
(alpha-varia-ypogegrammeni). Also, note that I said case-folding, not
mapping to uppercase, lowercase or titlecase. However, I see your point -
this example does demonstrate that my conjecture is false:

  casefold(NFC("\u1FB2\u0300")) = casefold("\u1F70\u0345")
                                = "\u1F70\u03B9" (alpha-varia, iota)

  NFC(casefold("\u1FB2\u0300")) = NFC("\u1F70\u03B9\u0300")
                                = "\u1F70\u1F76" (alpha-varia, iota-varia)
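
The two orders can be compared mechanically. The following is a minimal sketch, assuming Python 3's `str.casefold()` (full case folding per CaseFolding.txt) as the casefold function and `unicodedata.normalize("NFC", ...)` as NFC. `nfc` and `commutes` are hypothetical helper names.

```python
import unicodedata

def nfc(s: str) -> str:
    # Canonical decomposition followed by canonical composition.
    return unicodedata.normalize("NFC", s)

def commutes(s: str) -> bool:
    # True if casefold(NFC(s)) == NFC(casefold(s)) for this particular input.
    return nfc(s).casefold() == nfc(s.casefold())

s = "\u1FB2\u0300"
print(nfc(s.casefold()) == "\u1F70\u1F76")  # True: iota and varia recompose to U+1F76
print(commutes(s))                           # False: the two orders disagree
```

A single `False` is enough to refute the conjecture. `commutes` cannot prove it true, since it only tests one string at a time.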

The intuitively correct result is the first one; the varia should
definitely be over the alpha, not the iota. I assume that's partly why
U+0345 (ypogegrammeni) has the highest combining class number. Doesn't
this imply that there should be a note in UTR #21 saying that NFC or
NFD normalisation should be done before case-folding?
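
The combining-class argument can be illustrated directly. U+0300 has class 230 and U+0345 has class 240, so decomposing first puts every varia in front of the ypogegrammeni. When U+0345 then folds to a base iota, the varias stay with the alpha. This is a minimal sketch using Python's `unicodedata` and `str.casefold()`. `fold_after_nfd` and `codepoints` are hypothetical names, not anything defined in UTR #21.

```python
import unicodedata

def fold_after_nfd(s: str) -> str:
    # Decompose first, so canonical reordering moves every other mark
    # in front of U+0345 (ccc 240); then apply full case folding.
    return unicodedata.normalize("NFD", s).casefold()

def codepoints(s: str) -> list:
    # Render a string as a list of U+XXXX labels for inspection.
    return [f"U+{ord(c):04X}" for c in s]

s = "\u1FB2\u0300"
print(codepoints(s.casefold()))       # ['U+1F70', 'U+03B9', 'U+0300']: varia follows the iota
print(codepoints(fold_after_nfd(s)))  # ['U+03B1', 'U+0300', 'U+0300', 'U+03B9']: both varias precede it
```

Later editions of the Unicode Standard define canonical caseless matching along these lines, as NFD(toCasefold(NFD(X))).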

"\u1F70\u03B9" is NFC-normalised, though, so that is not a contradiction
to case-folding preserving normalisation. OTOH, this is:

       casefold(NFC("\u00DF\u0301")) = casefold("\u00DF\u0301")
                                     = "ss\u0301" (s, s, combining acute)

  NFC(casefold(NFC("\u00DF\u0301"))) = NFC("ss\u0301")
                                     = "s\u015B" (s, s-acute)

(not that "\u00DF\u0301" (eszett-acute) will occur in practice).
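
The eszett case can be checked the same way. The input is already NFC, but its case folding is not. This is a minimal sketch, again assuming Python's `unicodedata.normalize()` and `str.casefold()`. `is_nfc` is a hypothetical helper.

```python
import unicodedata

def is_nfc(s: str) -> bool:
    # A string is in NFC iff NFC normalisation leaves it unchanged.
    return unicodedata.normalize("NFC", s) == s

s = "\u00DF\u0301"      # eszett, combining acute
folded = s.casefold()    # eszett folds to "ss", giving "ss\u0301"
print(is_nfc(s))         # True
print(is_nfc(folded))    # False: the second s composes with the acute
print(unicodedata.normalize("NFC", folded) == "s\u015B")  # True
```

On Python 3.8 and later, `unicodedata.is_normalized("NFC", s)` performs the same test without building the normalised copy.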

- --
David Hopwood <david.hopwood@zetnet.co.uk>

Home page & PGP public key: http://www.users.zetnet.co.uk/hopwood/
RSA 2048-bit; fingerprint 71 8E A6 23 0E D3 4C E5 0F 69 8C D4 FA 66 15 01
Nothing in this message is intended to be legally binding. If I revoke a
public key but refuse to specify why, it is because the private key has been
seized under the Regulation of Investigatory Powers Act; see www.fipr.org/rip

-----BEGIN PGP SIGNATURE-----
Version: 2.6.3i
Charset: noconv

iQEVAwUBO95kcTkCAxeYt5gVAQFDqAgAmy0lVM25r5RRcYxnBB22ySiNKCnzlGld
qwauowY0L3D3j7daEuGdJ+tnqDzTPKEQtWoUN9cdzcdOjen6bwaQc2/jIty2U2g0
oQByPUIFW+1oGzDbLhvphTAiXTnqrCuNV1TbjuV9FWNERMHxIfkU2D5QXMVHyhv5
3mcDpXcYD2FR0VJni7/M/Uc7sMkIttAqxH8htrF3SugW5qPoAmKyTtOqGBBBM7ZG
AsMQ6jKRzO+9GILLlao1p1/YwO2NpSrPfIaBB3wkxDavVOCIJmpDSvNxJaf4fvrw
N8ms2nAmpAmJSdm2GerUr0B75xnkFEiY4J/j3TEiOSgBgcmjz9k6Sw==
=xf62
-----END PGP SIGNATURE-----



This archive was generated by hypermail 2.1.2 : Tue Oct 30 2001 - 04:35:11 EST