Re: casefold o NFC = NFC o casefold?

From: Mark Davis (mark@macchiato.com)
Date: Tue Oct 30 2001 - 09:58:43 EST


I did choose a bad example, but as you say, normalization is not preserved
in the way you wanted.

Yes, the reason the iota subscript has a special value is to put it at
the end.

As to whether text should be normalized before or after casefolding (or
other case transformations) or both: I'd have to look at it in more detail.
It was not intended to be an invariant that case operations preserve NF*,
nor that case operations and NF* be commutative, although we may work
towards that end.
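
For anyone who wants to check the non-commutativity, here is a minimal
Python sketch. It assumes that Python's built-in str.casefold() (full case
folding) and unicodedata.normalize() are acceptable stand-ins for the
casefold and NFC operations discussed in this thread. The helper names
nfc, is_nfc, compare_orders and codepoints are made up for illustration,
and the results depend on the Unicode data bundled with the interpreter.

```python
import unicodedata


def nfc(s: str) -> str:
    """Normalize s to Unicode Normalization Form C."""
    return unicodedata.normalize("NFC", s)


def is_nfc(s: str) -> bool:
    """Return True if s is unchanged by NFC normalization."""
    return nfc(s) == s


def compare_orders(s: str) -> tuple:
    """Return (casefold(NFC(s)), NFC(casefold(s))) so the two can be compared."""
    return nfc(s).casefold(), nfc(s.casefold())


def codepoints(s: str) -> str:
    """Render a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in s)


# Sharp s followed by a combining acute accent (hypothetical test input
# taken from the quoted message; it is already in NFC).
eszett_acute = "\u00DF\u0301"
fold_after_nfc, nfc_after_fold = compare_orders(eszett_acute)

print("casefold(NFC(s)):", codepoints(fold_after_nfc))
print("NFC(casefold(s)):", codepoints(nfc_after_fold))
print("orders agree:    ", fold_after_nfc == nfc_after_fold)
print("folded is NFC:   ", is_nfc(fold_after_nfc))

# The Greek string from the thread can be inspected the same way.
greek_fold_after_nfc, greek_nfc_after_fold = compare_orders("\u1FB2\u0300")
print("Greek casefold(NFC(s)):", codepoints(greek_fold_after_nfc))
print("Greek NFC(casefold(s)):", codepoints(greek_nfc_after_fold))
```

With the eszett string, folding the NFC form gives s, s, combining acute,
which is not in NFC, while normalizing after folding gives s followed by
U+015B, so the two orders of composition disagree.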

Mark
—————

Δός μοι ποῦ στῶ, καὶ κινῶ τὴν γῆν — Ἀρχιμήδης
[http://www.macchiato.com]

----- Original Message -----
From: "David Hopwood" <david.hopwood@zetnet.co.uk>
To: "Mark Davis" <mark@macchiato.com>; <unicode@unicode.org>
Sent: Tuesday, October 30, 2001 00:28
Subject: Re: casefold o NFC = NFC o casefold?

> -----BEGIN PGP SIGNED MESSAGE-----
>
> Mark Davis wrote:
> > Sadly, case mapping does not preserve any normalization formats under a
> > character-by-character transformation. The simplest example is the string
> > \u1FB2\u0300. A character-by-character titlecase conversion produces:
> > \u1FBA\u0345\u0300.
>
> \u1FB2\u0300 isn't NFC-normalised; its NFC-normalised form is \u1F70\u0345
> (alpha-varia-ypogegrammeni). Also, note that I said case-folding, not
> mapping to uppercase, lowercase or titlecase. However, I see your point -
> this example does demonstrate that my conjecture is false:
>
> casefold(NFC("\u1FB2\u0300")) = casefold("\u1F70\u0345")
> = "\u1F70\u03B9" (alpha-varia, iota)
>
> NFC(casefold("\u1FB2\u0300")) = NFC("\u1F70\u03B9\u0300")
> = "\u1F70\u1F76" (alpha, iota-varia)
>
> The intuitively correct result is the first one; the varia should
> definitely be over the alpha, not the iota. I assume that's partly why
> U+0345 (ypogegrammeni) has the highest combining class number. Doesn't
> this imply that there should be a note in UTR #21 saying that NFC or
> NFD normalisation should be done before case-folding?
>
> "\u1F70\u03B9" is NFC-normalised, though, so that is not a contradiction
> to case-folding preserving normalisation. OTOH, this is:
>
> casefold(NFC("\u00DF\u0301")) = casefold("\u00DF\u0301")
> = "ss\u0301" (s, s, combining acute)
>
> NFC(casefold(NFC("\u00DF\u0301"))) = NFC("ss\u0301")
> = "s\u015B" (s, s-acute)
>
> (not that "\u00DF\u0301" (eszett-acute) will occur in practice).
>
> - --
> David Hopwood <david.hopwood@zetnet.co.uk>
>
> Home page & PGP public key: http://www.users.zetnet.co.uk/hopwood/
> RSA 2048-bit; fingerprint 71 8E A6 23 0E D3 4C E5 0F 69 8C D4 FA 66 15 01
> Nothing in this message is intended to be legally binding. If I revoke a
> public key but refuse to specify why, it is because the private key has been
> seized under the Regulation of Investigatory Powers Act; see www.fipr.org/rip
>
>
> -----BEGIN PGP SIGNATURE-----
> Version: 2.6.3i
> Charset: noconv
>
> iQEVAwUBO95kcTkCAxeYt5gVAQFDqAgAmy0lVM25r5RRcYxnBB22ySiNKCnzlGld
> qwauowY0L3D3j7daEuGdJ+tnqDzTPKEQtWoUN9cdzcdOjen6bwaQc2/jIty2U2g0
> oQByPUIFW+1oGzDbLhvphTAiXTnqrCuNV1TbjuV9FWNERMHxIfkU2D5QXMVHyhv5
> 3mcDpXcYD2FR0VJni7/M/Uc7sMkIttAqxH8htrF3SugW5qPoAmKyTtOqGBBBM7ZG
> AsMQ6jKRzO+9GILLlao1p1/YwO2NpSrPfIaBB3wkxDavVOCIJmpDSvNxJaf4fvrw
> N8ms2nAmpAmJSdm2GerUr0B75xnkFEiY4J/j3TEiOSgBgcmjz9k6Sw==
> =xf62
> -----END PGP SIGNATURE-----
>
>



This archive was generated by hypermail 2.1.2 : Tue Oct 30 2001 - 10:41:28 EST