Re: casefold o NFC = NFC o casefold?

From: Mark Davis (mark@macchiato.com)
Date: Tue Oct 30 2001 - 00:30:40 EST


Sadly, case mapping does not preserve any of the normalization forms under a
character-by-character transformation. The simplest example is the string
\u1FB2\u0300, which is in NFC. A character-by-character titlecase conversion
produces \u1FBA\u0345\u0300. This, however, is not in canonical order
(U+0345, combining class 240, precedes U+0300, combining class 230), and is
thus not normalized under any of the forms.
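
The example can be checked with a short Python sketch. The one-entry
TITLE_MAP table and the helper names are assumptions made for illustration;
the mapping itself is the one given in this message, and the combining
classes come from Python's unicodedata module.

```python
import unicodedata

# Assumption: a one-entry per-character titlecase table holding only the
# mapping quoted in this message; every other character maps to itself.
TITLE_MAP = {"\u1FB2": "\u1FBA\u0345"}


def titlecase_per_char(s):
    """Map each character on its own, without looking at its neighbours."""
    return "".join(TITLE_MAP.get(ch, ch) for ch in s)


def in_canonical_order(s):
    """True if no nonzero combining class follows a larger one."""
    prev = 0
    for ch in s:
        ccc = unicodedata.combining(ch)
        if ccc != 0 and ccc < prev:
            return False
        prev = ccc
    return True


source = "\u1FB2\u0300"
mapped = titlecase_per_char(source)
classes = [unicodedata.combining(ch) for ch in mapped]

print([f"U+{ord(ch):04X}" for ch in mapped])
print(classes)  # 240 appears before 230
print(in_canonical_order(source), in_canonical_order(mapped))
```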

The problem is that while the result of each character's transformation may
itself be normalized, the transformed string as a whole will at least
require reordering -- and may require more (recomposition) in the case of
NFC or NFKC.
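
One way to repair the result is to normalize again after the case mapping.
The following Python sketch applies each form to the mapped string from the
example above; the variable names are illustrative assumptions, and the
mapped string is hard-coded rather than computed.

```python
import unicodedata

# The titlecased string from the example above, hard-coded for this sketch.
mapped = "\u1FBA\u0345\u0300"

# Re-normalize the mapped string under each form and record the result.
renormalized = {
    form: unicodedata.normalize(form, mapped)
    for form in ("NFD", "NFC", "NFKD", "NFKC")
}

for form, text in renormalized.items():
    code_points = " ".join(f"U+{ord(ch):04X}" for ch in text)
    changed = text != mapped
    print(f"{form}: {code_points} (differs from mapped string: {changed})")
```

In every form the U+0345 should end up after the U+0300, so the mapped
string differs from each of its normalizations.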

Mark

—————

Δός μοι ποῦ στῶ, καὶ κινῶ τὴν γῆν — Ἀρχιμήδης
[http://www.macchiato.com]

----- Original Message -----
From: "David Hopwood" <david.hopwood@zetnet.co.uk>
To: <unicode@unicode.org>
Sent: Sunday, October 28, 2001 18:40
Subject: casefold o NFC = NFC o casefold?

> -----BEGIN PGP SIGNED MESSAGE-----
>
> AFAICS, all of the case folding mappings in Unicode 3.1 map to a
> string that is normalised according to NFC or NFD in the same way
> as the input character, don't map to a different combining class,
> and operate uniformly on canonical equivalents. Also any combining
> characters following a character that is mapped will stay in the
> same order. Therefore, I conjecture that:
>
> casefold(NFC(x)) = NFC(casefold(x))
> casefold(NFD(x)) = NFD(casefold(x))
>
> for all of the case folding types and all strings x. However,
> CaseFolding-4.txt says, "NOTE: case folding does not preserve
> normalization formats!"
>
> It clearly doesn't preserve NFKC or NFKD (U+20A8 RUPEE SIGN demonstrates
> how this can fail), but if it does preserve NFC and NFD, shouldn't that
> be clarified? Or have I missed something, and there is some reason why
> it doesn't preserve NFC/NFD?
>
> - --
> David Hopwood <david.hopwood@zetnet.co.uk>
>
> Home page & PGP public key: http://www.users.zetnet.co.uk/hopwood/
> RSA 2048-bit; fingerprint 71 8E A6 23 0E D3 4C E5 0F 69 8C D4 FA 66 15 01
> Nothing in this message is intended to be legally binding. If I revoke a
> public key but refuse to specify why, it is because the private key has been
> seized under the Regulation of Investigatory Powers Act; see www.fipr.org/rip
>
>
> -----BEGIN PGP SIGNATURE-----
> Version: 2.6.3i
> Charset: noconv
>
> iQEVAwUBO9zBHDkCAxeYt5gVAQHbPQgApl2UZe0JlQHYzJB5C8V6RK2UvF3LXIHx
> /MAxM6KjmkZozWrb9nkMA0NxT3BpehPMrJOUM49xRoLvwcFzLK5vwOwVyOVpZZXw
> rqa2CmymDAJfvCoCEH4nOGXqLwezZdVFQuHgJsxAaUTIQWLd5bcwB8dCB5VEd4Yz
> J8Qrx53q3p5+TO+jWk2dgUeOhr3G5W2Kf6zka8NeFGlL5HhTpwT2agQM5vii14/s
> JnC8Ka+HNbkDdPLNio6iOR44h46MLrr8jyO4oGWk/SX16KT5ATUxdNNjCD7jjrPF
> mGhIQywClkIjgtQy4JYWhyxrQknvMih2f/ausbRn4ldf9FBtk33DNQ==
> =6tSm
> -----END PGP SIGNATURE-----
>
>


