Re: casefold o NFC = NFC o casefold?

From: David Hopwood (david.hopwood@zetnet.co.uk)
Date: Wed Oct 31 2001 - 03:09:08 EST


-----BEGIN PGP SIGNED MESSAGE-----

Mark Davis wrote:
> I did choose a bad example, but as you say, normalization is not preserved
> in the way you wanted.
>
> Yes, the reason the iota subscript has a special value is to put it at
> the end.
>
> As to whether text should be normalized before or after casefolding (or
> other case transformations) or both: I'd have to look at it in more detail.
> It was not intended to be an invariant that case operations preserve NF*,
> nor that case operations and NF* be commutative, although we may work
> towards that end.

The issue is that two canonically equivalent inputs to case folding do
not always produce two canonically equivalent outputs. (I.e. not just that
the output isn't necessarily normalised, which in itself wouldn't be much
of a problem.)

This seems to be contrary to the intention of conformance clause C9
("... Ideally, an implementation would always interpret two canonical-
equivalent character sequences identically. ...").

I think this can only happen if the input contains characters with
ypogegrammeni in their decomposition, because it is the fact that a
combining character maps to a non-combining character that causes the
difficulty - the combining marks after the ypogegrammeni effectively
get split off into the next character.
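
To make the splitting effect concrete, here is a minimal sketch. It assumes
Python's built-in str.casefold() (full case folding) and
unicodedata.normalize(); the input string is an illustration chosen for this
sketch, not one taken from the thread, and the helper names are hypothetical.

```python
import unicodedata


def nfd(s: str) -> str:
    """Canonical decomposition, used here to compare strings canonically."""
    return unicodedata.normalize("NFD", s)


def canonically_equivalent(a: str, b: str) -> bool:
    """Two strings are canonically equivalent iff their NFD forms match."""
    return nfd(a) == nfd(b)


# Illustrative input (an assumption, not from the thread):
# U+1FB3 GREEK SMALL LETTER ALPHA WITH YPOGEGRAMMENI + U+0301 COMBINING ACUTE.
x1 = "\u1fb3\u0301"
# Its NFD form: alpha, acute, ypogegrammeni (U+0345 is reordered to the end).
x2 = nfd(x1)

folded1 = x1.casefold()  # alpha, iota, acute: the acute now follows the iota
folded2 = x2.casefold()  # alpha, acute, iota: the acute stays on the alpha

print(canonically_equivalent(x1, x2))            # True
print(canonically_equivalent(folded1, folded2))  # False
```

In the first folding the acute ends up attached to the iota, and in the
second it stays on the alpha, so the two outputs are no longer canonically
equivalent even though the inputs were.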

BTW, I haven't found any obvious counterexamples to

  casefold(NFD(x)) = NFD(casefold(NFD(x)))

This cannot fail for the reason that the NFC analogue fails (i.e. that there
may be a precomposed form for a character output by folding that does not
exist for the input character). I'm not sure whether it holds in all
cases, though.
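
One rough way to look for counterexamples is a brute-force scan. The sketch
below again assumes Python's str.casefold() and unicodedata.normalize(), uses
hypothetical helper names, and only tests single code points, so an empty
result would not settle the question for longer strings. The outcome also
depends on the Unicode version shipped with the interpreter.

```python
import unicodedata


def nfd(s: str) -> str:
    return unicodedata.normalize("NFD", s)


def fold_of_nfd_is_nfd(x: str) -> bool:
    """Check casefold(NFD(x)) == NFD(casefold(NFD(x))) for one string."""
    folded = nfd(x).casefold()
    return folded == nfd(folded)


def counterexamples(code_points):
    """Return the code points whose one-character string fails the check."""
    return [cp for cp in code_points if not fold_of_nfd_is_nfd(chr(cp))]


if __name__ == "__main__":
    # All Unicode scalar values (surrogate code points are skipped).
    scalars = (cp for cp in range(0x110000) if not 0xD800 <= cp <= 0xDFFF)
    failures = counterexamples(scalars)
    print(len(failures), [f"U+{cp:04X}" for cp in failures[:20]])
```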

- --
David Hopwood <david.hopwood@zetnet.co.uk>

Home page & PGP public key: http://www.users.zetnet.co.uk/hopwood/
RSA 2048-bit; fingerprint 71 8E A6 23 0E D3 4C E5 0F 69 8C D4 FA 66 15 01
Nothing in this message is intended to be legally binding. If I revoke a
public key but refuse to specify why, it is because the private key has been
seized under the Regulation of Investigatory Powers Act; see www.fipr.org/rip

-----BEGIN PGP SIGNATURE-----
Version: 2.6.3i
Charset: noconv

iQEVAwUBO9+wlzkCAxeYt5gVAQEeIQf/fgEhgIsybT+Di8bPDMW0aX8K7LQZHRnt
TI8hJ9hOBtDH2bz8TryyJHEUPie+3h5oLZBqS3aRcXWzNl9GZkWquLWU6BwRv0Be
Y0TyHvbzTKTzH7kIWcxiJrjDS21K1aaofP49glqg54AQHv6F5vgb47gEQq3SQxGy
ND/9BKQsR+T20g5MKKQalUxotQuJRPXaZCf6wHiJjrCxogSRMnV99g5PGuyt2mo4
l+MRJS8LCr+q+2A8kVWWqZUt/6dN1NAa7GVNKJJEU+0Sxu7tFhSoav8Ih5XYMKkB
u4y4RnRZ+foPX5W182iIsZjJ9fbjPHOsRB+ZTcEQvRI7Xk1vr+BWuQ==
=cpYV
-----END PGP SIGNATURE-----


