L2/11-283

Date: Thu, 21 Jul 2011 11:24:00 -0400
From: Behdad Esfahbod <behdad@behdad.org>
Subject: Document more normalization invariants

Please add this to the UTC agenda.

I would like to propose documenting the following normalization invariants and
guaranteeing them in the stability policy.

1) The stability policy already says: "Canonical mappings
(Decomposition_Mapping property values) are always limited either to a single
value or to a pair. The second character in the pair cannot itself have a
canonical mapping."  However, these two properties are not well documented in
UAX #15, and I believe they are worth documenting there.
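Both properties quoted from the stability policy can be checked mechanically
against the character data.  A minimal sketch in Python, assuming the standard
`unicodedata` module (so it checks whatever Unicode version
`unicodedata.unidata_version` reports, not necessarily 6.0);
`canonical_mapping` and `check_canonical_mapping_shape` are hypothetical helper
names used only for illustration:

```python
import sys
import unicodedata


def canonical_mapping(cp):
    """Return cp's canonical Decomposition_Mapping as a list of code points,
    or None if cp has no mapping or only a compatibility (tagged) mapping.

    unicodedata.decomposition() returns '' when there is no mapping, a string
    starting with a tag such as '<compat>' for compatibility mappings, and a
    plain space-separated list of hex values for canonical mappings. Hangul
    syllables decompose algorithmically and report '' here, so they are skipped.
    """
    d = unicodedata.decomposition(chr(cp))
    if not d or d.startswith("<"):
        return None
    return [int(h, 16) for h in d.split()]


def check_canonical_mapping_shape():
    """Return the code points whose canonical mapping is longer than a pair,
    or is a pair whose second character has its own canonical mapping."""
    violations = []
    for cp in range(sys.maxunicode + 1):
        m = canonical_mapping(cp)
        if m is None:
            continue
        if len(m) > 2:
            violations.append(cp)
        elif len(m) == 2 and canonical_mapping(m[1]) is not None:
            violations.append(cp)
    return violations


violations = check_canonical_mapping_shape()
print("Unicode", unicodedata.unidata_version, "violations:", violations)
```

An empty list means the data in that version satisfies both halves of the
invariant; it says nothing about future versions.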

2) The full canonical decomposition of a character does not expand to more
than four characters.  This is true today, but there is no guarantee that it
will remain so.  Since a violation of this rule would require encoding at least
five characters, I'm fairly confident that no such mapping will be encoded in
future versions; if that is the consensus at UTC, it may be worth documenting.
Note that such a guarantee already exists for NFC (x3).  What I'm suggesting is
to document the maximum expansion for NFD as x4.
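The x3 and x4 figures are per-code-point expansion factors, so one way to
confirm them for a given version of the data is to normalize every code point
on its own and record the longest result.  A rough sketch, again assuming
Python's `unicodedata` module; `max_expansion` is a hypothetical helper:

```python
import sys
import unicodedata


def max_expansion(form):
    """Return (length, code point) for the longest result of normalizing a
    single code point to `form`; the lowest such code point wins ties."""
    best_len, best_cp = 0, None
    for cp in range(sys.maxunicode + 1):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogate code points are not characters; skip them
        n = len(unicodedata.normalize(form, chr(cp)))
        if n > best_len:
            best_len, best_cp = n, cp
    return best_len, best_cp


results = {form: max_expansion(form) for form in ("NFC", "NFD")}
for form, (length, cp) in results.items():
    print(f"{form}: x{length}, first reached at U+{cp:04X}")
```

For NFD, U+1F82 GREEK SMALL LETTER ALPHA WITH PSILI AND VARIA AND
YPOGEGRAMMENI is one character that reaches four: it decomposes to alpha plus
three combining marks.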

3) The full compatibility decomposition of a character does not expand to more
than 18 characters.  As in the previous case, if the consensus at UTC is that a
longer decomposition would no longer be acceptable for encoding, can this be
documented?
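The same kind of scan, restricted to NFKD, gives the figure of 18.  A minimal
sketch under the same assumptions (Python's `unicodedata`, whatever Unicode
version it ships with):

```python
import sys
import unicodedata

# Longest NFKD result for any single code point (surrogates skipped).
# max() over (length, code point) tuples breaks ties by the higher code point.
length, cp = max(
    (len(unicodedata.normalize("NFKD", chr(c))), c)
    for c in range(sys.maxunicode + 1)
    if not 0xD800 <= c <= 0xDFFF
)
print(f"NFKD: x{length} at U+{cp:04X} {unicodedata.name(chr(cp), '')}")
```

The 18-character case is U+FDFA ARABIC LIGATURE SALLALLAHOU ALAYHE WASALLAM,
whose compatibility decomposition spells out the full phrase, spaces included.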


If these cannot be set in stone, perhaps they can be added to the "Invariants
in Implementations" section of TR44.

Cheers,
behdad