L2/12-075
Subject: Overridable Properties
From: Mark Davis
Date: 2012-02-06

We have the following text in the standard:

"Thus the decomposition of Unicode characters is both normative and not overridable; no higher-level protocol may override these values, because to do so would result in non-interoperable results for the normalization of Unicode text. Other normative properties, such as case mapping, are overridable by higher-level protocols, because their intent is to provide a common basis for behavior. Nevertheless, they may require tailoring for particular local cultural conventions or particular implementations.

D34 Overridable property: A normative property whose values may be overridden by conformant higher-level protocols.

• For example, the Canonical_Decomposition property is not overridable. The Uppercase property can be overridden."

However, we do not say which properties are overridable and which are not (except for that one example)! (See also http://www.unicode.org/L2/L2012/12041-properties.html)

Moreover, the more I think about this, the less sense the whole notion seems to make.

A. If an implementation purports to return a particular UCD property value for a code point, and doesn't, I think we want to say it is non-conformant; no matter what the

B. We can't forbid an implementation, however, from having its own property which is (say) based on Unicode properties, but differs for (say) private use code points. I can have a getXGeneralCategory(codepoint) function, for example.

C. The one area we have to be careful of is properties that are used to define conformant behavior (such as the toNFKD operation). But even there, the important feature that we always stress is that the results have to match, not the internal operations.
So even if I had an XCanonical_Combining_Class property that returned the negative of the Canonical_Combining_Class, and my implementation of normalization had an algorithm with the corresponding changes, I could have a perfectly conformant implementation of normalization.

So I think we should consider the following proposal:

a) Retract the definition of Overridable property.

b) Add a conformance clause along the following lines:

Cxx. For a given version of Unicode, any implementation that purports to return the value of a UCD property for a code point must return the value specified by the UCD data files for that version, unaltered.

* An implementation may return an altered value if the modification is clearly documented. It is strongly recommended that any such implementation use a different name, and also supply a mechanism to access the unmodified property values. For example, getXGeneralCategory() returns the UCD General_Category values, except that code points in the private use range from E000 to E100 return the value Other_Letter (Lo).

c) Make sure that C18 and the following text make it clear that the results of the Unicode algorithms (and the constraints on tailorability, if any) are specified in terms of unmodified UCD properties.
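
To make the two examples above concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not proposed text. It takes Python's unicodedata module as the source of unmodified UCD values (for whichever Unicode version that Python build bundles). The names x_general_category, x_ccc, reorder, reorder_ucd and reorder_negated are hypothetical. The sketch covers only the canonical ordering step of normalization, not decomposition or composition.

```python
import unicodedata

# Documented modification (hypothetical): U+E000..U+E100 report Lo instead of Co.
X_OVERRIDE_RANGE = range(0xE000, 0xE100 + 1)


def general_category(cp: int) -> str:
    """Unmodified General_Category, as supplied by unicodedata."""
    return unicodedata.category(chr(cp))


def x_general_category(cp: int) -> str:
    """Differently named variant: General_Category, except E000..E100 -> Lo."""
    if cp in X_OVERRIDE_RANGE:
        return "Lo"
    return general_category(cp)


def ccc(ch: str) -> int:
    """Unmodified Canonical_Combining_Class."""
    return unicodedata.combining(ch)


def x_ccc(ch: str) -> int:
    """Hypothetical XCanonical_Combining_Class: the negative of ccc."""
    return -unicodedata.combining(ch)


def reorder(text: str, key, is_starter) -> str:
    """Canonical ordering: stable-sort each run of non-starters by key."""
    out, run = [], []
    for ch in text:
        if is_starter(ch):
            out.extend(sorted(run, key=key))  # sorted() is stable
            run = []
            out.append(ch)
        else:
            run.append(ch)
    out.extend(sorted(run, key=key))
    return "".join(out)


def reorder_ucd(text: str) -> str:
    """Canonical ordering driven by the unmodified property."""
    return reorder(text, key=ccc, is_starter=lambda ch: ccc(ch) == 0)


def reorder_negated(text: str) -> str:
    """Same result from the negated property, with the comparison flipped."""
    # Ascending ccc is descending x_ccc, so sort on the negation of x_ccc.
    return reorder(text, key=lambda ch: -x_ccc(ch),
                   is_starter=lambda ch: x_ccc(ch) == 0)
```

Both reordering functions return the same string for any input, which is the point of C above: conformance is judged on the results, not on the internal property used to get there. The pair general_category / x_general_category follows the recommendation in the proposed clause: the altered property has a different name, and the unmodified values remain accessible.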