L2/06-086
From: Mark Davis
Date: 2006-03-17
Subject: Preferred ordering of marks

Please add the following as a document and on the agenda.

Peter Constable recently proposed documenting ordering information for Thai, i.e., "BelowMarks* AboveMarks*". That is the mechanism we use right now whenever the customary typing order differs from the NFC ordering: a BNF that indicates the ordering.

But this raises a broader issue. We have been accumulating bits of documentation in various places about the preferred ordering for a given script. Having it in documentation *alone* means that programmers will inevitably overlook it in their implementations.

Here is a strawman proposal for post-5.0:

Define a new numeric property called 'Preferred_Ordering'. This assigns a number, similar to the canonical combining class (CCC), to each Unicode character. The preferred order for a Unicode string is found by applying the canonical ordering algorithm, but using this property instead of CCC. The goal is to match the most common typing order for complex scripts, for the sequences that are used in practice in a given script. Unlike the CCC, this property would never be required to be stable; it can be adjusted as new information comes in.

ISSUES

1. Is this algorithm adequate? If a script had some rules with repeats, like the following BNF, one would need a more complicated algorithm.

   * * * *

2. Is the customary preferred typing order dependent on language? For example, if the Thai preferred ordering is as above, but a minority language using the Thai script had the reverse, then the ordering would be language-dependent. If so, the information would more properly be part of CLDR instead of the UCD.
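As a rough illustration of the strawman, the following Python sketch applies a canonical-ordering-style reordering driven by a lookup table in place of CCC. The table `PREFERRED_ORDERING`, its numeric values, and the function name `preferred_order` are all hypothetical, invented here only to mirror the Thai "BelowMarks* AboveMarks*" pattern for a handful of characters; they are not proposed property values. As in canonical ordering, characters with value 0 are never moved, and each run of nonzero-valued characters is sorted stably.

```python
from typing import Dict, List

# Hypothetical Preferred_Ordering values, invented for this illustration.
# A value of 0 (or no entry) means "never reordered", mirroring CCC=0.
PREFERRED_ORDERING: Dict[str, int] = {
    "\u0E38": 10,  # THAI CHARACTER SARA U (below mark)
    "\u0E39": 10,  # THAI CHARACTER SARA UU (below mark)
    "\u0E3A": 10,  # THAI CHARACTER PHINTHU (below mark)
    "\u0E48": 20,  # THAI CHARACTER MAI EK (above mark)
    "\u0E49": 20,  # THAI CHARACTER MAI THO (above mark)
}


def preferred_order(text: str, prop: Dict[str, int] = PREFERRED_ORDERING) -> str:
    """Reorder each run of nonzero-valued characters by ascending value."""
    out: List[str] = []
    run: List[str] = []  # pending characters with a nonzero property value

    def flush() -> None:
        # sorted() is stable, so characters with equal values keep their
        # relative order, as in the canonical ordering algorithm.
        out.extend(sorted(run, key=lambda ch: prop[ch]))
        run.clear()

    for ch in text:
        if prop.get(ch, 0) == 0:
            flush()          # a zero-valued character ends the current run
            out.append(ch)
        else:
            run.append(ch)
    flush()
    return "".join(out)


# KO KAI + MAI EK + SARA U: the below mark is moved before the above mark.
reordered = preferred_order("\u0E01\u0E48\u0E38")
```

Because this is a plain stable sort on a single number per character, it can only express orderings of the form "ClassA* ClassB* ...". That is the limitation raised in issue 1: a rule with repeats could not be captured by one numeric value per character.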