Accumulated Feedback on PRI #196

This page is a compilation of formal public feedback received so far. See Feedback for further information on this issue, how to discuss it, and how to provide feedback.


Source: Mark Davis
Date: 2011/10/24
  1. One of the links in UAX #38 is broken:
  2. The regex in the entry "kXerox; N/A; ^[0-9]{3}:[0-9]{3}" is inconsistent with the others: it is missing a final $.
    • However, I’d recommend removing the leading ^ and trailing $ from all of the regex patterns in UAX #38.
    • They are superfluous, given that each pattern has to match against the whole string anyway, and they just make the expressions even less readable.
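
The point about anchors can be illustrated with a minimal Python sketch. It assumes a validator that matches each value against the whole string (here via `re.fullmatch`); the helper names and the sample values are hypothetical and are not taken from UAX #38 or the Unihan data.

```python
import re

# The kXerox pattern as quoted above (leading ^, no trailing $), and the
# same pattern with both anchors removed.
QUOTED = r"^[0-9]{3}:[0-9]{3}"
BARE = r"[0-9]{3}:[0-9]{3}"


def matches_whole(pattern: str, value: str) -> bool:
    """Whole-string validation: the value must match from start to end."""
    return re.fullmatch(pattern, value) is not None


def matches_prefix(pattern: str, value: str) -> bool:
    """Prefix matching: only the start of the value is checked."""
    return re.match(pattern, value) is not None


if __name__ == "__main__":
    good = "241:058"   # made-up value with the expected shape
    bad = "241:0589"   # made-up value with a trailing extra digit
    for name, check in (("whole", matches_whole), ("prefix", matches_prefix)):
        for pattern in (QUOTED, BARE):
            print(name, repr(pattern), check(pattern, good), check(pattern, bad))
```

With whole-string matching the anchored and unanchored patterns accept and reject the same values, so the anchors add nothing. With prefix matching the missing $ matters, because the value with the trailing digit is accepted.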
  3. There needs to be a clear statement of whether the ordering of multivalued properties is significant, and if so, what that significance is. “Arbitrary” means you could store the values in a hashset and you wouldn’t lose any information. 
    1. There should be a statement at the top of UAX #38 that the ordering is arbitrary unless documented, and then clearly document those cases where it is not arbitrary:
    • kMandarin, kTotalStrokes, kHanyuPinyin.
    • kCantonese has a documented ordering, but it is alphabetical, which is effectively arbitrary.
    • kHanyuPinlu also has a documented ordering, but because the number on each item gives the frequency, the ordering adds no information and is also arbitrary.
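
As a rough sketch of the "hashset" criterion above, the following Python stores a multivalued field either as an ordered tuple or as an unordered set. The function names and sample values are hypothetical, and the item shape `reading(count)` for kHanyuPinlu-style values is an assumption made for illustration.

```python
import re


def parse_values(raw: str, order_significant: bool):
    """Split a space-separated field; keep order only when it carries meaning."""
    values = raw.split()
    return tuple(values) if order_significant else frozenset(values)


def by_frequency(raw: str):
    """Order 'reading(count)' items by their counts, highest first.

    Assumes each item looks like 'reading(count)'; this shows that the
    ordering can be rebuilt from the counts alone.
    """
    items = []
    for item in raw.split():
        m = re.fullmatch(r"(.+)\((\d+)\)", item)
        if m is None:
            raise ValueError(f"unexpected item: {item!r}")
        items.append((m.group(1), int(m.group(2))))
    return sorted(items, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Made-up values: with arbitrary ordering, both spellings are equal.
    print(parse_values("aa1 bb2", False) == parse_values("bb2 aa1", False))
    # With significant ordering, they differ.
    print(parse_values("aa1 bb2", True) == parse_values("bb2 aa1", True))
    print(by_frequency("xx5(10) xx4(200)"))
```

If a property passes the first comparison without losing anything a consumer needs, its ordering is arbitrary in the sense described above.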
  4. Unihan (these might be longer term,...)
    • Should remove kCompatibilityVariant. It is just a subset of the data in UnicodeData.txt, so it is an error waiting to happen.
    • Should change kHanyuPinlu to accented pinyin instead of numeric, like the other pinyin fields.
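
The conversion suggested in the last bullet could look roughly like the sketch below. It assumes each reading is a lowercase syllable (possibly containing ü) followed by a tone digit 1 to 5, with 5 meaning the neutral tone, and it applies the conventional tone-mark placement rules; the function names are hypothetical, and this is not how the Unihan data is actually produced.

```python
import re
import unicodedata

# Combining marks for tones 1-4; tone 5 (neutral) carries no mark.
TONE_MARKS = {1: "\u0304", 2: "\u0301", 3: "\u030C", 4: "\u0300"}


def mark_position(letters: str) -> int:
    """Index of the vowel that takes the tone mark.

    Conventional rule: 'a' or 'e' if present, else the 'o' of 'ou',
    else the last vowel of the syllable.
    """
    for vowel in ("a", "e"):
        if vowel in letters:
            return letters.index(vowel)
    if "ou" in letters:
        return letters.index("ou")
    positions = [i for i, ch in enumerate(letters) if ch in "iou\u00FC"]
    if not positions:
        raise ValueError(f"no vowel to carry the tone mark: {letters!r}")
    return positions[-1]


def numeric_to_accented(syllable: str) -> str:
    """Convert a numeric reading such as 'hao3' to accented pinyin."""
    syllable = unicodedata.normalize("NFC", syllable)
    m = re.fullmatch(r"([a-z\u00FC]+)([1-5])", syllable)
    if m is None:
        raise ValueError(f"not a numeric pinyin syllable: {syllable!r}")
    letters, tone = m.group(1), int(m.group(2))
    if tone == 5:
        return letters
    idx = mark_position(letters)
    marked = letters[: idx + 1] + TONE_MARKS[tone] + letters[idx + 1 :]
    # Compose base letter and combining mark into a single character.
    return unicodedata.normalize("NFC", marked)


if __name__ == "__main__":
    for sample in ("hao3", "dou1", "gui4", "l\u00FC4", "de5"):
        print(sample, numeric_to_accented(sample))
```

Syllables without a vowel are rejected rather than guessed at, since the placement rule for them is not covered by the assumptions above.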