L2/04-251

Date/Time: Tue Jun 15 21:08:48 CDT 2004
Contact: w3c-i18n-ig@w3.org
Report Type: Public Review Issue
Opt Subject: [UTR#23] Public review feedback from W3C I18N WG

Here are the comments on this Draft UTR (#23, Character Properties). Some of these were reviewed by the working group (indicated). The remainder are the work of the reviewer, Addison Phillips.

1. In section 3, there is this new sentence:

   "The definition numbers in this document will be updated when new definitions are added."

   It would be better if the existing numbers were maintained. Renumbering makes it hard to reference the document in a stable way.

1a. In fact, there is an error in the numbering. Search for PD27 and note that you will be looking at "property value alias". Scroll down two or three items and PD27 repeats, as the first item in "string functions".

2. The definition of "stable property" applies only to "code point properties" under the current definition. I don't think this is intended.

3. The section "String Functions" (sorta PD27 - PD32) uses what CharMod and TUS 4.0 refer to as a "code unit string", but the encoding form is never specified. Some of the definitions in this section, such as PD28, don't make a lot of sense when talking about UTF-8 as an encoding form. Other definitions, such as PD29, appear to take UTF-8 into account. Two comments:

   a. It isn't clear why code unit strings are needed in the discussion of some properties, especially since the document doesn't spell out how different encoding forms have any relevance. The only definition that seems to rely on code unit strings is PD37.

   b. If string functions based on code unit strings need to be maintained (for historical, if not other, reasons), then encoding forms should be mentioned and the details made clear in the document.

4. The document as written requires substantial familiarity with Unicode and its concepts in order to understand it. This may impair the document's accessibility to non-Unicadetti.
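As an illustration of comment 3: the same text occupies a different number of code units in each encoding form, so a function defined over a code unit string cannot be interpreted until the encoding form is named. The Python sketch below is only an illustration of that point. The helper name `code_unit_count` and the sample string are assumptions made for this example and do not come from the draft.

```python
def code_unit_count(text: str, encoding_form: str) -> int:
    """Return the number of code units `text` occupies in the given encoding form.

    Hypothetical helper for illustration; not a function defined in UTR #23.
    """
    # Bytes per code unit for each Unicode encoding form.
    unit_size = {"utf-8": 1, "utf-16": 2, "utf-32": 4}[encoding_form]
    # Use the explicit little-endian codecs so that no byte order mark is prepended.
    codec = encoding_form if unit_size == 1 else encoding_form + "-le"
    return len(text.encode(codec)) // unit_size


# U+0061, U+00E9, and U+10400 (a supplementary character).
sample = "a\u00e9\U00010400"
counts = {form: code_unit_count(sample, form) for form in ("utf-8", "utf-16", "utf-32")}
print(counts)  # {'utf-8': 7, 'utf-16': 4, 'utf-32': 3}
```

The three code points yield three different lengths, which is why a definition phrased in terms of code unit strings reads differently for UTF-8 than for UTF-16 or UTF-32.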
What follows are Addison's personal notes, mostly of an editorial (non-substantive) nature, although there are a few minor notes of substance:

--- Section 1: the word "typology" is obscure.

--- S1: The added paragraph starting "In some ways, the model of character properties presented here..." is unclear. If the goal of this document is to become a standard annex, then this text is superfluous. If this is not a UTS or UAX, then this text is inappropriate or its intent should be made clearer.

--- S1: The sentence "This report specifically covers formal character properties, which are those attributes of characters that are specified according to the definitions set forth in this report." is too reflexive: "This report covers the stuff defined in this report". Consider rewriting. Why is the word "formal" necessary? Are there informal character properties?

--- S2: The overview mixes "character", "code-point", and "character string" together. This seems inexact.

    At its most basic a character property relates a character (sometimes a code point) to a value. At its most general, a character property can be considered a function; it is a mapping from characters or a character (or code-point) string to a property value.

I understand that "code point" is here because individual surrogates and unassigned code points aren't technically characters, but the document doesn't make this point anywhere, and this leaves the reader wondering. Could it say:

    At its most basic a character property relates a character to a value. At its most general, a character property can be considered a function that maps code points to specific property values.

--- S2.1, p3: delete extra space between Unicode and Standard

--- S2.2, p3: add "s": cultural expectation*S* of the user.

--- S2.4, p6: replace "e.g." with "for example". Okay, I'm being a pedant now...

--- S2.4 note: The text implies that conformance to Unicode includes a *complete* character database with all properties.
I don't think that is what is intended. Instead, it should say something more like "For normative properties that are exposed by a conformant implementation..."

    Note: One trivial, but important instance of conformant implementation is runtime access to a character property database. For normative properties, conformant implementations guarantee that the returned values match the values defined by the Unicode Consortium.

--- S2.6, note: remove comma after "at any time". Also, there appears to be a parenthetical (cc) that should be removed.

--- PD10: the example for catalog property is unclear. The text implies that characters may acquire new properties. It should say something more like:

    Examples are the age and block properties. Additional property values may be created each time a new version of the Standard is issued that adds new characters or blocks.

--- PD10: remove comma after "that" in definition.

--- PD12: the note is very chewy (also, it has an extra comma after property)

    Note: A normative process that depends in a normative and testable way on a property, is usually sufficient reason to designate a property as normative. For example, the interpretation of the bidirectional class is precisely defined in [Bidi].

Suggest:

    Note: Normative properties are generally defined because a normative process depends upon them in a normative and testable way. For example, the interpretation of the bidirectional class is precisely defined in [Bidi].

--- PD18: extra stuff at the end?

--- PD19: the example is difficult to understand because not enough of it is exposed by the text. The example is the canonical combining class values. I know that these are numbers and that the relative order of the values for two code points will always be the same (e.g. if charA > charB now, then charA > charB forever), although the actual numbers may change, but the text doesn't say this. Users have to know a lot to understand the example. Consider a small note to explain this.
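A short example of the kind of note suggested for PD19 might look like the following Python sketch. It follows the reviewer's reading of the example (only the relative order of canonical combining class values matters), not wording taken from the draft. The `renumbered` function is a hypothetical remapping invented for this illustration; `unicodedata.combining` is the Python standard library call that returns the canonical combining class.

```python
import unicodedata

# Two combining marks with different canonical combining classes.
marks = ["\u0301", "\u0323"]  # COMBINING ACUTE ACCENT, COMBINING DOT BELOW


def order_by(key):
    """Sort the marks by the given combining-class function (stable sort)."""
    return sorted(marks, key=key)


# Ordering under the values published in the Unicode Character Database.
current = order_by(unicodedata.combining)


def renumbered(ch):
    """Hypothetical renumbering: different values, same relative order."""
    return unicodedata.combining(ch) * 10 + 1


# The marks sort identically, because only the relative order of the values is used.
assert order_by(renumbered) == current
print([f"U+{ord(c):04X}" for c in current])
```

Any renumbering that preserves the relative order leaves the resulting mark order unchanged, which is the point the reviewer would like the note to spell out.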
--- PD20: note has comma instead of a period

--- PD21, note: the text mixes the use of "fixed" and "immutable" carelessly. The text should use only "immutable" here, since that's the name of the thing.

--- PD22: the note should say "immutable property" instead of "fixed". Also, show an example.

--- PD34: The text boundary special case seems artificial. Spell out why this is a special case, although I suspect that you could stop with "Any text boundary function is by definition context-sensitive."

--- PD35 (idempotent): remove "that the output of a function is a string,", which isn't part of the definition of idempotent. What's important is the recursive use part...

--- PD 36: be consistent and capitalize Count.

--- S4.3, last sentence: remove comma after "it"

--- S5.1: suggest striking "in Thailand", as there are minority scripts elsewhere with similar documentation problems...

--- S5.1: strike "successful" from "in their effects for existing successful implementations." One presumes a correct implementation is successful...

--- S5.1: correct "maded" in "character property is maded to prevent"

--- S5.1: something is wrong with this sentence: "Occasionally, a change to a character property is maded to prevent incorrect generalizations of a use of character based on its nominal property values." Is "a" missing before "character"?

--- S6.1: "the characters to which the property does not apply *are*". I should point out that "implemented as a partition" in this sentence is unclear.

---
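To illustrate the PD35 comment above: what makes a function idempotent is that applying it again to its own output changes nothing. The Python sketch below uses NFC normalization as a familiar example of such a function. The choice of normalization and the helper names `to_nfc` and `append_mark` are assumptions made for this illustration and are not taken from the draft.

```python
import unicodedata


def to_nfc(text: str) -> str:
    """Normalize to NFC; applying it a second time leaves the result unchanged."""
    return unicodedata.normalize("NFC", text)


def append_mark(text: str) -> str:
    """Counter-example: returns a string, but is not idempotent."""
    return text + "!"


sample = "e\u0301"  # "e" followed by COMBINING ACUTE ACCENT

# Idempotent: f(f(x)) == f(x).
assert to_nfc(to_nfc(sample)) == to_nfc(sample)

# Not idempotent, even though its output is also a string.
assert append_mark(append_mark(sample)) != append_mark(sample)
```

Both functions return strings, yet only the first is idempotent, which is why the string-output clause does not belong in the definition.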