5.2 TR44 and Age property

From: karl williamson (public@khwilliamson.com)
Date: Wed Sep 30 2009 - 11:47:30 CDT


I was disappointed to see that the apparently final 5.2 version of tr44
adds the same misleading phrase that Ken Whistler agreed was a problem
in tr18, and that is:

"Note: When using the Age property in regular expressions, an expression
such as "\p{age=3.0}" matches all of the code points assigned in Version
3.0—that is, all the code points with a value less than or equal to 3.0
for the Age property. For more information, see [UTS18]. "

And here are the relevant portions of what he said about this, from
http://unicode.org/mail-arch/unicode-ml/y2009-m07/0107.html

>>>
>>> The documentation is wrong in two places -- or at least
>>> misleading. Note that it doesn't actually say the property
>>> is *defined* thus and such, but rather that "when using the
>>> property all characters included in that version are included."
>>> That amounts to a pocket definition of a new derived property
>>> (or actually set of properties) based on the use of the Age property
>>> per se.
>>>
>>> This is one of these cases where an insufficiently carefully
>>> documented property is trying to have it both ways.
>>>
>>
>> [snip]
>>
>>> Age is an enumerated property in the UCD. Among other things, that
>>> means that its values constitute a codespace partition. Each
>>> code point has one and only one value of the property. Both
>>> the values in DerivedAge.txt and in the XML data files reflect
>>> that interpretation.
>>>
>>> The property defined that way is not, however, as useful as the
>>> property described the way it is used for regex matches in UTS #18,
>>> because it is far more useful for regex matches to know if a
>>> character is included in Unicode Version X (or any *earlier*
>>> version), rather than to know if it was encoded exactly in
>>> Version X. So the usage of the Age property in UTS #18 just
>>> blithely assumes that interpretation, and the caution at the
>>> top of DerivedAge.txt reflects that interpretation, even though
>>> it is in direct contradiction with the data itself.
>>>
>>> [snip]
>>>
>>> There is definitely need for clarification here.
>>>
>> [snip]

And in the previous email on the same thread:
>> Further explanation should also be useful for UTS #18, the
>> next time it is opened for updates.

So, I'm disappointed that tr44 was updated to include a statement already
known to be misleading. Note that these emails date from July.

And, I continue to maintain that the use of the Age property in this way
is confusing. It no longer means age, it means something else, like
perhaps Present_In. Calling something what it isn't leads to obfuscated
code, something that shouldn't be encouraged by TUS. I believe that
implementations should implement Age as it is actually defined, not
using a pocket definition. If they want to implement the more useful
concept, it would be better to call it something else that accurately
indicates what it actually is.
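
To make the difference concrete, here is a minimal sketch in Python of the
two readings. Everything in it is my own illustration: the names age_is and
present_in are hypothetical, and the small AGE table is a hand-entered
excerpt that should be checked against DerivedAge.txt, not a real API or
data source.

```python
# Illustrative excerpt only; the authoritative data is DerivedAge.txt.
# Versions are (major, minor) tuples so that they compare numerically.
AGE = {
    0x0041: (1, 1),  # LATIN CAPITAL LETTER A
    0x20AC: (2, 1),  # EURO SIGN
    0x1780: (3, 0),  # KHMER LETTER KA
    0x0221: (4, 0),  # LATIN SMALL LETTER D WITH CURL
}


def age_is(cp, version):
    """Enumerated reading: cp was first assigned in exactly `version`."""
    return AGE.get(cp) == version


def present_in(cp, version):
    """Cumulative reading: cp was assigned in `version` or any earlier one."""
    age = AGE.get(cp)
    return age is not None and age <= version


V3_0 = (3, 0)
exact = sorted(cp for cp in AGE if age_is(cp, V3_0))
cumulative = sorted(cp for cp in AGE if present_in(cp, V3_0))
```

With this table, exact contains only U+1780, while cumulative also picks up
U+0041 and U+20AC. The second set is what the tr44 note describes for
"\p{age=3.0}", which is why a separate name seems clearer to me.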


