Date: Mon Jul 27 2009 - 12:59:28 CDT

    Eric Muller wrote:
    > karl williamson wrote:
    >> I'm trying to come up with an alias to propose to the UTC for the
    >> misleadingly named Age property. People tend to think from the name
    >> that Age=3.2 means that the code point dates to version 3.2, when in
    >> fact it means it dates to at least 3.2.
    > I am not entirely sure what distinction you make, but Age=3.2 means that the
    > character was present in version 3.2 and in no earlier version.
    > Eric.

    Apparently that is what Asmus and others think as well, and it certainly
    is the data that comes in DerivedAge.txt, and if that were truly the
    case, I wouldn't have any problem with the term "Age". But let me quote
    from the header of that file:
    # Caution: When using the Age *property*, all assigned code points
    # in each version are included, not just the newly assigned code points.
    # For more information, see

    And, if you look at tr18, it says:

    Caution: The DerivedAge data file in the UCD provides the deltas between
    versions, for compactness. However, when using the property all
    characters included in that version are included. Thus \p{age=3.0}
    includes the letter a, which was included in Unicode 1.0. To get
    characters that are new in a particular version, subtract off the
    previous version as described in 1.3 Subtraction and Intersection. For
    example: [\p{age=3.1} -- \p{age=3.0}]

    So either you guys are wrong, or the documentation is wrong in at least
    two places. I have to assume that the documentation is right until
    shown otherwise; and if it is correct, I think that proves my point. If
    experienced people who work with Unicode all the time don't understand
    what this property is, then something is wrong, and at a minimum a new
    alias is needed to clarify things.
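
    To make the two readings concrete, here is a minimal Python sketch. It
    is only an illustration: the sample lines are hand-written in the style
    of DerivedAge.txt (not copied from it), and the function names
    parse_derived_age, first_assigned_in, age_property and new_in are
    hypothetical. It assumes that the file lists each code point once, under
    the version in which it was first assigned, as the tr18 caution says.

```python
# Hand-written sample in the style of DerivedAge.txt (an assumption, not an
# excerpt): each code point appears once, under its first version.
SAMPLE = """\
# sample data for illustration only
0061..007A    ; 1.1 # a..z
20AC          ; 2.1 # EURO SIGN
0220          ; 3.2 # LATIN CAPITAL LETTER N WITH LONG RIGHT LEG
"""


def parse_derived_age(lines):
    """Map each code point to the (major, minor) version listed for it."""
    deltas = {}
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop comments
        if not data:
            continue
        rng, ver = (field.strip() for field in data.split(";"))
        first, _, last = rng.partition("..")
        major, minor = (int(part) for part in ver.split("."))
        for cp in range(int(first, 16), int(last or first, 16) + 1):
            deltas[cp] = (major, minor)
    return deltas


def first_assigned_in(deltas, version):
    """The 'delta' reading: code points listed under exactly this version."""
    return {cp for cp, v in deltas.items() if v == version}


def age_property(deltas, version):
    """The tr18 reading of \\p{age=X}: everything assigned as of version X."""
    return {cp for cp, v in deltas.items() if v <= version}


def new_in(deltas, version, previous):
    """Equivalent of [\\p{age=version} -- \\p{age=previous}]."""
    return age_property(deltas, version) - age_property(deltas, previous)


DELTAS = parse_derived_age(SAMPLE.splitlines())
```

    Under the tr18 reading, age_property(DELTAS, (3, 0)) contains U+0061
    even though the file lists it under 1.1, while first_assigned_in(DELTAS,
    (3, 0)) does not. An implementation that uses the file's listing
    directly as the property gets the second behaviour, not the first.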

    I also don't think that, in these days of abundant cheap storage, the
    Consortium should be worrying about compactness. I believe every
    property that is exposed in the UCD should have a fully derived version
    available, probably in the extracted directory. In 5.2 Beta, the only
    properties and property values that the user has to derive (except for
    defaults) are Age, gc=LC, gc=C, gc=L, gc=M, gc=N, gc=P, gc=S, and gc=Z.
    There should be files in the extracted directory that show the derived
    values for all of them. There are bound to be mistakes made when
    programmers re-derive them; and there is duplicated work as well. This
    Age property is a case in point. I wonder how many implementations
    there are out there that have it wrong.
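
    As an illustration of the kind of re-derivation every programmer
    currently has to do for the grouped gc values, here is a minimal Python
    sketch. It assumes the standard groupings of the two-letter
    General_Category values, and it uses Python's unicodedata module, whose
    data reflects whatever Unicode version that Python build ships with, not
    necessarily 5.2. The names GC_GROUPS and in_group are hypothetical.

```python
import unicodedata

# Grouped General_Category values, expressed as unions of the two-letter
# values that the data files list directly.
GC_GROUPS = {
    "LC": {"Lu", "Ll", "Lt"},
    "L": {"Lu", "Ll", "Lt", "Lm", "Lo"},
    "M": {"Mn", "Mc", "Me"},
    "N": {"Nd", "Nl", "No"},
    "P": {"Pc", "Pd", "Ps", "Pe", "Pi", "Pf", "Po"},
    "S": {"Sm", "Sc", "Sk", "So"},
    "Z": {"Zs", "Zl", "Zp"},
    "C": {"Cc", "Cf", "Cs", "Co", "Cn"},
}


def in_group(ch, group):
    """Return True if the character's gc value belongs to the given group."""
    return unicodedata.category(ch) in GC_GROUPS[group]
```

    For example, in_group("\u02B0", "L") holds but in_group("\u02B0", "LC")
    does not, because U+02B0 is gc=Lm. Leaving out one member of a union is
    exactly the sort of slip a published derived file would prevent.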

    Unicode has made mistakes in the past with the UCD (the 4 code points
    that were Attached_Below_Left instead of Attached_Below in one of the
    Version 3 releases, and the incomplete DerivedLineBreak.txt, which was
    missing H3 in 4.1, spring to mind), but at least the UCD is subjected to
    public review, and I would hope that the discipline of having to get it
    to work under XML would catch most errors. (I did, though, find some
    omissions in the 5.2 Beta PropertyValueAliases.txt file.)

