Re: Comparing Raw Values of the Age Property

From: Ken Whistler via Unicode <unicode_at_unicode.org>
Date: Tue, 23 May 2017 17:44:49 -0700

Richard

On 5/23/2017 1:48 PM, Richard Wordingham via Unicode wrote:
> The object is to generate code *now* that, up to say Unicode Version 23.0,
> can work out, from the UCD files DerivedAge.txt and
> PropertyValueAliases.txt, whether an arbitrary code point was included
> by some Unicode version identified by a
> value of the property Age.

Ah, but keep in mind, if projecting out to Version 23.0 (in the year
2030, by our current schedule), there is a significant chance that
particular UCD data files may have morphed into something entirely
different. Recall how at one point Unihan.txt morphed into Unihan.zip
with multiple subpart files. Even though we maintainers of the UCD data
files do our best to keep them as stable as possible, their
content and sometimes their formats do morph gradually from release to
release. Just don't expect *any* parser to be completely forward proofed
against what *might* happen in the UCD in some future version.
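
To illustrate parsing defensively rather than trusting the format to stay
fixed, here is a minimal Python sketch. It assumes the line shape used in
recent versions of DerivedAge.txt ("XXXX..YYYY ; m.n # comment" or
"XXXX ; m.n # comment"). The function name parse_derived_age_line is
hypothetical, chosen only for this example. The point is that a line which
does not match the assumed shape raises an error instead of being silently
misread.

```python
import re

# Assumed shape of a data line in DerivedAge.txt (recent versions):
#   0000..001F    ; 1.1 #  [32] <control>..<control>
#   20AC          ; 2.1 #       EURO SIGN
_LINE = re.compile(
    r"^([0-9A-F]{4,6})(?:\.\.([0-9A-F]{4,6}))?\s*;\s*(\S+)\s*$"
)


def parse_derived_age_line(line):
    """Return (first, last, age) for a data line, or None for a blank
    or comment-only line.

    Raises ValueError if a data line does not match the assumed shape,
    so that a future format change is noticed rather than misparsed.
    """
    data = line.split("#", 1)[0].strip()  # drop the trailing comment
    if not data:
        return None
    m = _LINE.match(data)
    if m is None:
        raise ValueError(f"unrecognized DerivedAge.txt line: {line!r}")
    first = int(m.group(1), 16)
    last = int(m.group(2), 16) if m.group(2) else first
    if last < first:
        raise ValueError(f"reversed code point range: {line!r}")
    return first, last, m.group(3)
```

The age field is returned as an uninterpreted string here, so that deciding
what counts as a valid Age value stays a separate step.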

On the other hand, for the property Age, even in the absence of
normative definitions of invariants for the property values, given
recent practice, it is pretty damn safe to assume:

A. Major versions will continue to have two digits, incremented by one
for each subsequent version: 10, 11, 12, ... 99.
B. Minor versions will mostly (if not entirely) consist of the value
"0", and will never require two digits.

Assumption A will get you through this century, which by my estimation
should well exceed the lifetime of any code you might be writing now
that depends on it.
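
Under assumptions A and B, comparing two Age values comes down to comparing
(major, minor) pairs numerically rather than as strings, since "9.0" sorts
after "10.0" in a plain string comparison. A minimal Python sketch follows.
The names age_key and included_by are hypothetical, and the special handling
of the short value "NA" (Unassigned) is an assumption about how a caller
would want to treat it.

```python
import re

# Assumption A: major version is one or two digits (1 through 99).
# Assumption B: minor version is a single digit.
_AGE = re.compile(r"^([0-9]{1,2})\.([0-9])$")


def age_key(age):
    """Turn a short Age value such as "6.1" into a sortable (major, minor)
    tuple. Raises ValueError if the value breaks assumptions A or B."""
    m = _AGE.match(age)
    if m is None:
        raise ValueError(f"unexpected Age value: {age!r}")
    return int(m.group(1)), int(m.group(2))


def included_by(char_age, version_age):
    """True if a code point whose Age is char_age was already assigned
    in the Unicode version identified by version_age."""
    if char_age == "NA":  # Unassigned: not included by any version
        return False
    return age_key(char_age) <= age_key(version_age)
```

For example, included_by("9.0", "10.0") is true, while a naive string
comparison of the same two values would give the opposite answer.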

BTW, unlike many actual products, the version numbering of the Unicode
Standard is not really driven by marketing concerns. So there is very
little chance of some version sequence for Unicode that ends up fitting
a pattern like: 3.0, 3.1, 95 or NT, 98, 2000, XP, Vista, 7, 8, 8.1, 10
... ;-)

> What TUS 9.0, with its appendices and annexes, is lacking is a clear
> statement such as, "The short values for the Age property are of the
> form "m.n", with the first field corresponding to the major version,
> and the second field corresponding to the minor version. There is no
> need for a third version field, because new characters are never
> assigned in update versions of the standard."

I think the UTC and the editors had just been assuming that the pattern
was so obvious that it needed no explaining. But the lack of a clear
description of Age had become apparent, which is why I wrote that text
to add to UAX #44 for the upcoming version.

> Conveniently, this
> almost true statement is included in Section 5.14 of the proposed
> update to UAX#44 (in Draft 12, to be precise). It's not quite true, for
> there is also the short value NA for Unassigned. Is there any way of
> formally recording this oversight?

Yes. You could always file another piece of feedback using the contact
form. However, in this case, you already have the attention of the
editors of UAX #44. So my advice would be to simply wait now for the
publication of Version 10.0 of UAX #44 around the 3rd week of June.

--Ken
Received on Tue May 23 2017 - 19:45:41 CDT
