Re: Comparing Raw Values of the Age Property

From: Markus Scherer via Unicode <unicode_at_unicode.org>
Date: Mon, 22 May 2017 15:10:02 -0700

On Mon, May 22, 2017 at 2:44 PM, Richard Wordingham via Unicode <unicode_at_unicode.org> wrote:

> Given two raw values of the Age property, defined in UCD file
> DerivedAge.txt, how is a computer program supposed to compare them?
> Apart from special handling for the value "Unassigned" and its short
> alias "NA", one used to be able to compare short values against short
> values and long values against long values by simple string
> comparison. However, now we are coming to Version 10.0 of Unicode,
> this no longer works - "1.1" < "10.0" < "2.0".
>

That is normal when numbers, and multi-field version numbers in particular, are compared as strings.
If you want numeric sorting, then you need to either use a collator with
that option, or parse the versions into tuples of integers and sort those.
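A minimal sketch of the tuple approach in Python, assuming short values are decimal integers separated by FULL STOP. The name `age_key` and the decision to sort "NA"/"Unassigned" after every real version are illustrative assumptions, not something the UCD prescribes.

```python
# Sketch: sort Age short values ("1.1", "2.0", "10.0", ...) numerically by
# turning each one into a tuple of integers.
# Assumption (hypothetical policy): "NA" / "Unassigned" sorts after all versions.

UNASSIGNED = ("NA", "Unassigned")

def age_key(value: str) -> tuple:
    """Return a sort key for an Age short value."""
    if value in UNASSIGNED:
        # (1,) compares greater than any (0, ...) key, so unassigned sorts last.
        return (1,)
    # "10.0" -> (10, 0); tuples of ints compare field by field, numerically.
    return (0, tuple(int(field) for field in value.split(".")))

ages = ["10.0", "1.1", "2.0", "NA", "3.2"]
print(sorted(ages))               # ['1.1', '10.0', '2.0', '3.2', 'NA']  (string order)
print(sorted(ages, key=age_key))  # ['1.1', '2.0', '3.2', '10.0', 'NA']
```

Wrapping the version tuple in an outer `(0, ...)` / `(1,)` tuple keeps the special value out of the integer comparison entirely, so nothing has to pretend "NA" is a number.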

> There are some possibilities - the values appear in order in
> PropertyValueAliases.txt and in DerivedAge.txt.

You should not rely on the order of values in data files, unless the file
explicitly states that order matters.

> Can one rely on the FULL STOP being the field
> divider,

I think so. Dots are extremely common for version numbers. I see no reason
for Unicode to use something else.

> and can one rely on there never being any grouping characters
> in the short values?

I don't know what "grouping characters" you have in mind.

I think the format is pretty self-evident.
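A program that would rather not depend on that can check the format explicitly, so that a grouping character or any other surprise fails loudly instead of being misread. A sketch, assuming short values are ASCII digits separated by FULL STOP; `parse_age` is a hypothetical name, and "NA"/"Unassigned" still have to be handled before calling it.

```python
import re
from typing import Tuple

# Hypothetical strict pattern: ASCII digit fields separated by FULL STOP (U+002E).
# [0-9] (unlike \d) matches only ASCII digits.
_AGE_SHORT_VALUE = re.compile(r"[0-9]+(?:\.[0-9]+)*")

def parse_age(value: str) -> Tuple[int, ...]:
    """Parse an Age short value such as "10.0" into (10, 0), or raise ValueError."""
    if _AGE_SHORT_VALUE.fullmatch(value) is None:
        raise ValueError(f"unexpected Age value format: {value!r}")
    return tuple(int(field) for field in value.split("."))
```

The regular expression matters because Python's `int()` on its own is lenient: it accepts surrounding whitespace, underscores between digits, and non-ASCII decimal digits, any of which would be silently turned into a number.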

markus
Received on Mon May 22 2017 - 17:10:24 CDT