From: Karl Williamson <public_at_khwilliamson.com>

Date: Mon, 30 Mar 2015 13:38:54 -0600

On 03/29/2015 03:41 AM, Andrew West wrote:

> On 28 March 2015 at 20:05, Karl Williamson <public_at_khwilliamson.com> wrote:
>>
>> Existing software that looks at the numeric values of characters is written
>> expecting that rational numbers will have been reduced to their lowest form.
>
> That seems to be a rather rash statement. I have software (BabelPad)
> which parses the numeric values of characters for numeric sorting
> purposes, and it parses "6/12" for MEROITIC CURSIVE FRACTION SIX
> TWELFTHS as 0.5. Personally I find it hard to imagine how you could
> write software that accepts "6/12" as input and is unable to come up
> with the answer of a half.

The statement is not rash; it is simply a statement of objective fact.
I am the maintainer of software that fails with beta 8.0 because of
this change, and the failure has nothing to do with being unable to do
arithmetic division; your assumption was wrong.

The software essentially creates a database of Unicode properties for
regular expression pattern matching, so that someone can say

/\p{Numeric_Value=0.5}/

and quickly determine whether the string being matched contains a code
point with that characteristic.

Because the database is copied as-is to many different computers with
different word sizes and different floating point implementations, it
can't do the division ahead of time, because of the inherent fuzziness
of floating point numbers. It solves this the same way Unicode does, by
leaving rational numbers in their original, precisely specified format.
Thus it creates a table for the property-value combination of
Numeric_Value and 1/2, taking the UCD value as-is.
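A minimal Python sketch of the floating-point concern (an illustration only, not the software described here): it simulates a machine with 32-bit floats by round-tripping a value through the standard `struct` module's IEEE 754 single-precision format. The same quotient 1/3 comes out differently in the two formats, while an exact rational compares equal no matter where it is evaluated.

```python
import struct
from fractions import Fraction


def as_single_precision(x: float) -> float:
    """Round-trip a double through IEEE 754 single precision (32 bits)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# Doing the division ahead of time: the stored value depends on the float format.
double_key = 1 / 3
single_key = as_single_precision(1 / 3)
print(double_key == single_key)  # False: the "same" value differs by format

# Keeping the rational exact: identical on every machine.
exact_key = Fraction(1, 3)
print(exact_key == Fraction("1/3"))  # True
```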

Prior to beta 8, the UCD came with all fractions already reduced. It
would not occur to someone with a mainly mathematical or computer
science background that the input data could arrive otherwise, since
the mathematical convention is to express fractions in lowest terms.
Unicode never promised this, but of course the code has nothing to
handle the new case. It therefore creates a second table, for the
property-value combination of Numeric_Value and 6/12, even though 6/12
and 1/2 are the same number, and that causes problems.
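The failure mode can be sketched in a few lines of Python. The two-entry sample and the `build_tables` helper are hypothetical; they only mimic the approach described above of creating one table per Numeric_Value string taken as-is from the UCD.

```python
from collections import defaultdict

# Hypothetical sample of Numeric_Value fields, copied verbatim as strings.
ucd_numeric_values = {
    "VULGAR FRACTION ONE HALF": "1/2",
    "MEROITIC CURSIVE FRACTION SIX TWELFTHS": "6/12",  # unreduced in beta 8.0
}


def build_tables(values):
    """Group character names into one table per raw Numeric_Value string."""
    tables = defaultdict(list)
    for name, value in values.items():
        tables[value].append(name)
    return dict(tables)


tables = build_tables(ucd_numeric_values)
print(sorted(tables))  # ['1/2', '6/12']: two tables for one numeric value
```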

It's a small matter to add code to reduce the UCD-specified rational
numbers, but it's one more complication to deal with, on top of the
many that the UCD already presents. If there is no good reason for the
data for these new characters to be specified contrary to mathematical
convention, then the data should be changed rather than coded around.
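One way to do that reduction, sketched in Python under the assumption that each UCD Numeric_Value string is either an integer or "numerator/denominator" with a positive denominator. `reduce_rational` is a hypothetical helper, not part of any existing tool. It divides both parts by their greatest common divisor and drops a denominator of 1, which also turns "0/3" into "0", the form the UCD already uses for U+2189.

```python
from math import gcd


def reduce_rational(value: str) -> str:
    """Normalize a UCD-style "n/d" or integer string to lowest terms.

    Assumes the denominator is a positive integer.
    """
    if "/" not in value:
        return value  # already an integer
    num_text, den_text = value.split("/")
    num, den = int(num_text), int(den_text)
    common = gcd(num, den)  # non-negative; gcd(0, d) == d
    num, den = num // common, den // common
    return str(num) if den == 1 else f"{num}/{den}"


print(reduce_rational("6/12"))  # 1/2
print(reduce_rational("0/3"))   # 0
```

Keying the tables by `reduce_rational(value)` instead of the raw string would put 6/12 and 1/2 in the same table. Python's `fractions.Fraction` performs the same normalization: `str(Fraction("6/12"))` gives `'1/2'`.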

>
> I would say that fractions should not be reduced to their lowest form
> in the Unicode data as some people may need to order fractions by
> numerator or denominator, and reducing to lowest form could break the
> expectations of some software. Having said that, I note that the
> numeric value of one character has been reduced in the Unicode data:
> U+2189 VULGAR FRACTION ZERO THIRDS is given the numeric value of "0"
> rather than "0/3".

So there is some precedent for reducing.

>
> Andrew


_______________________________________________

Unicode mailing list

Unicode_at_unicode.org

http://unicode.org/mailman/listinfo/unicode

Received on Mon Mar 30 2015 - 14:40:11 CDT
