Re: Unicode Collation Algorithm

From: Mike ([email protected])
Date: Thu Apr 27 2006 - 13:53:04 CST

Next message: Mike: "Unicode Collation performance"

Previous message: Andreas Prilop: "Re: Unicode fonts"
In reply to: Richard Wordingham: "Re: Unicode Collation Algorithm"

>> I am implementing the UCA and am having trouble
>> passing the conformance test....
>
> The problem lies in the interpretation of 'combining mark'. I'd taken
> it to mean a character with non-zero combining class. Moreover, I think
> this is what was intended!

That was the problem. I modified my code to stop
trying to form contractions as soon as a combining
mark with combining class 0 is encountered. Now it
passes the conformance tests (as long as I throw out
the level four collation data in the NON_IGNORABLE
test).
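
For anyone hitting the same thing, here is a rough sketch of the
matching rule as I now understand it. It is not my actual code: the
function name match_contraction and the plain set of contraction
strings are made up for illustration, and combining classes come from
Python's unicodedata module. After the longest contiguous match, it
keeps trying to attach following characters of non-zero combining
class, skips blocked ones, and stops at the first character of class 0
even if that character is a combining mark.

```python
import unicodedata


def match_contraction(text, start, table):
    """Return (matched string, indices consumed) for a contraction at start.

    `table` is a hypothetical set of contraction strings; a real
    implementation would look these up in the collation element table.
    """
    # Step 1: longest contiguous match beginning at `start`.
    max_len = max((len(key) for key in table), default=1)
    best_end = start + 1
    for end in range(start + 2, min(len(text), start + max_len) + 1):
        if text[start:end] in table:
            best_end = end
    matched = text[start:best_end]
    used = list(range(start, best_end))

    # Step 2: try to extend the match with following non-starters.
    max_skipped = 0  # highest combining class among characters skipped so far
    i = best_end
    while i < len(text):
        ccc = unicodedata.combining(text[i])
        if ccc == 0:
            # Class 0 ends the search, even for a combining mark
            # such as a spacing vowel sign.
            break
        # A character is blocked if a skipped character before it has
        # an equal or higher combining class.
        if ccc > max_skipped and matched + text[i] in table:
            matched += text[i]
            used.append(i)
        else:
            max_skipped = max(max_skipped, ccc)
        i += 1
    return matched, used
```

With the table {"a\u0301"}, the input "a\u0323\u0301" matches
"a\u0301" discontiguously (indices 0 and 2), while "a\u093E\u0301"
matches only "a", because U+093E is a combining mark of class 0.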

> I was able to get through the test - once I'd decided that unpaired
> surrogates should not be converted to the replacement character!

Well, I had to ignore the tests with surrogates in
them. All my code deals in UTF-8 strings, so to
stay conformant with UTF-8 it raises an exception
whenever a surrogate (paired or not) is found.
I am comfortable with that.
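
To illustrate the kind of check I mean, here is a small sketch in
Python rather than my real code. The function names decode_utf8_strict
and reject_surrogates are invented for this example. It relies on
Python's strict "utf-8" codec, which already refuses byte sequences
that encode surrogate code points, and adds a separate check for
strings that were built some other way.

```python
def decode_utf8_strict(data: bytes) -> str:
    """Decode UTF-8, raising UnicodeDecodeError on any ill-formed input.

    With errors="strict", Python's codec rejects encoded surrogates
    (U+D800..U+DFFF), whether they appear singly or as a pair.
    """
    return data.decode("utf-8", errors="strict")


def reject_surrogates(s: str) -> str:
    """Raise ValueError if the string contains any surrogate code point."""
    for index, ch in enumerate(s):
        if 0xD800 <= ord(ch) <= 0xDFFF:
            raise ValueError(
                f"surrogate U+{ord(ch):04X} at index {index}"
            )
    return s
```

UnicodeDecodeError is a subclass of ValueError, so a caller can catch
ValueError for both functions. Conformance test lines that contain
surrogates would simply be skipped when either function raises.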

> I think the rule should be amended by replacing 'combining mark' by
> 'character of non-zero combining class', but a more elegantly phrased
> alternative would be still better.

Yes, that would eliminate the confusion.

Mike

This archive was generated by hypermail 2.1.5 : Thu Apr 27 2006 - 13:57:47 CST