Re: Japanese text handling problem in Unicode Collation Algorithm

From: Asmus Freytag ([email protected])
Date: Tue Oct 13 2009 - 05:00:46 CDT

Next message: Erkki I. Kolehmainen: "[MEEK] Review of the CWA Draft till 9 November 2009 and the Final Meeting of the WS on 17 November 2009."

Previous message: Deborah Goldsmith: "Re: Japanese text handling problem in Unicode Collation Algorithm"
In reply to: Deborah Goldsmith: "Re: Japanese text handling problem in Unicode Collation Algorithm"
Next in thread: Henrik Theiling: "Re: Japanese text handling problem in Unicode Collation Algorithm"

On 10/12/2009 6:38 PM, Deborah Goldsmith wrote:
>> However, even
>> for case differences, there are certainly lexical differences in
>> English (and other languages) where uppercase versus lowercase
>> are *not* optional, and do make systematic differences in meaning.
>> See, for example, German, where systematic uppercasing of nouns is
>> not optional, but a required aspect of spelling -- and where substituting
>> one character for the other would be considered simply wrong.
>>
>
> That’s true, but Japanese is the only language that uses kana, so it’s not clear why the DUCET behavior isn’t language-appropriate without tailoring. There may well be a reason, but it’s not obvious that it has to be the way it is.
>
>
So far, that sounds defensible as a principle. However, it would need to
be established that the results of a sorted listbox are incorrect.
Instead, the complaint is based on incorrect *search* behavior. This
points to a limitation of building a search algorithm directly on collation.

That is a problem that can occur for other languages as well. Distinctions
ignored in sorting bring words into proximity based on their spelling,
not their meaning. In searching, by ignoring tertiary differences, the
result would not be proximity, but a match. As the current case
highlights, that can lead to false matches, which are annoying,
especially if they are patently false matches.
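
The difference between proximity and a match can be sketched roughly as
follows. This is only a toy model: the weights, the kana range check, and the
helper names (toy_element, sort_key, matches) are invented for illustration
and are not real DUCET values or any library's API.

```python
# Toy three-level collation: each character maps to (primary, secondary, tertiary).
# Assumption: these weights are made up for illustration, NOT taken from the DUCET.

def toy_element(ch):
    cp = ord(ch)
    # Katakana with a hiragana counterpart: same primary as the hiragana,
    # distinguished only at the tertiary level.
    if 0x30A1 <= cp <= 0x30F6:
        return (cp - 0x60, 0, 1)
    lower = ch.lower()
    if len(lower) != 1:          # keep the toy simple for multi-character lowercasing
        lower = ch
    # Uppercase letters: same primary as lowercase, tertiary difference only.
    return (ord(lower), 0, 1 if ch != lower else 0)

def sort_key(text, strength=3):
    """Build a key with all primaries first, then secondaries, then tertiaries."""
    elements = [toy_element(ch) for ch in text]
    return tuple(tuple(e[level] for e in elements) for level in range(strength))

def matches(a, b, strength):
    """Search-style comparison: equal keys at the given strength count as a match."""
    return sort_key(a, strength) == sort_key(b, strength)

words = ["polite", "Polish", "polish"]
# Sorting with the full key keeps both spellings, merely placing them side by side.
print(sorted(words, key=sort_key))

# Searching with tertiary differences ignored turns the same pairs into matches.
print(matches("polish", "Polish", strength=1))   # case difference ignored
print(matches("かな", "カナ", strength=1))        # hiragana vs. katakana ignored
print(matches("かな", "カナ", strength=3))        # full strength keeps them distinct
```

In the sorted list the two spellings remain separate entries; in the search
comparison at reduced strength they collapse into a single hit, which is where
the false matches come from.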

Collation may be a starting point to identify large classes of ignorable
differences for searching, but applying this without tailoring that is
specific to the search task is incorrect.
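
One way to picture a search-specific choice of ignorable differences is the
sketch below. It is a stand-in that uses plain character folding rather than
an actual UCA tailoring, and the function names and option flags (search_key,
find, ignore_case, ignore_kana_type) are hypothetical.

```python
# Sketch: the search task decides which differences are ignorable,
# instead of inheriting "everything below primary" from the sort order.
# Assumption: simple folding stands in for a real collation tailoring.

def search_key(text, ignore_case=True, ignore_kana_type=False):
    out = []
    for ch in text:
        cp = ord(ch)
        # Optionally fold katakana onto hiragana (only the range with counterparts).
        if ignore_kana_type and 0x30A1 <= cp <= 0x30F6:
            ch = chr(cp - 0x60)
        # Optionally fold case.
        if ignore_case:
            ch = ch.casefold()
        out.append(ch)
    return "".join(out)

def find(needle, haystack, **options):
    """Return the entries whose search key equals the needle's search key."""
    target = search_key(needle, **options)
    return [entry for entry in haystack if search_key(entry, **options) == target]

entries = ["Polish", "polish", "かな", "カナ"]

# Case is treated as ignorable for this search, the kana type is not.
print(find("polish", entries))
print(find("かな", entries))

# The kana distinction is dropped only if the search task explicitly asks for it.
print(find("かな", entries, ignore_kana_type=True))
```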

Ken pointed that out already, but it was apparently well hidden in his
reply.

A./

This archive was generated by hypermail 2.1.5 : Tue Oct 13 2009 - 05:42:12 CDT