From: Asmus Freytag (firstname.lastname@example.org)
Date: Fri Mar 12 2010 - 17:20:53 CST
On 3/11/2010 10:12 PM, karl williamson wrote:
> Andrew West wrote:
>> On 11 March 2010 20:32, karl williamson <email@example.com> wrote:
>>> I think it is actually better to do the following:
>>> 1. Remove all white space
>>> 2. Collapse multiple hyphens in a row into one
>>> 3. Lowercase
>>> 4. If the result is one of the three problematic ones, we are done.
>>> 5. Remove all hyphens
>>> Then, if the strings are the same after the transforms, they match.
>> No, then "TIBETAN MARK TSA PHRU" would match "TIBETAN MARK TSA -PHRU",
>> which may be what the user intended, but it is not what they asked
>> for, and would be as bad as matching e.g. "PERCENT IGN" and "PERCENT
> OK, but that is a change from what TR18 says: "names should use a
> loose match, disregarding case, spaces and hyphen" except for the
> three problematic situations it mentions. There is no character
> TIBETAN MARK TSA PHRU,
But it's a name that could be added to the standard at any moment,
because it would be formally distinct from the existing
TIBETAN MARK TSA -PHRU.
So you can't simply match according to what might be intended, because
then, if such a character is later added, every match that relied on
ignoring that hyphen fails.
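For illustration, here is a minimal Python sketch of the five-step procedure quoted above, showing the collision Andrew and Asmus describe. The function names are hypothetical. The "three problematic" names are not spelled out in this thread, so they are left as a caller-supplied set.

```python
import re


def proposed_loose_key(name: str, exceptions: frozenset = frozenset()) -> str:
    """Apply the five quoted steps and return the key used for comparison.

    `exceptions` stands in for the "three problematic" names (hypothetical
    parameter); entries must already be in the form produced by steps 1-3:
    no white space, single hyphens, lowercase.
    """
    s = re.sub(r"\s+", "", name)   # 1. remove all white space
    s = re.sub(r"-+", "-", s)      # 2. collapse runs of hyphens into one
    s = s.lower()                  # 3. lowercase
    if s in exceptions:            # 4. special-cased names keep their hyphens
        return s
    return s.replace("-", "")      # 5. remove all hyphens


def proposed_loose_match(a: str, b: str, exceptions: frozenset = frozenset()) -> bool:
    # Two names match if their keys are identical.
    return proposed_loose_key(a, exceptions) == proposed_loose_key(b, exceptions)


print(proposed_loose_key("TIBETAN MARK TSA -PHRU"))  # tibetanmarktsaphru
print(proposed_loose_match("TIBETAN MARK TSA PHRU", "TIBETAN MARK TSA -PHRU"))  # True
```

The second result is the problem: once step 5 strips the hyphen, a name that differs only by that hyphen can never be told apart, even if a character with that name is encoded later.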
> and I thought the whole point of loose matching is to follow the
> intent of the user even in the face of certain missing or extraneous
> punctuation and spacing characters, so even though it is not exactly
> what they asked for, it is close enough by the traditional definition.
> I realize that TR18 is not an official part of the standard, and that
> TR44 is now UAX44, so it is. Therefore, this is a change in the
> standard that I don't believe was listed as a delta.
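By contrast, the behavior Asmus argues for can be sketched by ignoring only medial hyphens. This is a minimal sketch that assumes a hyphen is "medial" when a letter or digit sits directly on each side of it in the name as written. The function names are hypothetical, per-name exceptions are omitted, and it is not claimed to be the exact rule in TR18 or UAX44.

```python
import re

# Assumption: a hyphen is "medial" only if a letter or digit sits directly
# on both sides of it in the name as written (before spaces are removed).
_MEDIAL_HYPHEN = re.compile(r"(?<=[A-Za-z0-9])-(?=[A-Za-z0-9])")


def medial_loose_key(name: str) -> str:
    # Drop medial hyphens first, while the surrounding spaces are still
    # present, so a hyphen that follows a space (as in "TSA -PHRU") survives.
    s = _MEDIAL_HYPHEN.sub("", name)
    # Then ignore white space, underscores and case.
    s = re.sub(r"[\s_]+", "", s)
    return s.lower()


def medial_loose_match(a: str, b: str) -> bool:
    # Two names match if their keys are identical.
    return medial_loose_key(a) == medial_loose_key(b)


print(medial_loose_key("TIBETAN MARK TSA -PHRU"))  # tibetanmarktsa-phru
print(medial_loose_match("TIBETAN MARK TSA PHRU", "TIBETAN MARK TSA -PHRU"))  # False
print(medial_loose_match("percent sign", "PERCENT-SIGN"))  # True
```

Because medial hyphens are removed before white space, the hyphen in "TSA -PHRU" stays significant, so a later TIBETAN MARK TSA PHRU would remain distinct, while a spelling like "PERCENT-SIGN" still matches "PERCENT SIGN".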
This archive was generated by hypermail 2.1.5 : Fri Mar 12 2010 - 17:28:51 CST