Re: property, character, and sequence name loose matching

From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Fri Mar 12 2010 - 17:20:53 CST

    On 3/11/2010 10:12 PM, karl williamson wrote:
    > Andrew West wrote:
    >> On 11 March 2010 20:32, karl williamson <public@khwilliamson.com> wrote:
    >>> I think it is actually better to do the following:
    >>> 1. Remove all white space
    >>> 2. Collapse multiple hyphens in a row into one
    >>> 3. Lowercase
    >>> 4. If the result is one of the three problematic ones, we are done.
    >>> 5. Remove all hyphens
    >>>
    >>> Then, if the strings are the same after the transforms, they match.
    >>
    >> No, then "TIBETAN MARK TSA PHRU" would match "TIBETAN MARK TSA -PHRU",
    >> which may be what the user intended, but it is not what they asked
    >> for, and would be as bad as matching e.g. "PERCENT IGN" and "PERCENT
    >> SIGN".
    >>
    >> Andrew
    >>
    >
    > OK, but that is a change from what TR18 says: "names should use a
    > loose match, disregarding case, spaces and hyphen" except for the
    > three problematic situations it mentions. There is no character
    > TIBETAN MARK TSA PHRU,
    But it's a name that could be added to the standard at any moment,
    because it would be formally distinct from the existing

    TIBETAN MARK TSA -PHRU

    so you can't simply match according to what might be intended, because
    then, if such a character is later added, everything fails.
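
    To make the collision concrete, here is a minimal sketch of the
    five-step procedure quoted above. It is an illustration, not the
    TR18 or UAX44 rule. The names loose_key, loose_match and PROBLEMATIC
    are hypothetical, and the contents of the exception set are an
    assumption about which three names step 4 refers to.

```python
import re

# Assumption: the three hyphen-sensitive names that step 4 exempts,
# written in the form they have after steps 1-3.
PROBLEMATIC = frozenset({
    "tibetanletter-a",
    "tibetansubjoinedletter-a",
    "hanguljungseongo-e",
})


def loose_key(name, problematic=PROBLEMATIC):
    """Return the comparison key produced by the proposed five steps."""
    key = re.sub(r"\s+", "", name)    # 1. remove all white space
    key = re.sub(r"-{2,}", "-", key)  # 2. collapse runs of hyphens into one
    key = key.lower()                 # 3. lowercase
    if key in problematic:            # 4. stop here for the exempt names
        return key
    return key.replace("-", "")       # 5. remove all remaining hyphens


def loose_match(a, b):
    """Two names match if their keys are identical."""
    return loose_key(a) == loose_key(b)


# The exempt pair stays distinct, but the Tibetan mark does not.
print(loose_match("TIBETAN LETTER A", "TIBETAN LETTER -A"))
print(loose_match("TIBETAN MARK TSA PHRU", "TIBETAN MARK TSA -PHRU"))
```

    Under these assumptions the first comparison is False and the second
    is True: the hyphen in -PHRU is discarded in step 5, so a future
    character named TIBETAN MARK TSA PHRU would be indistinguishable
    from the existing one.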
    > and I thought the whole point of loose matching is to follow the
    > intent of the user even in the face of certain missing or extraneous
    > punctuation and spacing characters, so even though it is not exactly
    > what they asked for, it is close enough by the traditional definition.
    >
    > I realize that TR18 is not an official part of the standard, but
    > TR44 is now UAX44, so it is. Therefore, this is a change in the
    > standard that I don't believe was listed as a delta.
    >
    >


