Re: Case folding

From: Richard Wordingham (richard.wordingham@ntlworld.com)
Date: Fri Jun 09 2006 - 10:04:19 CDT


    Philippe Verdy wrote on Friday, June 09, 2006 at 7:34 AM

    > From: "Mike" <mike-list@pobox.com>
    >>> To answer this question, ask yourself what would happen if you
    >>> uppercased the string "Straße" this way.

    >> I think I would get the right answer, "STRASSE" (if
    >> that is the "sharp S" I have learned about a few weeks
    >> back).

    > Wrong. The case folding of the sharp s is a sharp s. The standard case
    > folding does not convert any letter to uppercase.

    Is there a 'standard' case folding? There are two default case-foldings,
    the simple case-folding, and the full case-folding! The simple case-folding
    is as you state - the full case folding is to 'ss'. This results from the
    full upper-casing being 'SS', so Mike's answer is correct. Or are you
    saying that 'ffrench' should not match case-fold to the same as 'Ffrench'?
    (Incidentally, how should we handle the locale specific titlecasing here?
    It's a bit more local than simply 'en'!)
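
The difference can be checked quickly in Python. This is only a sketch, and it assumes that Python's built-in str.casefold() corresponds to the full case folding and that str.upper() applies the full upper-casing. Python has no built-in simple case folding, so str.lower() stands in for it here, which happens to agree for the sharp s. The helper name caseless_equal is made up for illustration.

```python
word = "Straße"

# Full upper-casing expands the sharp s to "SS".
upper = word.upper()

# Full case folding maps the sharp s to "ss".
folded = word.casefold()

# Lower-casing leaves the sharp s unchanged, as the simple
# case folding does for this character.
lowered = word.lower()


def caseless_equal(a, b):
    """Hypothetical helper: compare two strings after full case folding."""
    return a.casefold() == b.casefold()


print(upper, folded, lowered)
print(caseless_equal("Straße", "STRASSE"))
print(caseless_equal("ffrench", "Ffrench"))
```

Under full folding "Straße" and "STRASSE" compare equal, whereas comparing the lower-cased forms would keep them distinct.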

    > Note that if you compare case insensitively and don't care about other
    > variations (at secondary collation level or higher), you can greatly
    > reduce the complexity of the algorithm and get a much faster result using
    > the following:
    >
    > toLowerCase( toUpperCase(filter(NFKD( string ))) )
    >
    > where the filter() function eliminates all combining characters with
    > combining class greater than zero.

    It's a shame it filters out all the Tibetan vowels.
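
To make the objection concrete, here is a rough Python rendering of the quoted pipeline. It is a sketch under assumptions: unicodedata.normalize and unicodedata.combining stand in for NFKD() and the combining-class test, str.upper() and str.lower() stand in for toUpperCase() and toLowerCase(), and the function name strip_marks_fold is invented for this example.

```python
import unicodedata


def strip_marks_fold(s):
    """Hypothetical rendering of toLowerCase(toUpperCase(filter(NFKD(s))))."""
    decomposed = unicodedata.normalize("NFKD", s)
    # filter(): drop every character whose canonical combining class is non-zero.
    filtered = "".join(c for c in decomposed if unicodedata.combining(c) == 0)
    return filtered.upper().lower()


# Tibetan letter KA followed by vowel sign I, and by vowel sign U.
ki = "\u0f40\u0f72"
ku = "\u0f40\u0f74"

print(strip_marks_fold("Straße"))
print(strip_marks_fold(ki) == strip_marks_fold(ku))
```

Both Tibetan vowel signs used here have a non-zero combining class, so the filter removes them and two different syllables collapse to the bare consonant.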

    Richard.


