Gary Roberts wrote:
> Ken and Kent bring up certain canonical equivalences, which the
> technique I proposed will not handle.
> I am now tempted to include mapping
> U+0340 -> U+0300
> U+232A -> U+3009
In that case the fullwidth ASCII characters should also be mapped to
"ordinary ASCII",
the halfwidth Katakana should be mapped to ordinary Katakana,
the presentation forms for Arabic mapped to their ordinary forms,
the Hangul syllables mapped to their Hangul Jamo strings,
composed and decomposed sequences should be normalised, ...
I don't want to discourage you, but comparison of Unicode strings
is non-trivial, even when the comparison is case-sensitive.
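
To make the mappings listed above concrete, here is a minimal sketch in
Python. It is not the technique proposed earlier in this thread; it simply
leans on the standard library's unicodedata.normalize. The helper names
(canonical_key, compat_key, equivalent) are made up for illustration, and
the results depend on the Unicode data shipped with the Python version in
use.

```python
import unicodedata


def canonical_key(s: str) -> str:
    """Canonical decomposition (NFD).

    Covers canonical equivalences only: singleton mappings such as
    U+0340 -> U+0300 and U+232A -> U+3009, Hangul syllables -> Jamo,
    and precomposed letters -> base letter plus combining marks.
    """
    return unicodedata.normalize("NFD", s)


def compat_key(s: str) -> str:
    """Compatibility decomposition (NFKD).

    Does everything NFD does and additionally folds compatibility
    variants: fullwidth ASCII, halfwidth Katakana, Arabic presentation
    forms, and so on.
    """
    return unicodedata.normalize("NFKD", s)


def equivalent(a: str, b: str, compat: bool = False) -> bool:
    """Case-sensitive comparison of two strings after normalisation."""
    key = compat_key if compat else canonical_key
    return key(a) == key(b)


if __name__ == "__main__":
    pairs = [
        ("\u0340", "\u0300"),        # combining grave tone mark vs. grave
        ("\u232A", "\u3009"),        # right-pointing angle bracket
        ("\uAC00", "\u1100\u1161"),  # Hangul syllable vs. Jamo string
        ("\u00E9", "e\u0301"),       # precomposed vs. decomposed e-acute
        ("\uFF21", "A"),             # fullwidth A vs. ASCII A
        ("\uFF76", "\u30AB"),        # halfwidth vs. ordinary Katakana KA
        ("\uFE8D", "\u0627"),        # Arabic alef, isolated form vs. alef
    ]
    for a, b in pairs:
        print(ascii(a), ascii(b),
              equivalent(a, b), equivalent(a, b, compat=True))
```

The first four pairs already compare equal under the canonical key; the
last three only compare equal when compat=True, because they are
compatibility rather than canonical equivalences. Whether folding those is
desirable depends on the application.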
This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:36 EDT