Re: How to remove accents while conforming to language standards?

From: Markus Scherer
Date: Mon, 4 Nov 2013 10:54:55 -0800

Hi Jennifer,

On Fri, Nov 1, 2013 at 8:37 AM, Jennifer Wong wrote:

> I would like to ask for advice on removing accents from characters.
> While the normalization process is straightforward (NFD, remove accents),
> it does not take special cases into account. For example, in Danish, "å"
> should be mapped to "aa", not "a". Likewise, in German, "ä", "ö", "ü" should
> be mapped to "ae", "oe" and "ue" respectively, not "a", "o", "u". Are
> there common practices on how to handle these special cases? Thank you.

Can you describe what your use case is?

One area that appears not to have been discussed yet is sorting of strings
and full-text search (as in Ctrl-F in a browser or word processor). If that
is what you are after, then please search for "Unicode collation" and "CLDR
collation". The ICU libraries might also help.
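
For reference, the approach described in the question (NFD, then remove
accents) together with language-specific special cases could be sketched
roughly as follows. This is only an illustration using Python's standard
unicodedata module, not ICU or CLDR data; the OVERRIDES table and the
strip_accents function are hypothetical names, and the table holds only the
mappings mentioned in this thread.

```python
import unicodedata

# Hypothetical per-language overrides (an assumption for illustration).
# Only the mappings mentioned in the thread are listed; a real table
# would be larger and should come from a maintained source.
OVERRIDES = {
    "da": {"å": "aa", "Å": "Aa"},
    "de": {"ä": "ae", "ö": "oe", "ü": "ue",
           "Ä": "Ae", "Ö": "Oe", "Ü": "Ue"},
}


def strip_accents(text, lang=None):
    """Remove accents, applying language-specific mappings first."""
    # Compose first so that the overrides match precomposed characters
    # even when the input arrives in decomposed form.
    text = unicodedata.normalize("NFC", text)
    table = OVERRIDES.get(lang, {})
    text = "".join(table.get(ch, ch) for ch in text)
    # Generic fallback: decompose, then drop nonspacing marks (category Mn).
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed
                   if unicodedata.category(ch) != "Mn")


print(strip_accents("Århus", "da"))
print(strip_accents("Müller", "de"))
print(strip_accents("Müller"))
```

Without a language argument the function falls back to plain accent
stripping. Note that this kind of folding loses information; for sorting and
searching, a collation-based comparison is usually the better fit.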

Best regards,
Received on Mon Nov 04 2013 - 12:58:10 CST
