Re: How to remove accents while conforming to language standards?

From: Markus Scherer <markus.icu_at_gmail.com>
Date: Mon, 4 Nov 2013 10:54:55 -0800

Hi Jennifer,

On Fri, Nov 1, 2013 at 8:37 AM, Jennifer Wong <jennifer.wong_at_workday.com> wrote:

> I would like to ask for advice on removing accents from characters.
> While the normalization process is straightforward (NFD, then remove accents),
> it does not take special cases into account. For example, in Danish "å"
> should be mapped to "aa", not "a". Likewise, in German "ä", "ö", "ü" should
> be mapped to "ae", "oe" and "ue" respectively, not "a", "o", "u". Are
> there common practices on how to handle these special cases? Thank you.
>
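For illustration only, here is a minimal Python sketch of the approach the question describes: apply language-specific expansions first, then fall back to the generic "NFD, drop combining marks" step. The LANGUAGE_OVERRIDES table and the strip_accents name are hypothetical; the table holds only the Danish and German cases mentioned above and is not derived from CLDR or any standard.

```python
import unicodedata

# Hypothetical per-language expansions, applied BEFORE generic accent removal.
# Illustrative assumption only: covers just the examples from the question.
LANGUAGE_OVERRIDES = {
    "da": {"å": "aa", "Å": "Aa"},
    "de": {"ä": "ae", "Ä": "Ae", "ö": "oe", "Ö": "Oe", "ü": "ue", "Ü": "Ue"},
}


def strip_accents(text: str, lang: str = "") -> str:
    """Remove accents, honoring language-specific expansions first."""
    # Compose first so the precomposed keys above also match input that
    # arrived decomposed (e.g. "u" followed by U+0308 COMBINING DIAERESIS).
    text = unicodedata.normalize("NFC", text)
    overrides = LANGUAGE_OVERRIDES.get(lang, {})
    text = "".join(overrides.get(ch, ch) for ch in text)
    # Generic fallback: decompose and drop nonspacing combining marks (Mn).
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
```

With this sketch, strip_accents("Müller", "de") gives "Mueller", while strip_accents("Müller") with no language gives "Muller". Characters without a canonical decomposition (such as Danish "ø" or "æ") pass through the fallback unchanged, so a real table would need explicit entries for them.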

Can you describe what your use case is?

One possible area that appears not to have been discussed yet is sorting of
strings and full-text search (as in ctrl-F in a browser or word processor).
If you are after those, then please look for "Unicode collation" and "CLDR
collation". The ICU libraries
<http://userguide.icu-project.org/collation> might also help.
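To make the contrast concrete, here is a crude, standard-library-only Python stand-in for accent- and case-insensitive matching and sorting. It is NOT Unicode/CLDR collation. It just folds strings to a comparison key, so it cannot do locale tailoring; for example, Danish collation places "å" after "z", and German phonebook collation treats "ä" like "ae". The fold and find_insensitive names are hypothetical. For real work, ICU's collators and its collation-based string search are the appropriate tools.

```python
import unicodedata


def fold(text: str) -> str:
    """Crude match/sort key: case-fold, decompose, drop combining marks (Mn)."""
    decomposed = unicodedata.normalize("NFD", text.casefold())
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def find_insensitive(haystack: str, needle: str) -> bool:
    """Return True if needle occurs in haystack, ignoring case and accents."""
    # Only a yes/no answer: folding can change string lengths ("ß" -> "ss"),
    # so offsets into the folded text do not map back to the original.
    return fold(needle) in fold(haystack)
```

For example, find_insensitive("Der Ärger", "arger") is True, and sorted(["Zebra", "Äpfel", "apple"], key=fold) yields ["Äpfel", "apple", "Zebra"]. A locale-aware collator may order such lists differently.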

Best regards,
markus
Received on Mon Nov 04 2013 - 12:58:10 CST