Re: How to remove accents while conforming to language standards?

From: Richard Wordingham <richard.wordingham_at_ntlworld.com>
Date: Mon, 4 Nov 2013 20:19:16 +0000

On Mon, 4 Nov 2013 19:00:17 +0000
Jennifer Wong <jennifer.wong_at_workday.com> wrote:

> Thank you everyone for your input.
>
> The use case is that customers want to integrate data from our
> enterprise solution to their ASCII-based downstream systems. Thus all
> accents need to be removed.

Have you confirmed that they are using ASCII rather than, say, Latin-1?
Some people call Latin-1 ASCII!
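
One quick way to tell the two apart is to try encoding some sample data
both ways. This is only a rough sketch in Python; the helper name
fits_encoding is mine, not anything from your system, and the sample
strings are made up for illustration.

```python
def fits_encoding(text: str, encoding: str) -> bool:
    """Return True if every character of text can be encoded in `encoding`.

    Hypothetical helper for illustration only.
    """
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# U+00E9 (e with acute) is in Latin-1 but not in ASCII.
sample = "Caf\u00e9"
latin1_ok = fits_encoding(sample, "latin-1")
ascii_ok = fits_encoding(sample, "ascii")

# U+0141 (L with stroke) is in neither repertoire.
polish_ok = fits_encoding("\u0141\u00f3d\u017a", "latin-1")
```

If the downstream system really accepts Latin-1, many Western European
names could pass through without any stripping at all.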

> Ilay's "Transliteration on Passport" doc is very useful. We can use
> that as a basis to map special transliteration cases before
> normalizing and removing accents.

Have you checked how they are currently handling accents? Do you need
to be even more brutal in places and strip out apostrophes? An
O'Sullivan at my place of work had to accept the mangling of his
surname to Osullivan!

How are you constraining the input repertoire? Stripping diacritics
won't deal with U+0131 LATIN SMALL LETTER DOTLESS I, and would make a
mess of the usually incorrect <U+0131, U+0307 COMBINING DOT ABOVE>.
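
To illustrate, here is a minimal sketch, assuming that "stripping
diacritics" means decomposing to NFD and dropping nonspacing marks
(general category Mn). The function names are mine and your actual
pipeline may well differ.

```python
import unicodedata


def strip_marks(text: str) -> str:
    """Decompose to NFD, then drop nonspacing combining marks (category Mn).

    Assumed approach for illustration, not necessarily what you run.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def is_ascii(text: str) -> bool:
    """True if every code point is below U+0080."""
    return all(ord(ch) < 0x80 for ch in text)


# An ordinary accented letter decomposes and loses its mark.
plain = strip_marks("Caf\u00e9")

# U+0131 has no canonical decomposition, so it passes through untouched.
dotless = strip_marks("\u0131")

# <U+0131, U+0307>: the dot above is removed, leaving the dotless i behind.
dotted = strip_marks("\u0131\u0307")
```

In both of the last two cases the result is still U+0131, so is_ascii()
reports False and the text would need an explicit mapping before it could
go to an ASCII-only system.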

Richard.
Received on Mon Nov 04 2013 - 14:21:09 CST
