Re: How to remove accents while conforming to language standards?

From: Jukka K. Korpela <jkorpela_at_cs.tut.fi>
Date: Mon, 04 Nov 2013 21:52:39 +0200

2013-11-04 21:00, Jennifer Wong wrote:

> The use case is that customers want to integrate data from our
> enterprise solution to their ASCII-based downstream systems.

This is very different from the question about removing accents while
conforming to language standards. The very goal makes it impossible to
conform to language standards. The next question should be what the data
will be used for, and how.

> Thus all accents need to be removed.

I would not jump to that conclusion. Just because some system is
ASCII-based does not mean that you cannot in any way handle non-ASCII
data. You can encode non-ASCII characters in ASCII in many ways. To take
a trivial example, you could convert È to E` and later possibly convert
it back, though in such approaches you need to be careful to make the
conversion reversible (if it needs to be). In some cases, out-of-band
information could be included, e.g. entering a name in a simplified form
in ASCII but accompanied by a note (in ASCII) describing accents that
have been omitted.
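
As a rough illustration of a reversible scheme, here is a minimal Python
sketch. It is only one of many possible conventions, not a standard: the
escape character "~", the six-digit hex format and the function names
to_ascii/from_ascii are all assumptions made up for this example. The
text is decomposed (NFD) so that base letters stay readable, each
non-ASCII code point becomes an escape, and a literal "~" is doubled so
that decoding is unambiguous. The round trip restores the original only
up to NFC normalization.

```python
import re
import unicodedata

ESC = "~"  # hypothetical escape character chosen for this sketch


def to_ascii(text: str) -> str:
    """Encode text as pure ASCII, keeping base letters readable."""
    out = []
    for ch in unicodedata.normalize("NFD", text):
        if ch == ESC:
            out.append(ESC + ESC)              # double a literal escape character
        elif ord(ch) < 128:
            out.append(ch)                     # plain ASCII passes through
        else:
            out.append(f"{ESC}{ord(ch):06X}")  # e.g. combining grave -> ~000300
    return "".join(out)


def from_ascii(encoded: str) -> str:
    """Reverse to_ascii; the result is NFC-normalized."""
    def repl(match):
        token = match.group(0)
        if token == ESC + ESC:
            return ESC
        return chr(int(token[1:], 16))

    decoded = re.sub(r"~~|~[0-9A-F]{6}", repl, encoded)
    return unicodedata.normalize("NFC", decoded)


print(to_ascii("È"))              # E~000300
print(from_ascii(to_ascii("È")))  # È
```

Whether the downstream system tolerates the chosen escape character and
the longer field lengths is something only the use case can answer.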

Even if it is acceptable to do lossy mappings (like just dropping all
accents, or mapping, say, Ä to AE without worrying about possible AE in
original data), the crucial question is how the data will be used, now
and in the future.
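
To make the loss concrete, here is a minimal Python sketch of the two
lossy mappings mentioned above. The function names and the tiny
EXPANSIONS table are hypothetical, invented for this example only.
Dropping accents is done by decomposing to NFD and discarding
nonspacing marks; characters that still are not ASCII after that
(such as ø, which has no decomposition) are replaced by "?".

```python
import unicodedata

# Hypothetical expansion table, for illustration only.
EXPANSIONS = {"Ä": "AE", "ä": "ae"}


def drop_accents(text: str) -> str:
    """Lossy: remove combining marks, replace other non-ASCII with '?'."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    )
    return stripped.encode("ascii", "replace").decode("ascii")


def expand(text: str) -> str:
    """Lossy: map listed letters to digraphs (expects NFC input)."""
    return "".join(EXPANSIONS.get(ch, ch) for ch in text)


print(drop_accents("Hämäläinen"))  # Hamalainen
print(drop_accents("È") == drop_accents("E"))  # True: the two collide
print(expand("Ä") == expand("AE"))  # True: original AE is indistinguishable
```

Once two different inputs map to the same output, nothing in the ASCII
data tells them apart, so the mapping cannot be undone later.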

Yucca
Received on Mon Nov 04 2013 - 13:54:32 CST
