Re: How to remove accents while conforming to language standards?

From: Philippe Verdy <verdy_p_at_wanadoo.fr>
Date: Fri, 1 Nov 2013 22:26:04 +0100

Before dropping any accent, you should make sure that such drops will not
break (at least) the relative ordering at the primary collation strength:
this gives very useful hints about German umlauts transformed to a vowel plus
e, or Danish rings transformed to double vowels, instead of being simply
dropped.

2013/11/1 Jukka K. Korpela <jkorpela_at_cs.tut.fi>

> 2013-11-01 17:37, Jennifer Wong wrote:
>
>> I would like to ask for advice on removing accents from characters.
>>
>
> To address first the question you ask in the Subject line, "How to remove
> accents while conforming to language standards?", but do not ask in the
> message body, the answer is: You can't. Well, except in cases where
> language standards permit the omission. For example, according to modern
> French orthography standards, the circumflex in "fraîche" could and should
> be dropped (though it is still very common to keep it).
>
>
>> While the normalization process is straightforward (NFD, remove
>> accents),
>>
>
> NFD does *not* remove accents. It is decomposition, not destruction. It
> decomposes, say, "å" to "a" followed by a combining ring above. If you then
> have your own code remove the combining marks, that's a different issue,
> and generally a wrong thing to do.
>
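The distinction between decomposing and removing can be seen in a minimal
Python sketch using the standard unicodedata module. The helper name
strip_marks is hypothetical, and treating every character of general
category Mn as an "accent" is an assumption made for illustration, not a
recommended policy:

```python
import unicodedata

def strip_marks(text):
    """Decompose to NFD, then drop nonspacing combining marks (category Mn).

    This second step is separate from normalization; NFD alone keeps the marks.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")

ring = "\u00e5"  # LATIN SMALL LETTER A WITH RING ABOVE
nfd = unicodedata.normalize("NFD", ring)

# NFD yields two code points: the base letter and the combining mark.
print([unicodedata.name(ch) for ch in nfd])
# ['LATIN SMALL LETTER A', 'COMBINING RING ABOVE']

# Only the explicit filtering step loses the ring.
print(strip_marks(ring))  # a
```

Note that the result for the Danish letter is "a", not "aa", which is exactly
the language-blind outcome objected to above.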
>
>> For example,
>> Danish, "å" should be mapped to "aa", not "a".
>>
>
> "Should" as per which standard or policy? It is generally accepted for
> Danish to replace "å" by "aa" if you cannot use "å". But what might be the
> situation, in the year 2013, where you really cannot use "å"?
>
>
>> Likewise, in German, "ä",
>> "ö", "ü" should be mapped to "ae", "oe" and "ue" respectively, not "a",
>> "o", "u". Are there common practices on how to handle these special
>> cases?
>>
>
> There are various language-specific practices. They are not universal. For
> example, in Spanish texts, I don't think many people would find it
> acceptable to replace "ü" by "ue", rather than just "u", if some evil
> powers force you to stick to ASCII characters.
>
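One way to keep such practices language-specific is a per-language
replacement table consulted before any generic mark stripping. The sketch
below is only an illustration: the FALLBACKS table, the fold_accents name,
the language tags, and the capitalized variants are assumptions, and the
entries cover only the Danish and German cases mentioned in this thread.

```python
import unicodedata

# Hypothetical per-language tables; only the mappings discussed in the thread.
FALLBACKS = {
    "da": {"å": "aa", "Å": "Aa"},
    "de": {"ä": "ae", "ö": "oe", "ü": "ue", "Ä": "Ae", "Ö": "Oe", "Ü": "Ue"},
}

def fold_accents(text, lang):
    """Apply the table for `lang` first, then strip remaining combining marks."""
    table = FALLBACKS.get(lang, {})
    # Compose first so that the precomposed keys in the table can match.
    text = unicodedata.normalize("NFC", text)
    out = []
    for ch in text:
        if ch in table:
            out.append(table[ch])
        else:
            # Generic fallback: decompose and drop nonspacing marks (Mn).
            decomposed = unicodedata.normalize("NFD", ch)
            out.append("".join(c for c in decomposed
                               if unicodedata.category(c) != "Mn"))
    return "".join(out)

print(fold_accents("Århus", "da"))     # Aarhus
print(fold_accents("Müller", "de"))    # Mueller
print(fold_accents("pingüino", "es"))  # pinguino
```

The same "ü" becomes "ue" under the German table but plain "u" for Spanish,
which has no entry. Letters without a canonical decomposition, such as "ø",
pass through unchanged, so the output is not guaranteed to be ASCII.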
> Yucca
>
Received on Fri Nov 01 2013 - 16:28:47 CDT
