Re: FW:transform a (UNICODE) accented character to its equivalent (UNICODE) non-accented character

From: Anto'nio Martins-Tuva'lkin (antonio@tuvalkin.web.pt)
Date: Wed Aug 13 2003 - 21:42:27 EDT

    On 2003.08.06, 11:37, Philippe Verdy <verdy_p@wanadoo.fr> wrote:

    > The main UCD table already contains the needed NFD canonical
    > decompositions, and removing accents is simply a matter of NFD
    > decomposition plus removal of combining characters
    <...>
    > they are not really accents but are important to correctly identify
    > vowels and consonants,

    Note that even most Latin-script orthographies will suffer badly if
    diacriticals are removed. I'm sure we can all come up with examples,
    many of them quite embarrassing or even dangerous. (E.g., Portuguese
    «Do you have a porpoise?» becomes quite nasty if you remove the one
    acute from it...) Learning that diacriticals do, in most languages, a
    lot more than just add snazziness to a word is probably lesson #1 in
    i-n-t-e-r-n-a-t-i-o-n-a-l-i-z-a-t-i-o-n...
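
    For reference, the approach quoted above (NFD decomposition, then
    dropping the combining characters) can be sketched in a few lines of
    Python with the standard unicodedata module. This is only an
    illustration: the function name strip_diacritics is hypothetical, and
    "combining character" is taken here to mean any code point whose
    general category is a Mark (Mn, Mc, Me), which is an assumption about
    what was intended.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Hypothetical helper: NFD-decompose, then drop combining marks."""
    # NFD splits precomposed letters into a base letter followed by
    # combining marks, e.g. U+00F3 becomes "o" + U+0301.
    decomposed = unicodedata.normalize("NFD", text)
    # Assumption: treat every code point in general category M* as a
    # combining character to be removed.
    return "".join(
        ch for ch in decomposed
        if not unicodedata.category(ch).startswith("M")
    )


# Two distinct Portuguese words, "grandmother" and "grandfather".
grandmother = strip_diacritics("av\u00f3")
grandfather = strip_diacritics("av\u00f4")

# Letters without a canonical decomposition pass through unchanged.
o_with_stroke = strip_diacritics("\u00f8")
```

    After stripping, grandmother and grandfather hold the same string, so
    the distinction between the two words is lost, which is exactly the
    kind of damage described above. Note also that a letter such as ø has
    no canonical decomposition, so this method leaves it untouched.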

    -- ____.
    António MARTINS-Tuválkin | ()|
    <antonio@tuvalkin.web.pt> |####|
    R. Laureano de Oliveira, 64 r/c esq. |
    PT-1885-050 MOSCAVIDE (LRS) Não me invejo de quem tem |
    +351 934 821 700 carros, parelhas e montes |
    http://www.tuvalkin.web.pt/bandeira/ só me invejo de quem bebe |
    http://pagina.de/bandeiras/ a água em todas as fontes |


