RE: Unicode->ASCII approximate conversion

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Fri Dec 19 2003 - 06:12:19 EST


    Hallvard B Furuseth wrote:
    > I need a function which converts Latin Unicode characters to the closest
    > equivalent ASCII characters, e.g. "é" -> "e".
    >
    > Before I reinvent the wheel, does any public domain or GPL code for this
    > already exist?

    Please don't use character names for that conversion. Instead, use the
    NFKD decompositions from the UCD, then check whether the first character
    of each decomposition is an ASCII character; if so, remove the diacritics
    in the 03xx block (those that have an "Mn" general category and a
    non-zero combining class). If any non-ASCII characters remain, use a
    default replacement like '?'. You will need some other custom rules,
    though: look at what compatibility decomposition does with sharp s, for
    example. It's best to map it to "ss" rather than '?', which can be done
    by looking at the case foldings of "Ll" lowercase letters.

    This will be less tricky than parsing character names, as there's no
    guarantee that the names will be consistent.
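
    The steps above could be sketched roughly as follows in Python, using the
    standard unicodedata module as the source of UCD properties. The function
    name to_ascii and the helper is_ascii are hypothetical, and using
    str.casefold() as the case-folding lookup is an assumption; treat this as
    an illustration under those assumptions, not a complete transliterator.

```python
import unicodedata


def is_ascii(s):
    """True if every character of s is in the ASCII range."""
    return all(ord(c) < 0x80 for c in s)


def to_ascii(text, replacement="?"):
    """Approximate text in ASCII via NFKD, per the steps described above.

    Hypothetical helper for illustration; not an exhaustive mapping.
    """
    out = []
    for ch in text:
        if ord(ch) < 0x80:
            out.append(ch)  # already ASCII, keep as-is
            continue
        decomposed = unicodedata.normalize("NFKD", ch)
        if not is_ascii(decomposed[0]):
            # Custom rule: try the case folding of "Ll" letters (sharp s -> "ss").
            folded = ch.casefold() if unicodedata.category(ch) == "Ll" else ""
            out.append(folded if folded and is_ascii(folded) else replacement)
            continue
        for c in decomposed:
            if ord(c) < 0x80:
                out.append(c)
            elif (0x0300 <= ord(c) <= 0x036F
                  and unicodedata.category(c) == "Mn"
                  and unicodedata.combining(c) != 0):
                continue  # drop combining diacritic from the 03xx block
            else:
                out.append(replacement)  # anything else gets the default
    return "".join(out)
```

    Characters with no decomposition and no ASCII case folding (for example
    "ø") fall through to the default replacement, so a real converter would
    still need its own table of extra rules for them.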






    This archive was generated by hypermail 2.1.5 : Fri Dec 19 2003 - 06:55:03 EST