Re: Unicode->ASCII approximate conversion

From: Jungshik Shin (jshin@mailaps.org)
Date: Fri Dec 19 2003 - 08:10:31 EST

Next message: D. Starner: "RE: Unicode->ASCII approximate conversion"

Previous message: jon@hackcraft.net: "Re: Unicode->ASCII approximate conversion"
In reply to: jon@hackcraft.net: "Re: Unicode->ASCII approximate conversion"
Next in thread: D. Starner: "RE: Unicode->ASCII approximate conversion"

On Fri, 19 Dec 2003 jon@hackcraft.net wrote:

> Quoting Hallvard B Furuseth <h.b.furuseth@usit.uio.no>:
>
> > I need a function which converts Latin Unicode characters to the closest
> > equivalent ASCII characters, e.g. "é" -> "e".

> 1. Produce the NFD normalisation of the text.
> 2. Remove all characters with a non-zero combining class.
> 3. Some non-ASCII characters may remain (particularly those from non-Latin
> scripts). Some of these can be handled nicely, but others may require you to
> raise an exception or output a replacement character.

> on your application. Specialised handling of some characters is possible, for
> instance you could convert the trademark sign to "(TM)" to avoid confusion,
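The three quoted steps, plus the trademark special case, could be sketched
roughly as follows in Python. This is only an illustration under stated
assumptions: the function name to_ascii_approx, the SPECIAL table and the "?"
replacement character are hypothetical choices, not something from the quoted
message. Only unicodedata.normalize and unicodedata.combining are standard
library calls.

```python
import unicodedata

# Hypothetical table of hand-picked special cases (illustrative only).
SPECIAL = {"\u2122": "(TM)"}  # TRADE MARK SIGN


def to_ascii_approx(text, replacement="?"):
    """Approximate `text` in ASCII by stripping combining marks."""
    out = []
    # Step 1: produce the NFD normalisation of the text.
    for ch in unicodedata.normalize("NFD", text):
        # Step 2: drop characters with a non-zero combining class.
        if unicodedata.combining(ch) != 0:
            continue
        if ord(ch) < 128:
            out.append(ch)
        elif ch in SPECIAL:
            # Specialised handling, e.g. the trademark sign.
            out.append(SPECIAL[ch])
        else:
            # Step 3: whatever is left gets a replacement character
            # (raising an exception here would be the stricter choice).
            out.append(replacement)
    return "".join(out)
```

Note that letters without a canonical decomposition, such as "ø" (U+00F8),
survive steps 1 and 2 unchanged and so end up in step 3; they would need
their own entries in a table like SPECIAL to map to something readable.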

For Korean syllables (U+AC00 - U+D7A3), you can use the 'Hangul Syllable
Short Names', which can be derived algorithmically with small tables.
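As a rough sketch of that derivation in Python: the syllable index is split
into leading consonant, vowel and trailing consonant indices, and each index
selects a Jamo short name from one of three small tables. The function name
hangul_short_name is a hypothetical choice for this example; the tables
follow the Jamo short names used in the Unicode character names for Hangul
syllables.

```python
# Jamo short names: 19 leading consonants, 21 vowels, 28 trailing
# consonants (the first trailing entry is "no trailing consonant").
L_NAMES = ["G", "GG", "N", "D", "DD", "R", "M", "B", "BB", "S",
           "SS", "", "J", "JJ", "C", "K", "T", "P", "H"]
V_NAMES = ["A", "AE", "YA", "YAE", "EO", "E", "YEO", "YE", "O", "WA",
           "WAE", "OE", "YO", "U", "WEO", "WE", "WI", "YU", "EU", "YI", "I"]
T_NAMES = ["", "G", "GG", "GS", "N", "NJ", "NH", "D", "L", "LG",
           "LM", "LB", "LS", "LT", "LP", "LH", "M", "B", "BS", "S",
           "SS", "NG", "J", "C", "K", "T", "P", "H"]

S_BASE = 0xAC00                                # first Hangul syllable
V_COUNT = len(V_NAMES)                         # 21
T_COUNT = len(T_NAMES)                         # 28
S_COUNT = len(L_NAMES) * V_COUNT * T_COUNT     # 11172 syllables


def hangul_short_name(ch):
    """Return the short name of one precomposed Hangul syllable."""
    s_index = ord(ch) - S_BASE
    if not 0 <= s_index < S_COUNT:
        raise ValueError("not a precomposed Hangul syllable: %r" % ch)
    l_index = s_index // (V_COUNT * T_COUNT)
    v_index = (s_index % (V_COUNT * T_COUNT)) // T_COUNT
    t_index = s_index % T_COUNT
    return L_NAMES[l_index] + V_NAMES[v_index] + T_NAMES[t_index]
```

For example, hangul_short_name("\ud55c") gives "HAN" and
hangul_short_name("\uae00") gives "GEUL". The result is plain ASCII, but it
is an upper-case transliteration-like name, so whether it suits a given
application is a separate question.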


This archive was generated by hypermail 2.1.5 : Fri Dec 19 2003 - 09:00:12 EST