Re: Unicode->ASCII approximate conversion

From: Hallvard B Furuseth (h.b.furuseth@usit.uio.no)
Date: Fri Dec 19 2003 - 10:29:23 EST

Next message: Doug Ewell: "Re: [OT] Keyboards (was: American English translation of character names)"

Previous message: Doug Ewell: "Re: [OT] Keyboards (was: American English translation of character names)"
In reply to: D. Starner: "RE: Unicode->ASCII approximate conversion"
Next in thread: Radovan Garabik: "Re: Unicode->ASCII approximate conversion"

D. Starner writes:
>> The result is much better if you allow the ASCII conversion to be a string.
>> This allows you to, e.g., "©" = "(c)", "½" = "1/2", and so on. This is also
>> good for letters: "ß" = "ss", "å" = "aa", etc.
>
> Et cetera? I think he needs more direction than that, especially since most
> naïve algorithms are going to produce "a" from "å". Digraphs can be treated
> as titlecase or capital or intelligently.

Hm. Actually I'll want a mode which generates "a" rather than "aa" for
that one, to mimic local practice for how to generate e-mail addresses.
Though that can be tacked on with an extra hack afterwards.
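
A minimal sketch of the string-valued mapping discussed above, with the
"a" rather than "aa" behaviour as a switchable mode. The names
ASCII_STRINGS, EMAIL_OVERRIDES, to_ascii and email_mode are hypothetical,
the table holds only the examples quoted in this thread (plus an assumed
titlecase "Aa" for the capital), and the "?" placeholder for unmapped
characters is an arbitrary choice:

```python
# Hypothetical table: each non-ASCII character maps to an ASCII *string*.
# Entries are the examples from this thread; a real table would be far larger.
ASCII_STRINGS = {
    "©": "(c)",
    "½": "1/2",
    "ß": "ss",
    "å": "aa",
    "Å": "Aa",  # assumption: the digraph is treated as titlecase
}

# Hypothetical overrides for the e-mail address mode ("a" instead of "aa").
EMAIL_OVERRIDES = {"å": "a", "Å": "A"}


def to_ascii(text, email_mode=False, unknown="?"):
    """Approximate text in ASCII, one character at a time."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)  # already ASCII, keep as-is
        elif email_mode and ch in EMAIL_OVERRIDES:
            out.append(EMAIL_OVERRIDES[ch])
        else:
            out.append(ASCII_STRINGS.get(ch, unknown))
    return "".join(out)
```

With this table, to_ascii("Håkon") gives "Haakon" and
to_ascii("Håkon", email_mode=True) gives "Hakon". Keeping the overrides in
a separate table is one way to tack the extra mode on without touching the
main mapping.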

One question, unless it has been answered already - I need to read up on
Unicode before I'll understand all the answers:

I'd like to translate 'ø' to 'o' or maybe 'oe'. 'o' at least when used
for matching, since it should match Swedish 'ö'. However,
UnicodeData.txt has no decomposition property for that character:

00F8;LATIN SMALL LETTER O WITH STROKE;Ll;0;L;;;;;N;LATIN SMALL LETTER O SLASH;;00D8;;00D8

Is there some other property I can use? Or is this a rare special case
to handle by hand?
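
For illustration, a sketch of what the by-hand route could look like when
folding letters for matching, using Python's standard unicodedata module.
The NO_DECOMPOSITION table and the name fold_for_matching are hypothetical,
and the table lists only the character asked about here; it is an
assumption that other characters would need the same treatment:

```python
import unicodedata

# Hand-maintained fallbacks for letters that have no decomposition mapping
# in UnicodeData.txt. Hypothetical table; only the case from this message.
NO_DECOMPOSITION = {"ø": "o", "Ø": "O"}


def fold_for_matching(text):
    """Strip diacritics so that e.g. 'ø' and 'ö' both compare equal to 'o'."""
    out = []
    for ch in text:
        if ch in NO_DECOMPOSITION:
            out.append(NO_DECOMPOSITION[ch])
            continue
        # NFKD splits 'ö' into 'o' + U+0308; drop the combining marks.
        for part in unicodedata.normalize("NFKD", ch):
            if not unicodedata.combining(part):
                out.append(part)
    return "".join(out)


# The decomposition field is empty for 'ø' but not for 'ö'.
assert unicodedata.decomposition("ø") == ""
assert unicodedata.decomposition("ö") == "006F 0308"
```

This only folds for matching; it does not guarantee ASCII output, since
characters with neither a decomposition nor a table entry pass through
unchanged.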

-- 
Hallvard

This archive was generated by hypermail 2.1.5 : Fri Dec 19 2003 - 12:13:11 EST