Re: Unicode->ASCII approximate conversion

From: Hallvard B Furuseth (h.b.furuseth@usit.uio.no)
Date: Fri Dec 19 2003 - 10:29:23 EST

    D. Starner writes:
    >> The result is much better if you allow the ASCII conversion to be a string.
    >> This allows you to use, e.g., "©" = "(c)", "½" = "1/2", and so on. This is also
    >> good for letters: "ß" = "ss", "å" = "aa", etc.
    >
    > etcetera? I think he needs more direction than that, especially since most naïve
    > algorithms are going to produce "a" from "å". Digraphs can be treated
    > as titlecase or capital or intelligently.

    Hm. Actually I'll want a mode which generates "a" rather than "aa" for
    that one, to mimic local practice for how to generate e-mail addresses.
    Though that can be tacked on with an extra hack afterwards.
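
    To make the idea concrete, here is a minimal sketch of a string-valued
    conversion table with an optional "e-mail" mode. The names ASCII_MAP,
    EMAIL_OVERRIDES and to_ascii are hypothetical, the table holds only the
    examples quoted above, and the "?" fallback is an assumption:

```python
# Hypothetical table; the entries are just the examples quoted in this thread.
ASCII_MAP = {
    "\u00a9": "(c)",  # © COPYRIGHT SIGN
    "\u00bd": "1/2",  # ½ VULGAR FRACTION ONE HALF
    "\u00df": "ss",   # ß LATIN SMALL LETTER SHARP S
    "\u00e5": "aa",   # å LATIN SMALL LETTER A WITH RING ABOVE
}

# Overrides for the e-mail address mode: a single letter instead of a digraph.
EMAIL_OVERRIDES = {"\u00e5": "a"}


def to_ascii(text, email_mode=False):
    """Replace each non-ASCII character with an ASCII string approximation."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)  # already ASCII, keep unchanged
        elif email_mode and ch in EMAIL_OVERRIDES:
            out.append(EMAIL_OVERRIDES[ch])
        else:
            # "?" for unmapped characters is an assumption of this sketch.
            out.append(ASCII_MAP.get(ch, "?"))
    return "".join(out)


print(to_ascii("H\u00e5kon"))                   # default mode
print(to_ascii("H\u00e5kon", email_mode=True))  # e-mail address mode
```

    Keeping the overrides in a separate table leaves the general-purpose
    mapping untouched. Uppercase letters and titlecasing of digraphs are
    not handled here.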

    One question, unless it has been answered already - I need to read up on
    Unicode before I'll understand all the answers:

    I'd like to translate 'ø' to 'o' or maybe 'oe'. 'o' at least when used
    for matching, since it should match Swedish 'ö'. However,
    UnicodeData.txt has no decomposition property for that character:

    00F8;LATIN SMALL LETTER O WITH STROKE;Ll;0;L;;;;;N;LATIN SMALL LETTER O SLASH;;00D8;;00D8

    Is there some other property I can use? Or is this a rare special case
    to handle by hand?
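
    For illustration, a small sketch of what this looks like with Python's
    standard unicodedata module: 'ö' has a canonical decomposition and 'ø'
    has none, so the sketch falls back to a hand-maintained table. HAND_MAP
    and fold_for_matching are hypothetical names, and the table is an
    assumption of this example, not a Unicode property:

```python
import unicodedata

# Hand-maintained exceptions (an assumption of this sketch): letters that
# have no decomposition mapping in UnicodeData.txt.
HAND_MAP = {"\u00f8": "o", "\u00d8": "O"}  # ø, Ø


def fold_for_matching(text):
    """Strip combining marks after NFD, using HAND_MAP where NFD cannot help."""
    out = []
    for ch in text:
        if ch in HAND_MAP:
            out.append(HAND_MAP[ch])
            continue
        decomposed = unicodedata.normalize("NFD", ch)
        # Keep base characters, drop combining marks (combining class != 0).
        out.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    return "".join(out)


# 'ö' decomposes into 'o' plus COMBINING DIAERESIS; 'ø' has no decomposition.
print(repr(unicodedata.decomposition("\u00f6")))
print(repr(unicodedata.decomposition("\u00f8")))
print(fold_for_matching("\u00f8") == fold_for_matching("\u00f6"))
```

    This only folds letters for matching. Characters such as 'ß' that have
    neither a decomposition nor a HAND_MAP entry pass through unchanged.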

    -- 
    Hallvard
    

