RE: Unicode->ASCII approximate conversion

From: D. Starner (shalesller@writeme.com)
Date: Fri Dec 19 2003 - 08:21:48 EST

    > The result is much better if you allow the ASCII conversion to be a string.
    > This allows you to, e.g., "©" = "(c)", "½" = "1/2", and so on. This is also
    > good for letters: "ß" = "ss", "å" = "aa", etc.

    Et cetera? I think he needs more direction than that, especially since most
    naïve algorithms are going to produce "a" from "å". Digraphs can be rendered
    in titlecase ("Th"), in all capitals ("TH"), or cased intelligently from context.
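
    One way to read "intelligently" is to look at the following character:
    titlecase the digraph when a lowercase letter follows, and use all capitals
    otherwise. This is only a sketch; the table and the name expand_upper are
    made up for illustration, and a real converter may want more context.

```python
# Hypothetical table of capital letters that expand to two ASCII letters.
UPPER_DIGRAPHS = {
    "\u00DE": "TH",  # LATIN CAPITAL LETTER THORN
    "\u00D0": "DH",  # LATIN CAPITAL LETTER ETH
    "\u0108": "CH",  # LATIN CAPITAL LETTER C WITH CIRCUMFLEX
    "\u015C": "SH",  # LATIN CAPITAL LETTER S WITH CIRCUMFLEX
}


def expand_upper(text):
    """Expand capital digraph letters, choosing case from the next character."""
    out = []
    for i, ch in enumerate(text):
        rep = UPPER_DIGRAPHS.get(ch)
        if rep is None:
            out.append(ch)  # not in the table: pass through unchanged
            continue
        nxt = text[i + 1] if i + 1 < len(text) else ""
        # Followed by a lowercase letter: "Th". Otherwise keep "TH".
        out.append(rep.title() if nxt.islower() else rep)
    return "".join(out)
```

    With this, expand_upper("\u00DEor") gives "Thor" while
    expand_upper("\u00DEOR") gives "THOR".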

    00FE - "th"
    00DE - "TH"
    00F0 - "dh" ("th"?)
    00D0 - "DH" ("TH"?)
    0108 - "CH" (Esperanto)
    0109 - "ch"
    011C, 011D - "GH", "gh" (E-o)
    0124, 0125 - "HH", "hh" (")
    0134, 0135 - "JH", "jh" (")
    015C, 015D - "SH", "sh" (")
    017F - "s"

    Depending on your goals, 015F & 0161 could be "sh", 0163 "ts",
    017E "zh", etc.
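
    For what it's worth, a table like the one above drops straight into
    Python's str.translate, which accepts a dict from code points to
    replacement strings. A minimal sketch; the names BASE_TABLE, OPTIONAL_TABLE
    and to_ascii_table are mine, and the optional entries cover only some of
    the goal-dependent choices mentioned:

```python
# Code point -> ASCII string, taken from the list above.
BASE_TABLE = {
    0x00FE: "th", 0x00DE: "TH",
    0x00F0: "dh", 0x00D0: "DH",
    0x0108: "CH", 0x0109: "ch",
    0x011C: "GH", 0x011D: "gh",
    0x0124: "HH", 0x0125: "hh",
    0x0134: "JH", 0x0135: "jh",
    0x015C: "SH", 0x015D: "sh",
    0x017F: "s",
}

# Goal-dependent entries; enable only if they suit the target languages.
OPTIONAL_TABLE = {0x015F: "sh", 0x0161: "sh", 0x0163: "ts"}


def to_ascii_table(text, use_optional=False):
    """Replace characters found in the table; leave everything else alone."""
    table = dict(BASE_TABLE)
    if use_optional:
        table.update(OPTIONAL_TABLE)
    return text.translate(table)
```

    So to_ascii_table("\u0109evalo") yields "chevalo", and characters that
    are not in the table pass through untouched for a later step to handle.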

    0195 - "hw"
    01A3 - "gh"(?)
    01BF - "w"
    01C0 - "|" ("c"?)
    01C1 - "||"? ("x"?)
    01C3 - "!" ("q"?)
    0223 - "w" ("ou"? "8"?)

    I omitted most capitals and those that can be found by decomposition
    or name stripping, as well as a bunch I don't know anything about.
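
    To make "decomposition or name stripping" concrete, here is a rough
    sketch using Python's unicodedata module. The function names and the "?"
    placeholder are my own choices, and the name pattern only handles the
    simple "LETTER X WITH ..." case:

```python
import re
import unicodedata

# Matches names such as "LATIN SMALL LETTER O WITH STROKE".
_NAME_RE = re.compile(r"^LATIN (SMALL|CAPITAL) LETTER ([A-Z]) WITH ")


def by_decomposition(ch):
    """Decompose, drop combining marks, and accept the result if ASCII."""
    decomposed = unicodedata.normalize("NFKD", ch)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    if stripped and stripped.isascii():
        return stripped
    return None


def by_name(ch):
    """Recover the base letter from the character's Unicode name."""
    match = _NAME_RE.match(unicodedata.name(ch, ""))
    if match is None:
        return None
    letter = match.group(2)
    return letter.lower() if match.group(1) == "SMALL" else letter


def fallback(ch, placeholder="?"):
    """Try ASCII, then decomposition, then name stripping."""
    if ch.isascii():
        return ch
    return by_decomposition(ch) or by_name(ch) or placeholder
```

    Here "\u00E9" (e with acute) is caught by decomposition and "\u00F8"
    (o with stroke) by name stripping, while thorn (00FE) falls through to the
    placeholder, which is why it needs an explicit table entry.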


