Re: Unicode denormalizer

From: Mark Davis ☕ (mark@macchiato.com)
Date: Wed Oct 06 2010 - 10:49:14 CDT


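    ICU has a canonical iterator (the CanonicalIterator class), one that
    provides all the strings that are canonically equivalent to a given
    string, that is, all the strings that produce the same result under
    toNFC(...).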
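
    To make the idea concrete, here is a rough brute-force sketch in Python.
    It is not ICU's algorithm and does not use ICU; it relies only on the
    standard unicodedata module, and the function name canonical_equivalents
    is made up for this illustration. It assumes short inputs (the search is
    exponential) and covers canonical equivalence only (NFC/NFD), not the
    compatibility forms NFKC/NFKD.

```python
import sys
import unicodedata
from collections import Counter


def nfd(s):
    return unicodedata.normalize("NFD", s)


def canonical_equivalents(s):
    """Return the set of strings t with NFD(t) == NFD(s) (brute force)."""
    target = nfd(s)
    budget = Counter(target)

    # Building blocks: each character of the NFD form itself, plus every
    # code point whose full canonical decomposition fits inside the target.
    pieces = {ch: ch for ch in set(target)}
    for cp in range(sys.maxunicode + 1):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # skip surrogate code points
        ch = chr(cp)
        full = nfd(ch)
        if full != ch and not (Counter(full) - budget):
            pieces[ch] = full

    results = set()

    def extend(prefix, remaining):
        if not remaining:
            # Every character of the target is accounted for; keep the
            # candidate only if it really normalizes to the same NFD.
            candidate = "".join(prefix)
            if nfd(candidate) == target:
                results.add(candidate)
            return
        for ch, full in pieces.items():
            need = Counter(full)
            if not (need - remaining):
                extend(prefix + [ch], remaining - need)

    extend([], budget)
    return results


if __name__ == "__main__":
    for t in sorted(canonical_equivalents("\u00c5")):
        print(" ".join(f"U+{ord(c):04X}" for c in t))
```

    For U+00C5 the result should include U+00C5 itself, the sequence
    U+0041 U+030A, and U+212B (ANGSTROM SIGN). Each call scans the whole
    code space, so a real tool would cache the decomposition table.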

    Mark

    *— Il meglio è l’inimico del bene —*

    On Mon, Oct 4, 2010 at 20:59, Bjoern Hoehrmann <derhoermi@gmx.net> wrote:

    > Hi,
    >
    > Every now and then I need a tool that takes a Unicode string and gives
    > me all the strings that are not identical but equivalent under one of
    > the four normalization forms defined in UAX #15. Now I do have a couple
    > of hacks that get me by, but is there any tool or paper that has a more
    > complete solution? Last year I worked a bit in that general direction
    > (http://lists.w3.org/Archives/Public/www-archive/2009Feb/0071.html), but I
    > ran out of time after proving that the sets of strings in one of the
    > normal forms are all regular languages, and writing a denormalizer was
    > not the goal anyway.
    >
    > Thanks,
    > --
    > Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
    > Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
    > 25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
    >
    >


