Unicode denormalizer

From: Bjoern Hoehrmann (derhoermi@gmx.net)
Date: Mon Oct 04 2010 - 22:59:19 CDT



      Every now and then I need a tool that takes a Unicode string and gives
    me all the strings that are not identical to it but are equivalent under
    one of the four normalization forms defined in UAX #15. I have a couple
    of hacks that get me by, but is there a tool or paper with a more
    complete solution? Last year I did some work in this general direction
    (http://lists.w3.org/Archives/Public/www-archive/2009Feb/0071.html), but
    I ran out of time after proving that the set of strings in any one of
    the normal forms is a regular language, and writing a denormalizer was
    not the goal anyway.
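
    To make the request concrete, here is a minimal brute-force sketch of
    such a denormalizer in Python, using only the standard unicodedata
    module. It is not one of the hacks mentioned above and not a complete
    solution. The function name denormalize and the code_points parameter
    are made up for illustration, and the default range is an arbitrary
    limit chosen to keep the scan fast. The sketch relies on the fact that
    a string equivalent to s can be no longer than the full decomposition
    of s and can only contain characters whose own decomposition is built
    from the characters of that decomposition.

```python
import unicodedata
from itertools import product


def denormalize(s, form="NFC", code_points=range(0x3000)):
    """Return the strings t != s with normalize(form, t) == normalize(form, s).

    Hypothetical helper: brute force, exponential in the length of s, and
    only characters from code_points are considered (an assumed limit).
    """
    # Compatibility forms compare via NFKD, canonical forms via NFD.
    decomp = "NFKD" if form in ("NFKC", "NFKD") else "NFD"
    target = unicodedata.normalize(form, s)
    decomposed = unicodedata.normalize(decomp, s)
    atoms = set(decomposed)

    # Candidate characters: those whose full decomposition uses only
    # characters that occur in the full decomposition of s.
    alphabet = [
        chr(cp)
        for cp in code_points
        if set(unicodedata.normalize(decomp, chr(cp))) <= atoms
    ]

    results = set()
    # Every character decomposes to at least one character, so an
    # equivalent string cannot be longer than the decomposition of s.
    for length in range(1, len(decomposed) + 1):
        for combo in product(alphabet, repeat=length):
            t = "".join(combo)
            if t != s and unicodedata.normalize(form, t) == target:
                results.add(t)
    return results


if __name__ == "__main__":
    # U+00E9 (e with acute) has at least the decomposed spelling U+0065 U+0301.
    for t in sorted(denormalize("\u00e9")):
        print(" ".join(f"U+{ord(c):04X}" for c in t))
```

    For the single character U+00E9 the result includes the sequence
    U+0065 U+0301. The search is exponential in the length of the input,
    so this is only usable on very short strings. A serious tool would
    presumably work from the decomposition data directly instead of
    enumerating candidates.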


    Björn Höhrmann  mailto:bjoern@hoehrmann.de  http://bjoern.hoehrmann.de
    Am Badedeich 7  Telefon: +49(0)160/4415681  http://www.bjoernsworld.de
    25899 Dagebüll  PGP Pub. KeyID: 0xA4357E78  http://www.websitedev.de/
