Re: Normalization in panlingual application

From: Asmus Freytag
Date: Thu Sep 20 2007 - 14:46:02 CDT

    On 9/20/2007 10:55 AM, John D. Burger wrote:
    > Asmus Freytag wrote:
    >> IDN still operates on a restricted domain of characters, many
    >> characters that are part of ordinary text are disallowed from the
    >> get-go (I haven't checked where that subset is at recently, but
    >> that's the general idea). At the minimum, the transformations that
    >> are designed into IDN would need to be modified or extended to handle
    >> such characters. Because of that alone, the normalization and folding
    >> aspect of IDN is unlikely to be suitable for general text. There are
    >> likely additional issues.
    >> If you suggest that any scheme in which you can't represent the word
    >> "can't" is suitable for the class of applications that the original
    >> poster represents, then I fail to follow you.
    > But that's due to IDN's restricted domain, yes? I guess my thought
    > was that, if the transformations from IDN can be applied to a larger
    > domain of characters, then IDN might provide a fifth normalization
    > form appropriate for a broad class of applications.
    But because of the restricted domain, the specification of foldings for
    the larger domain is either undefined or, possibly, if you simply extend
    an operation by analogy, not useful or even counterproductive.
    > But my hope for a cookie-cutter solution appears to be forlorn. :)
    In processing of strings that carry meaning to the human reader there's
    rarely a cookie-cutter solution. ;-)
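
    To make the point concrete, here is a minimal sketch in Python. It is
    not the real IDN/Nameprep profile: fold_like_idn() is a hypothetical
    stand-in that only applies NFKC plus case folding, and is_ldh_label()
    assumes the restriction in question is the letter-digit-hyphen rule for
    host name labels. It shows that such folding erases distinctions that
    matter in ordinary text, and that the restricted domain rejects "can't".

```python
import string
import unicodedata

# Assumption: the "restricted domain" is modeled as ASCII letters, digits
# and hyphen (the classic host name label rule), not the full IDN tables.
LDH = set(string.ascii_letters + string.digits + "-")


def fold_like_idn(text: str) -> str:
    """Hypothetical stand-in for IDN-style folding: NFKC plus case folding.

    The real Nameprep profile also has mapping and prohibition tables,
    which this sketch deliberately leaves out.
    """
    normalized = unicodedata.normalize("NFKC", text)
    # Case folding can produce new sequences, so normalize once more.
    return unicodedata.normalize("NFKC", normalized.casefold())


def is_ldh_label(text: str) -> bool:
    """Return True if text uses only letters, digits and hyphen."""
    return bool(text) and all(ch in LDH for ch in text)


if __name__ == "__main__":
    # Distinctions a reader cares about disappear under the folding.
    print(fold_like_idn("x\u00b2"))       # superscript two becomes "x2"
    print(fold_like_idn("Stra\u00dfe"))   # sharp s becomes "strasse"
    print(fold_like_idn("\ufb01le"))      # fi ligature becomes "file"

    # The apostrophe is outside the restricted domain.
    print(is_ldh_label("can't"))          # False
    print(is_ldh_label("cant"))           # True
```

    Whether losing the difference between "x2" and x squared is acceptable
    depends entirely on the application, which is why extending the folding
    by analogy to general text does not give a ready-made normalization form.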

