RE: Normalisation stability, was: Compression through normalization

From: Philippe Verdy (
Date: Tue Nov 25 2003 - 13:21:44 EST

    John Cowan writes:
    > Since it adds efficiency to normalize only once,
    > it is worthwhile to define a few normalization forms and urge
    > people to produce text in one of them, so that receivers need not
    > normalize but need only check for normalization, typically much cheaper.

    I'm not convinced that there's a significant improvement when only checking
    for normalization but not performing it. The check requires at least a list
    of the characters that are acceptable in a normalization form, as well as
    their combining classes.

    This data, which still requires a table to perform the check, is not much
    smaller than the data needed for decompositions. Moreover, if one can
    perform a normalization check and detect that combining characters can be
    reordered, it's not a big performance hit to reorder them, even if we must
    decompose them first. In any case, you still need to look up characters
    in a table of character properties.
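
    To illustrate the point about reordering, here is a minimal Python sketch.
    It assumes the input is already fully decomposed and covers only the
    canonical ordering step, not decomposition or composition. The function
    names are hypothetical; the only library call is unicodedata.combining()
    from the Python standard library. Both the check and the repair do the
    same per-character lookup of the combining class.

```python
import unicodedata


def is_canonically_ordered(text: str) -> bool:
    """Check only: no combining mark may follow one of a higher class."""
    prev = 0
    for ch in text:
        ccc = unicodedata.combining(ch)  # one property lookup per character
        if ccc != 0 and prev > ccc:
            return False
        prev = ccc
    return True


def reorder_combining(text: str) -> str:
    """Repair: stable-sort each run of combining marks by combining class."""
    out = []
    run = []  # pending characters with a nonzero combining class
    for ch in text:
        if unicodedata.combining(ch) == 0:  # same lookup as in the check
            out.extend(sorted(run, key=unicodedata.combining))
            run = []
            out.append(ch)
        else:
            run.append(ch)
    out.extend(sorted(run, key=unicodedata.combining))
    return "".join(out)
```

    For example, "a" followed by U+0301 (class 230) and then U+0323 (class 220)
    fails the check, and reorder_combining() swaps the two marks.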

    The real performance gain comes when applications do not even need to
    perform this check, because all strings are marked with the normalization
    forms they are currently known to satisfy or not satisfy.
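
    One way to picture such marking, as a rough sketch only: the TaggedText
    type and the ensure_form() helper below are hypothetical names invented
    for this illustration, not part of any existing API. The string carries
    the set of forms it is known to be in, and a receiver touches the
    property tables only when the requested form is not in that set.

```python
import unicodedata
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class TaggedText:
    """Hypothetical container: text plus the forms it is known to satisfy."""
    text: str
    forms: FrozenSet[str] = frozenset()


def ensure_form(t: TaggedText, form: str) -> TaggedText:
    """Return text in the requested form, skipping all work when tagged."""
    if form in t.forms:
        return t  # no check and no table lookup at all
    normalized = unicodedata.normalize(form, t.text)
    return TaggedText(normalized, frozenset({form}))
```

    The first call to ensure_form(TaggedText("e\u0301"), "NFC") pays for a
    normalization; passing the result back in returns it unchanged.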

    This archive was generated by hypermail 2.1.5 : Tue Nov 25 2003 - 14:08:30 EST