Re: Re: Additional normalization test cases

From: Philippe VERDY (
Date: Sun Mar 27 2005 - 15:20:20 CST

    I also think it would be OK to add a few test strings for the cases where NFD, NFKD, NFC or NFKC expands and/or shrinks the source string, so that characters end up at different positions in the result than in the source.
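    As a rough illustration of the kind of strings meant here, the sketch below uses Python's standard unicodedata module to report how each form changes the length in code points. The SAMPLES list and the length_changes helper are hypothetical names chosen for this example; they are not taken from NormalizationTest.txt.

```python
import unicodedata

# Hypothetical sample strings, each at least two characters long, so that a
# length change in the first character shifts the offset of the second one.
SAMPLES = [
    "\u00e9a",    # e-acute + a: NFD/NFKD expand it to e + U+0301
    "e\u0301a",   # e + combining acute + a: NFC/NFKC shrink it to e-acute
    "\ufb01a",    # fi ligature + a: only NFKC/NFKD expand it to f + i
    "\u0958a",    # DEVANAGARI LETTER QA, a composition exclusion: even NFC expands it
]

def length_changes(text):
    """Return {form: (source_length, result_length)} in code points."""
    return {
        form: (len(text), len(unicodedata.normalize(form, text)))
        for form in ("NFC", "NFD", "NFKC", "NFKD")
    }

if __name__ == "__main__":
    for sample in SAMPLES:
        print([f"U+{ord(ch):04X}" for ch in sample], length_changes(sample))
```

    The last sample is the interesting one for a normalizer that assumes NFC never grows a string.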

    Because this will reveal bugs in the normalizer, these specific tests should be performed early, before the rest of the tests, so they are good candidates for inclusion in @Part0 of the NormalizationTest.txt file in the UCD...

    Also, I can't remember whether tests have been included for the algorithmic Hangul (de/re)compositions with combining diacritics since the change of wording in the Normalization Standard Annex. I suppose they are present, because they have been discussed heavily. These specific tests should then be present in @Part0 as well, because Hangul composition is part of the algorithm implementation rather than of the UCD data, and it adds a bit to the complexity of implementing NF(K)[CD]; those cases should be validated early, before testing the whole list of (de/re)composable characters.
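    To make the "algorithmic" part concrete, here is a minimal sketch of the Hangul syllable decomposition arithmetic from the Unicode Standard, together with two hypothetical test cases that put a combining acute accent after a syllable. The function names and the choice of cases are assumptions for illustration only, and the checks simply compare against Python's unicodedata.

```python
import unicodedata

# Constants of the Hangul syllable arithmetic in the Unicode Standard.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT   # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT   # 11172 precomposed syllables

def decompose_hangul(syllable):
    """Decompose one precomposed Hangul syllable into its conjoining jamo."""
    s_index = ord(syllable) - S_BASE
    if not 0 <= s_index < S_COUNT:
        return syllable                      # not a precomposed syllable
    lead = L_BASE + s_index // N_COUNT
    vowel = V_BASE + (s_index % N_COUNT) // T_COUNT
    trail = T_BASE + s_index % T_COUNT
    jamo = chr(lead) + chr(vowel)
    # A trailing index of 0 means an LV syllable with no trailing consonant.
    return jamo + chr(trail) if trail != T_BASE else jamo

def hangul_cases():
    """Yield (source, expected_nfd, expected_nfc) for syllable + U+0301."""
    for syllable in ("\uac00", "\uac01"):    # one LV and one LVT syllable
        source = syllable + "\u0301"
        yield source, decompose_hangul(syllable) + "\u0301", source

if __name__ == "__main__":
    for source, nfd, nfc in hangul_cases():
        assert unicodedata.normalize("NFD", source) == nfd
        assert unicodedata.normalize("NFC", nfd) == nfc
```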

    This @Part0 of the test file cannot be derived from the UCD alone, so I suppose that the test file is generated from a small template containing specific tests (most of the test file, i.e. @Part1, can be generated automatically from other files).
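    For readers who want to run the parts in order, the following is a small sketch of a reader for the semicolon-separated layout of NormalizationTest.txt (source; NFC; NFD; NFKC; NFKD, with # comments and @Part headers). The helper names parse_parts and failures and the two-row SAMPLE are assumptions made for this example; the real file is much larger, and results depend on the Unicode version built into the Python in use.

```python
import unicodedata

# Hypothetical two-row excerpt in the layout of NormalizationTest.txt.
SAMPLE = """\
@Part0 # Specific cases
1E0A;1E0A;0044 0307;1E0A;0044 0307; # LATIN CAPITAL LETTER D WITH DOT ABOVE
@Part1 # Character by character test
00C0;00C0;0041 0300;00C0;0041 0300; # LATIN CAPITAL LETTER A WITH GRAVE
"""

def parse_parts(lines):
    """Group test rows by @Part header; each row is a tuple of five strings."""
    parts, current = {}, None
    for line in lines:
        line = line.split("#", 1)[0].strip()      # drop comments
        if not line:
            continue
        if line.startswith("@"):
            current = line.split()[0]
            parts[current] = []
            continue
        if current is None:
            continue                               # row before any @Part header
        fields = line.split(";")[:5]
        parts[current].append(tuple(
            "".join(chr(int(cp, 16)) for cp in field.split())
            for field in fields))
    return parts

def failures(row):
    """Return the normalization forms for which this row does not match."""
    source, nfc, nfd, nfkc, nfkd = row
    expected = {"NFC": nfc, "NFD": nfd, "NFKC": nfkc, "NFKD": nfkd}
    return [form for form, want in expected.items()
            if unicodedata.normalize(form, source) != want]

if __name__ == "__main__":
    parts = parse_parts(SAMPLE.splitlines())
    # Run the hand-written @Part0 cases first, then everything else.
    for name in sorted(parts):
        for row in parts[name]:
            print(name, failures(row))
```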

    (In my first implementation of the normalization algorithm, I made similar errors, caused by confusion between the source and destination offsets, or arising where NFC normalization expanded the string, so that the conversion should not have occurred "in place" but should have used a separate buffer. Because I run a "fast" normalization check first, I did not always allocate a destination buffer but reused the same buffer where possible for performance reasons, and an unsuspected bug caused the resulting string to be garbled, or to contain repetitions of the decomposed characters excluded from recomposition.)
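    The failure mode described above can be reproduced with a toy model. This is not the original implementation: EXPANSIONS is a hypothetical one-entry mapping standing in for a real normalizer step (a real normalizer also reorders and recomposes), and both function names are made up for the example.

```python
# Hypothetical per-character mapping: U+0958 is excluded from recomposition,
# so it always ends up as U+0915 U+093C, even under NFC.
EXPANSIONS = {"\u0958": "\u0915\u093c"}

def expand_shared_buffer(text):
    """Buggy toy: reads and writes one buffer using separate offsets."""
    buf = list(text)
    source_len = len(buf)
    write = 0
    for read in range(source_len):
        for ch in EXPANSIONS.get(buf[read], buf[read]):
            if write < len(buf):
                buf[write] = ch      # may overwrite input not yet read
            else:
                buf.append(ch)
            write += 1
    return "".join(buf[:write])

def expand_separate_buffer(text):
    """Safe toy: the destination buffer never aliases the source."""
    out = []
    for ch in text:
        out.extend(EXPANSIONS.get(ch, ch))
    return "".join(out)

if __name__ == "__main__":
    source = "\u0958a"
    for result in (expand_shared_buffer(source), expand_separate_buffer(source)):
        print([f"U+{ord(ch):04X}" for ch in result])
```

    With the shared buffer, the write offset overtakes the read offset as soon as the first character expands, so the trailing "a" is lost and U+093C is emitted twice; the separate buffer keeps the "a".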

    In his case, the bug was revealed by expansions. But reductions can produce similar bugs too...

    -- Philippe.

    "Michael (michka) Kaplan" wrote:
    > Ok, I guess a test that has at least two characters no matter what the form
    > was (so that the target did not always start at the same 0th index as the
    > source) would help here.
    > Though since that is more of a code bug than a conformance to proper
    > transformation bug, it is not really what the tests are about. If the author
    > preferred to instead clarify the purpose of the tests I would not find their
    > position to be unsupportable....

    This archive was generated by hypermail 2.1.5 : Sun Mar 27 2005 - 15:21:09 CST