NFD normalisation test

From: spir (
Date: Sat Feb 06 2010 - 05:56:53 CST



    I have a bunch of questions on the topic.

    The provided test data holds a huge list of specific and generic cases, of which about 11,500 are hangul ones.
    -1- Why so many? Is it necessary to test all of these? I guess, for instance, that if a function correctly transforms one, two, three hangul LVT syllables, then it correctly transforms all of them, no?
    -2- Since hangul codes are normalised algorithmically (as opposed to via a mapping table), shouldn't they be in a separate part?
    -3- What are the specific cases (part 0), and why are they kept apart?
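    On point -2-, here is a sketch (in Python, my own choice of language) of the algorithmic hangul decomposition; the constants are the standard ones from the Unicode specification, and the final check compares the algorithm against the library's NFD for every precomposed syllable:

```python
import unicodedata

# Standard hangul constants from the Unicode specification
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
V_COUNT, T_COUNT = 21, 28
N_COUNT = V_COUNT * T_COUNT   # 588
S_COUNT = 11172               # number of precomposed syllables

def decompose_hangul(ch):
    """Full canonical (NFD) decomposition of one precomposed hangul syllable."""
    s_index = ord(ch) - S_BASE
    if not 0 <= s_index < S_COUNT:
        return ch             # not a precomposed syllable: unchanged
    l = chr(L_BASE + s_index // N_COUNT)
    v = chr(V_BASE + (s_index % N_COUNT) // T_COUNT)
    t_index = s_index % T_COUNT
    return l + v + (chr(T_BASE + t_index) if t_index else "")

# All 11172 syllables agree with the library's NFD -- no table needed.
assert all(decompose_hangul(chr(S_BASE + i)) ==
           unicodedata.normalize("NFD", chr(S_BASE + i))
           for i in range(S_COUNT))
```

    So, indeed, the whole hangul block can be covered by the algorithm alone, which suggests a few sample syllables suffice to test an implementation of it.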

    I also wonder about the source code points to be normalised.
    -4- Does each code point / group of code points represent a whole, consistent, "user-perceived character"?
    -5- Would their concatenation build a valid character string (a text)?
    -6- Should the (NFD) normalisation of this text result in the concatenation of the individually normalised cases?
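    On point -6-, note that this does not hold in general: if one piece starts with a combining mark, canonical ordering can reorder marks across the piece boundary, so NFD(a + b) may differ from NFD(a) + NFD(b). A small Python illustration (the strings are my own, not taken from the test file):

```python
import unicodedata

def nfd(s):
    return unicodedata.normalize("NFD", s)

a = "a\u0301"   # 'a' + COMBINING ACUTE ACCENT (ccc 230) -- already in NFD
b = "\u0328"    # COMBINING OGONEK (ccc 202) -- already in NFD, but "defective"

# Concatenating the normalised pieces leaves the marks out of canonical order:
assert nfd(a) + nfd(b) == "a\u0301\u0328"
# Normalising the concatenation reorders them (lower combining class first):
assert nfd(a + b) == "a\u0328\u0301"
assert nfd(a) + nfd(b) != nfd(a + b)
```

    If, however, each test case forms a complete combining sequence (the point of question -4-), the boundary problem cannot arise and concatenation would be safe.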

    My intention is to do the following; please tell me whether it makes sense:
    * Build separate test sets for the specific / hangul / generic cases (done).
    * Select all specific cases, plus N randomly chosen hangul and generic cases.
    * From the complete data, run and check case-by-case normalisation, using the given assertions c3 == NFD(c1) == NFD(c2) == NFD(c3) and c5 == NFD(c4) == NFD(c5).
    * Using only the source and NFD data columns, run and check whole-text normalisation.
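    Assuming the usual NormalizationTest.txt line format (five semicolon-separated fields of space-separated hex code points, with comments after '#'), the per-case NFD check of the plan above could be sketched as:

```python
import unicodedata

def parse_case(line):
    """Parse one data line into the strings c1..c5.

    Assumed format: five ';'-separated fields, each a space-separated
    list of hex code points; anything after '#' is a comment.
    """
    fields = line.split("#")[0].split(";")[:5]
    return ["".join(chr(int(cp, 16)) for cp in f.split()) for f in fields]

def check_nfd_case(line):
    """Apply the NFD assertions stated in the test file's header."""
    c1, c2, c3, c4, c5 = parse_case(line)
    nfd = lambda s: unicodedata.normalize("NFD", s)
    return (c3 == nfd(c1) == nfd(c2) == nfd(c3) and
            c5 == nfd(c4) == nfd(c5))

# A sample line in that format (LATIN CAPITAL LETTER D WITH DOT ABOVE):
assert check_nfd_case("1E0A;1E0A;0044 0307;1E0A;0044 0307;")
```

    The whole-text check would then simply concatenate the c1 and c3 columns and compare NFD of the former against the latter.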


    life is strange

    This archive was generated by hypermail 2.1.5 : Sat Feb 06 2010 - 06:02:54 CST