Re: Additional normalization test cases

From: Elliotte Rusty Harold (elharo@metalab.unc.edu)
Date: Sun Mar 27 2005 - 08:36:02 CST

Next message: Jony Rosenne: "U+0023"

Previous message: Jukka K. Korpela: "Symbols for chemical bonds"
In reply to: Michael (michka) Kaplan: "Re: Additional normalization test cases"
Next in thread: Michael (michka) Kaplan: "Re: Additional normalization test cases"
Reply: Michael (michka) Kaplan: "Re: Additional normalization test cases"

Michael (michka) Kaplan wrote:

> What was the bug you fixed? It might be easier to figure out a good way to
> add test cases by knowing what the implementation was doing....
>

The specific bug was an off-by-one error in indexing into the string
during the composition step. I was assuming that an index into the
source string could also serve as an index into the result string, and
that turned out not to be the case. As a result, the output was
recomposed into ĄL. I could successfully normalize the single characters
Ľ or Ą on their own. It was the combination of the two in succession
that triggered the bug.
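
To make the index problem concrete, here is a minimal sketch in Python. It is
not the implementation discussed in this message. The two-entry COMPOSE table
and the compose_pairs function are hypothetical, and the sketch ignores
combining classes and blocking, so it is not a full NFC composer. It composes
against the last character of the result rather than reusing the read position
in the source, since the result gets shorter than the source after each
composition.

```python
import unicodedata

# Hypothetical two-entry composition table for illustration only.
# A real implementation derives its table from the Unicode data files.
COMPOSE = {
    ("A", "\u0328"): "\u0104",  # A + combining ogonek -> Ą
    ("L", "\u030C"): "\u013D",  # L + combining caron  -> Ľ
}

def compose_pairs(decomposed: str) -> str:
    """Compose adjacent base + mark pairs (simplified, no blocking rules)."""
    out = []  # result buffer; its length drifts away from the read position
    for ch in decomposed:
        # Look at the end of the RESULT, not at a position in the source.
        if out and (out[-1], ch) in COMPOSE:
            out[-1] = COMPOSE[(out[-1], ch)]
        else:
            out.append(ch)
    return "".join(out)

# Two composable characters in succession: A, ogonek, L, caron.
decomposed = unicodedata.normalize("NFD", "\u0104\u013D")
recomposed = compose_pairs(decomposed)
```

After the first composition the result holds one character while the read
position in the source is already at two, so any code that treats the two
positions as interchangeable only goes wrong on the second composable
character. A test with a single composed character cannot expose that.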

What was really surprising was that I could pass all the
Unicode-supplied tests and still miss this. This may be because of the
naivete of my algorithm, which is much less sophisticated than the
standard Unicode implementation. In particular, it does not (yet) notice
that the string ĄĽ is already in NFC. Many implementations might notice
that, and thus never exercise a case like this in the first place.
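
As a rough illustration of that last point, the sketch below wraps Python's
unicodedata module in a hypothetical nfc_with_fast_path function that returns
its input untouched when the input is already in NFC. It assumes Python 3.8 or
later, where unicodedata.is_normalized is available, and it is only a sketch of
the general idea, not how any particular implementation works.

```python
import unicodedata

def nfc_with_fast_path(text: str) -> str:
    """Hypothetical wrapper: skip all work when text is already in NFC."""
    if unicodedata.is_normalized("NFC", text):
        return text  # the composition step is never reached
    return unicodedata.normalize("NFC", text)

precomposed = "\u0104\u013D"  # ĄĽ, already in NFC
decomposed = unicodedata.normalize("NFD", precomposed)  # A, ogonek, L, caron

fast = nfc_with_fast_path(precomposed)  # takes the early return
slow = nfc_with_fast_path(decomposed)   # goes through composition
```

With a shortcut like this, a test that feeds in the precomposed string says
nothing about the composition step. Only the decomposed form, with two
composable sequences back to back, reaches the code where the indexing bug
lived.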

-- 
Elliotte Rusty Harold  elharo@metalab.unc.edu
XML in a Nutshell 3rd Edition Just Published!
http://www.cafeconleche.org/books/xian3/
http://www.amazon.com/exec/obidos/ISBN=0596007647/cafeaulaitA/ref=nosim


This archive was generated by hypermail 2.1.5 : Sun Mar 27 2005 - 08:36:52 CST