Re: Additional normalization test cases

From: Michael (michka) Kaplan (michka@trigeminal.com)
Date: Sun Mar 27 2005 - 12:51:41 CST

Next message: Philippe VERDY: "Re: Re: Security Issues: �Navajo"

Previous message: Jukka K. Korpela: "Re: U+0023"
In reply to: Elliotte Rusty Harold: "Re: Additional normalization test cases"
Next in thread: Mark Davis: "Re: Additional normalization test cases"
Maybe reply: Philippe VERDY: "Re: Re: Additional normalization test cases"

OK, I guess a test that has at least two characters no matter what the form
was (so that the target did not always start at the same 0th index as the
source) would help here.
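
As a rough sketch only, such a test might look like the following. It
assumes Python's standard unicodedata module stands in for the normalizer
being tested; normalize_under_test is a hypothetical wrapper name, not
anything from the implementation discussed in this thread. The pair used is
the Ą followed by Ľ from the quoted message below.

```python
import unicodedata


def normalize_under_test(form, text):
    # Hypothetical stand-in: replace the body with a call to the
    # implementation actually being tested.
    return unicodedata.normalize(form, text)


# Two precomposed characters in succession: U+0104 then U+013D.
COMPOSED = "\u0104\u013D"
# Their canonical decompositions: A + combining ogonek, L + combining caron.
DECOMPOSED = "A\u0328L\u030C"


def check_two_character_case():
    # Every form yields at least two characters, so the second character
    # never sits at index 0 of either the source or the target.
    for form in ("NFC", "NFKC"):
        assert normalize_under_test(form, DECOMPOSED) == COMPOSED
        assert normalize_under_test(form, COMPOSED) == COMPOSED
    for form in ("NFD", "NFKD"):
        assert normalize_under_test(form, COMPOSED) == DECOMPOSED
        assert normalize_under_test(form, DECOMPOSED) == DECOMPOSED
    return True
```

The base letter L is at index 2 of the decomposed string, but the composed Ľ
lands at index 1 of the result, so a source index reused as a target index
goes wrong only from the second character onward.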

Though since that is more of a code bug than a bug in conformance to the
proper transformation, it is not really what the tests are about. If the
author preferred instead to clarify the purpose of the tests, I would not
find their position to be unsupportable....

MichKa

----- Original Message -----
From: "Elliotte Rusty Harold" <elharo@metalab.unc.edu>
To: "Michael (michka) Kaplan" <michka@trigeminal.com>
Cc: <unicode@unicode.org>
Sent: Sunday, March 27, 2005 6:36 AM
Subject: Re: Additional normalization test cases

> Michael (michka) Kaplan wrote:
>
> > What was the bug you fixed? It might be easier to figure out a good way to
> > add test cases by knowing what the implementation was doing....
> >
>
> The specific bug involved an off-by-one error indexing into the string
> in the composition step. Specifically, I was assuming the index into the
> source string could also serve as an index into the result string, and
> that turned out not to be the case. The result was that it recomposed
> into ĄL. I could successfully normalize the single characters Ľ or Ą. It
> was the combination of the two in succession that triggered the bug.
>
> What was really surprising was that I could pass all the Unicode
> supplied tests and still miss this. This may be because of the naivete
> of my algorithm, which is much less sophisticated than the standard
> Unicode implementation. In particular, it does not (yet) notice that the
> string ĄĽ is already in NFC, which many implementations might do, and
> thus never check something like this in the first place.
>
> --
> Elliotte Rusty Harold elharo@metalab.unc.edu
> XML in a Nutshell 3rd Edition Just Published!
> http://www.cafeconleche.org/books/xian3/
> http://www.amazon.com/exec/obidos/ISBN=0596007647/cafeaulaitA/ref=nosim
>
>

