RE: What does it mean to "not be a valid string in Unicode"?

From: Whistler, Ken
Date: Tue, 8 Jan 2013 18:28:35 +0000

> Sorry, but I have to disagree here. If a list of strings contains items
> with lone surrogates (garbage), then sorting them doesn't make the
> garbage go away, even if the items may be sorted in "correct" order
> according to some criterion.

Well, yeah, I wasn't claiming that the principled, "correct" output made the garbage go away.

Let me put it this way: if my choices are 1) garbage in, garbage reliably sorted out into garbage bin, versus 2) garbage in, sorting fails with exception, then I'll pick #1. ;-)

To give a concrete example, my implementation of UCA reliably passes the SHIFTED test cases in the conformance test, even though those test cases (deliberately) contain some ill-formed strings. If I instead did validation testing on input strings in my base implementation, it would be slower, *and* to pass the conformance test I would have to add a separate preprocessing stage that probed all the input data for ill-formed strings and filtered those cases out before engaging the test, so that it wouldn't fail with an exception when it hit the bad data.
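
As a rough illustration of the two choices above (this is not the UCA implementation described here, just a minimal sketch), the following Python compares a tolerant sort with a validate-first sort. The function names and the policy of treating a lone surrogate like U+FFFD for comparison purposes are assumptions made for the example; a real collation implementation would assign weights according to its own rules.

```python
def is_surrogate(ch: str) -> bool:
    """True if ch is a surrogate code point (U+D800..U+DFFF)."""
    return 0xD800 <= ord(ch) <= 0xDFFF


def tolerant_key(s: str) -> str:
    # Hypothetical policy: compare lone surrogates as if they were U+FFFD.
    # The key is used only for ordering; the input string is never modified.
    return "".join("\ufffd" if is_surrogate(ch) else ch for ch in s)


def sort_tolerant(strings):
    """Choice 1: garbage in, garbage reliably sorted into a fixed place."""
    return sorted(strings, key=tolerant_key)


def sort_strict(strings):
    """Choice 2: validate every input first and fail on ill-formed data."""
    for s in strings:
        if any(is_surrogate(ch) for ch in s):
            raise ValueError(f"ill-formed string: {s!r}")
    return sorted(strings)


data = ["b", "a\ud800", "a", "c"]
result = sort_tolerant(data)  # succeeds; the ill-formed item is still present

try:
    sort_strict(data)
    strict_failed = False
except ValueError:
    strict_failed = True  # the caller now has to filter the data beforehand
```

With the strict variant, any caller that wants to process mixed data has to add its own filtering pass before sorting, which is the extra preprocessing stage mentioned above.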

Received on Tue Jan 08 2013 - 12:32:54 CST
