RE: What does it mean to "not be a valid string in Unicode"?

From: Whistler, Ken <ken.whistler_at_sap.com>
Date: Tue, 8 Jan 2013 18:28:35 +0000

> Sorry, but I have to disagree here. If a list of strings contains items
> with lone surrogates (garbage), then sorting them doesn't make the
> garbage go away, even if the items may be sorted in "correct" order
> according to some criterion.

Well, yeah, I wasn't claiming that the principled, "correct" output made the garbage go away.

Let me put it this way: if my choices are 1) garbage in, garbage reliably sorted out into garbage bin, versus 2) garbage in, sorting fails with exception, then I'll pick #1. ;-)

To give a concrete example, my implementation of UCA reliably passes the SHIFTED test cases in the conformance test, even though those test cases (deliberately) contain some ill-formed strings. If I instead validated input strings in my base implementation, it would be slower. *And* to pass the conformance test, I would have to add a separate preprocessing stage that probed all the input data for ill-formed strings and filtered those cases out before engaging the test, so that it wouldn't fail with an exception when it hit the bad data.
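
A minimal sketch of the two choices, for illustration only. It is not UCA and not my implementation: it simply compares Python strings by code point, and the names is_well_formed, sort_lenient, and sort_strict are hypothetical. It assumes a Python str, in which any code point in the range U+D800..U+DFFF is by definition an unpaired surrogate.

```python
def is_well_formed(s: str) -> bool:
    """Return True if s contains no surrogate code points (U+D800..U+DFFF)."""
    return not any(0xD800 <= ord(ch) <= 0xDFFF for ch in s)


def sort_lenient(strings):
    """Choice 1: no validation. Surrogates are compared like any other
    code point, so ill-formed strings land in a deterministic position."""
    return sorted(strings)


def sort_strict(strings):
    """Choice 2: validate every input first and raise on ill-formed data."""
    for s in strings:
        if not is_well_formed(s):
            raise ValueError(f"ill-formed string: {s!r}")
    return sorted(strings)


data = ["b", "a\ud800", "a", "\udc00"]

# Garbage in, garbage reliably sorted: the result is stable and repeatable.
lenient_result = sort_lenient(data)

# Garbage in, exception out: the caller must pre-filter the data to proceed.
try:
    sort_strict(data)
    strict_failed = False
except ValueError:
    strict_failed = True
```

Here sort_strict pays for an extra pass over every string, and any caller that feeds it deliberately ill-formed data has to filter that data out beforehand, which is the preprocessing stage described above.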

--Ken
Received on Tue Jan 08 2013 - 12:32:54 CST
