From: David Starner (firstname.lastname@example.org)
Date: Mon Jul 07 2008 - 19:33:06 CDT
On Mon, Jul 7, 2008 at 7:17 PM, William J Poser <email@example.com> wrote:
> Of course you want to be prepared for any possible input, but
> in some cases you do know what the range of possible inputs is.
> The input may not be coming directly from the user. It may be user
> input that has already been cleaned or validated, or it may be
> data that you yourself have generated.
Most low-level string processing code shouldn't need to be rewritten
for each application. If you've got UCS-2 only code, you have to
reëvaluate it for each project, or introduce a subtle bug by the reuse
of code. If you don't reuse code, you're probably rewriting code,
which introduces bugs, especially in the parts that aren't
well-tested--which for most people will include non-BMP characters.
And just because you can clean and validate user input doesn't mean
that you should arbitrarily forbid non-BMP characters. One of the
principles of Unicode is that you can pass through arbitrary scripts
and not worry about the difference.
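
To make the reuse hazard concrete, here is a minimal sketch of how a length routine written under the UCS-2 assumption disagrees with a surrogate-aware one on the same UTF-16 data. The function names are made up for illustration and do not come from any real library, and the surrogate-aware version assumes the input is well-formed UTF-16 (no unpaired surrogates).

```python
import struct


def utf16_units(text):
    """Return the UTF-16 code units of text as a list of ints (hypothetical helper)."""
    data = text.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(data) // 2), data))


def count_ucs2(units):
    """UCS-2 assumption: every 16-bit code unit is one character."""
    return len(units)


def count_utf16(units):
    """Count code points by skipping the low (trailing) surrogate of each pair.

    Assumes well-formed UTF-16; unpaired surrogates are not detected.
    """
    return sum(1 for u in units if not 0xDC00 <= u <= 0xDFFF)


# U+10400 lies outside the BMP and is encoded as the pair D801 DC00.
units = utf16_units("a\U00010400b")
print(count_ucs2(units))   # 4
print(count_utf16(units))  # 3
```

On BMP-only input the two functions agree, which is why the bug tends to stay hidden until the code is reused somewhere else.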
> I don't get the point. Whether you're dealing with one character or
> many, life is simpler if they're all the same size.
If I have to look up a single character in an array, it makes a
difference. If I'm looking up multiple characters, the length of any
one of them no longer matters; you're passing and returning
> But for some purposes, yes, you can assume that input is BMP-only.
> Not all input comes direct from the user.
Even for the times that you can assume integer input is positive, you
generally need to guard that code carefully with run-time tests. I
would regard nothing less as reasonable and necessary for code that
assumes the input is in the BMP. If simplicity is your goal, why not
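
As a rough sketch of the kind of run-time test meant above, a BMP-only routine could refuse anything else at its entry point. The name require_bmp is hypothetical, and the sketch assumes the input is a Python str, which is indexed by code point rather than by UTF-16 code unit.

```python
def require_bmp(text):
    """Return text unchanged, or raise ValueError on any code point above U+FFFF."""
    for index, ch in enumerate(text):
        if ord(ch) > 0xFFFF:
            raise ValueError(
                "non-BMP character U+%04X at index %d" % (ord(ch), index)
            )
    return text


require_bmp("plain BMP text")      # passes through unchanged
try:
    require_bmp("a\U00010400b")    # U+10400 is outside the BMP
except ValueError as err:
    print("rejected:", err)
```

A guard like this at least turns a silent BMP-only assumption into a loud failure when the code is reused on wider input.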