> Let me see if I have this right. You are saying:
> 1. UTF-8 is fastest because no byte swapping is necessary.
Not exactly; I'm saying UTF-8 is faster because it (almost always in the
*aggregate*) means fewer bits transferred, and that is the bottleneck; it's
easier because you don't have to even think about byte order.
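A rough sketch of the size argument, not a benchmark: the Python snippet below counts the bytes each encoding form needs for a few sample strings. The helper name `encoded_sizes` and the samples are made up for illustration. The CJK sample shows why the claim is hedged with "almost always in the aggregate".

```python
def encoded_sizes(text):
    """Return the byte counts of `text` in UTF-8 and UTF-16 (no BOM)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
    }

# Illustrative samples (assumptions, not data from this thread).
samples = {
    "ascii markup": "<p>Hello, world</p>",
    "greek": "\u039a\u03b1\u03bb\u03b7\u03bc\u03ad\u03c1\u03b1",
    "cjk": "\u65e5\u672c\u8a9e",
}

for name, text in samples.items():
    print(name, encoded_sizes(text))
```

ASCII-heavy text halves in size under UTF-8, the Greek sample is the same size in both forms, and the CJK sample is larger in UTF-8 (9 bytes against 6). Which form wins overall depends on the mix of text actually being transferred.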
> 2. Normalization will be needed no matter the encoded form.
No, I said whether or not normalization is needed is *independent* of the
byte order or concrete bit coding. If you're smart you may be able to
search "unnormalized" data in a lot of cases.
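To illustrate the independence point, here is a minimal sketch using Python's standard `unicodedata` module. The helper names `differs_in` and `same_after_nfc` are hypothetical. The same composed and decomposed spellings of "é" stay distinct in every encoding form tried, and compare equal only after normalization.

```python
import unicodedata

composed = "\u00e9"      # e-acute as a single code point
decomposed = "e\u0301"   # "e" followed by a combining acute accent

def differs_in(encoding, a, b):
    """True if the two strings encode to different byte sequences."""
    return a.encode(encoding) != b.encode(encoding)

def same_after_nfc(a, b):
    """True if the two strings are equal once both are normalized to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

for enc in ("utf-8", "utf-16-le", "utf-16-be"):
    print(enc, differs_in(enc, composed, decomposed))

print("equal after NFC:", same_after_nfc(composed, decomposed))
```

The mismatch exists at the code point level, so changing the byte order or the encoding form neither creates nor removes it.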
> 3. Changing text to some other form is no problem once in memory.
Not "no problem", but substantially less of a problem than the I/O bandwidth. If you're
going to search 5 GBytes of data, *and* you have a decent search algorithm,
your CPU will be mostly twiddling its thumbs waiting for the next disk
transfer to complete. If you can get the I/O subsystem to transfer at media
speed, you're there. You can use the other cycles to do a cute 3-D spinning
> Mark Leisher
> Computing Research Lab
> New Mexico State University
> Box 30001, Dept. 3CRL
>
> "A designer knows he has achieved perfection
> not when there is nothing left to add, but
> when there is nothing left to take away."
> -- Antoine de Saint-Exupéry
I'm amused every time I see my translation of this quote appear. Nothing
quite like an RFC for being considered authoritative beyond its means ...
it's a "Request For Comments", not a gospel. Not that it is a bad
translation, but he didn't make a noun reference to either design or
designers that I recall. Better would be: "Perfection is achieved, not when
there is nothing left to add, but when there is nothing to take away."
(references: RFC1726, 1925, various I-Ds, now expired, the Connexions
journal, and the current specification of Avian/IPv7,
This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:38 EDT