From: Sam Mason (email@example.com)
Date: Mon Apr 27 2009 - 09:51:07 CDT
On Mon, Apr 27, 2009 at 04:28:00PM +0200, Bjoern Hoehrmann wrote:
> * Sam Mason wrote:
> >On Sun, Apr 26, 2009 at 10:57:49AM -0700, Mark Davis wrote:
> >> I'd disagree about that. It is certainly simpler to always process as code
> >> points, but where performance is required, you need to design your
> >> algorithms with the encoding of the core string representation in mind,
> >> typically UTF-8 or UTF-16. You can get huge speedups in that way.
> >Are there any pointers to literature about that? I'd be interested
> >to see how this sort of scheme would hang together; there would seem
> >to be quite a trade-off between instruction cache pressure, branch
> >prediction and most probably other effects I can't think of at the
> To offer one data point, Python's UTF-8 decoder does this and I've timed
> it against a couple of others. I note that the decoder accepts surrogate
> code points so the comparison is not entirely fair.
The point I was responding to was about doing regex's directly on the
data (i.e. UTF-8 when the source is UTF-8, UTF-16 when the source is
UTF-16, etc.) and not attempting to convert to a standard internal
> The times are at:
> Transcoding my name from UTF-8 to UTF-16 is substantially faster due to
> this optimization. That is largely because my processor can predict the
> additional branches almost perfectly (16 byte string, the pattern for
> the check is simply alternating no yes no yes). For the other data sets
> Python's decoder is slower than my own without any such optimization.
> I've also added a similar optimization to my own version, using the
> aligned 4 byte read and aligned 16 byte reads using SSE2 instructions.
> The results were similar (that is, for the more complex data sets,
> perfomance got worse).
Have you seen work such as u8u16? It optimises code specifically
for newer instruction sets (MMX, SSE and AltiVec) resulting in higher
performance (about five times faster than your code, and ~20 times
faster than iconv, if I'm reading things correctly). They've patented
it for commercial use and left it free for FOSS use which is OK, but a
It would be nice if their Parallel Bit Streams paper went into more
details, I've only given it a quick scan and didn't see the sort of
details that interested me anyway!
-- Sam http://samason.me.uk/  http://u8u16.costar.sfu.ca/  http://u8u16.costar.sfu.ca/attachment/wiki/WikiStart/ppopp074-cameron.pdf
This archive was generated by hypermail 2.1.5 : Mon Apr 27 2009 - 09:53:58 CDT