Re: [unicode] UTF-c

From: Bjoern Hoehrmann (derhoermi@gmx.net)
Date: Mon Feb 21 2011 - 15:49:23 CST

Next message: Koji Ishii: "Titlecasing words starting with numeric glyphs and period as word separator"

Previous message: Doug Ewell: "RE: [unicode] UTF-c"
In reply to: Philippe Verdy: "Re: [unicode] UTF-c"
Next in thread: William_J_G Overington: "Re: [unicode] UTF-c"

* Philippe Verdy wrote:
>And anyway it is also much simpler to understand and easier to
>implement correctly (not like the sample code given here) than SCSU,
>and it is still very highly compressible with standard compression
>algorithms while still allowing very fast processing in memory in its
>decompressed encoded form :
>- a bit faster than UTF-8, as seen in my early benchmarks, for small
>number of large texts such as pages in a Wiki database,
>- but a bit slower for large number of small strings such as tabular
>data, because of the higher number of conditional branches when using
>a CPU with a 1-way instruction pipeline (not a problem with today's
>processors that include a dozen of parallel pipelines even in a single
>core, if the compiled assembly code is correctly optimized and
>scheduled to make use of them when branch-prediction cannot help
>much).

From a very brief look, it seems to me that you could eliminate much of
the conditional logic there, at least as far as decoding goes, in the
same way I removed it from the UTF-8 decoder described at
http://bjoern.hoehrmann.de/utf-8/decoder/dfa/ (there you could eliminate
branches completely, but as I recall that would cost you a register,
among other things). The main performance problem I ran into while
developing the decoder was actually compilers being silly...
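
To illustrate the general idea of replacing per-byte conditionals with
table lookups, here is a minimal sketch of a table-driven UTF-8 decoder.
It is not the code from the page linked above, which uses its own, more
compact tables; every name here (utf8_dfa_init, utf8_step, utf8_decode,
the S_* states) is hypothetical, and the tables are filled at run time
for readability rather than speed. Nothing here applies to UTF-c itself.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical state names; S_ACCEPT means "a full code point was read". */
enum { S_ACCEPT, S_REJECT, S_C1, S_C2, S_E0, S_ED, S_C3, S_F0, S_F4, S_COUNT };

static uint8_t next_state[S_COUNT][256]; /* transition table: state x byte */
static uint8_t payload_mask[256];        /* data bits carried by each byte */

static void fill(int state, int lo, int hi, int target) {
    for (int b = lo; b <= hi; b++) next_state[state][b] = (uint8_t)target;
}

/* Builds both tables; must be called once before decoding. */
void utf8_dfa_init(void) {
    for (int s = 0; s < S_COUNT; s++) fill(s, 0x00, 0xFF, S_REJECT);
    fill(S_ACCEPT, 0x00, 0x7F, S_ACCEPT); /* ASCII */
    fill(S_ACCEPT, 0xC2, 0xDF, S_C1);     /* 2-byte lead (C0, C1 are overlong) */
    fill(S_ACCEPT, 0xE0, 0xE0, S_E0);     /* 3-byte lead, overlong check next */
    fill(S_ACCEPT, 0xE1, 0xEC, S_C2);
    fill(S_ACCEPT, 0xED, 0xED, S_ED);     /* 3-byte lead, surrogate check next */
    fill(S_ACCEPT, 0xEE, 0xEF, S_C2);
    fill(S_ACCEPT, 0xF0, 0xF0, S_F0);     /* 4-byte lead, overlong check next */
    fill(S_ACCEPT, 0xF1, 0xF3, S_C3);
    fill(S_ACCEPT, 0xF4, 0xF4, S_F4);     /* 4-byte lead, must stay <= U+10FFFF */
    fill(S_C1, 0x80, 0xBF, S_ACCEPT);     /* one continuation byte left */
    fill(S_C2, 0x80, 0xBF, S_C1);         /* two continuation bytes left */
    fill(S_C3, 0x80, 0xBF, S_C2);         /* three continuation bytes left */
    fill(S_E0, 0xA0, 0xBF, S_C1);
    fill(S_ED, 0x80, 0x9F, S_C1);
    fill(S_F0, 0x90, 0xBF, S_C2);
    fill(S_F4, 0x80, 0x8F, S_C2);

    for (int b = 0x00; b <= 0x7F; b++) payload_mask[b] = 0x7F;
    for (int b = 0x80; b <= 0xBF; b++) payload_mask[b] = 0x3F;
    for (int b = 0xC0; b <= 0xDF; b++) payload_mask[b] = 0x1F;
    for (int b = 0xE0; b <= 0xEF; b++) payload_mask[b] = 0x0F;
    for (int b = 0xF0; b <= 0xFF; b++) payload_mask[b] = 0x07;
}

/* One byte, two lookups, no conditionals. The caller must zero *cp
   before the first byte of each sequence. Returns the new state. */
static inline uint32_t utf8_step(uint32_t *state, uint32_t *cp, uint8_t byte) {
    *cp = (*cp << 6) | (uint32_t)(byte & payload_mask[byte]);
    *state = next_state[*state][byte];
    return *state;
}

/* Decodes len bytes into out (capacity cap). Returns the number of code
   points written, or (size_t)-1 if the input is malformed or truncated,
   or if out is too small. */
size_t utf8_decode(const uint8_t *in, size_t len, uint32_t *out, size_t cap) {
    uint32_t state = S_ACCEPT, cp = 0;
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (utf8_step(&state, &cp, in[i]) == S_ACCEPT) {
            if (n == cap) return (size_t)-1;
            out[n++] = cp;
            cp = 0; /* start the next sequence from a clean accumulator */
        }
    }
    return state == S_ACCEPT ? n : (size_t)-1;
}
```

The per-byte step itself contains no conditionals; the one remaining
test per byte is in the caller, which has to notice when a code point is
complete. S_REJECT is sticky in this sketch, so a malformed sequence is
only reported once the input ends.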

-- 
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/

This archive was generated by hypermail 2.1.5 : Mon Feb 21 2011 - 15:51:06 CST