Re: UCS-4, UCS-2, UTF-16, UTF-8

From: Markus Kuhn (Markus.Kuhn@cl.cam.ac.uk)
Date: Fri Feb 18 2000 - 08:58:14 EST


"G. Adam Stanislav" wrote on 2000-02-18 12:31 UTC:
> Because such conversions take time, no matter how short,

No, they do not. The Intel BSWAP instruction executes in *zero* time on
all Pentiums. It just sets a flag for the ALU and then drops out of the
integer pipeline. Try it before you claim otherwise.
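
For readers who want to try this, here is a minimal sketch of a 32-bit
byte swap in portable C, as it would apply to one UCS-4 code unit. The
function name swap32 is made up for this illustration. Many optimizing
compilers recognize this shift-and-mask pattern and emit a single BSWAP
on x86, but that is an assumption about your compiler, not a guarantee.

```c
#include <stdint.h>

/* Reverse the byte order of one 32-bit value, e.g. a UCS-4 code unit.
 * swap32 is a hypothetical helper name, not taken from any library. */
static uint32_t swap32(uint32_t x)
{
    return (x >> 24)                  /* byte 3 -> byte 0 */
         | ((x >> 8) & 0x0000FF00u)   /* byte 2 -> byte 1 */
         | ((x << 8) & 0x00FF0000u)   /* byte 1 -> byte 2 */
         | (x << 24);                 /* byte 0 -> byte 3 */
}
```

Applying swap32 twice returns the original value, so the same routine
serves for both loading and saving.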

The test for the BOM, however, not only costs time (no matter how short
;-), it also requires access to the beginning of the file (which can
cost a LOT of time, since you can execute >100000 CPU instructions in
the time needed to load a sector from the disk), and it adds to the
state space that you have to cover during tests. All these factors by
far outweigh any naive and unrealistic concerns about unmeasurable
performance losses caused by byte swapping.
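
To make the state-space point concrete, here is a rough sketch of what
a BOM test for 16-bit data might look like. The enum and the function
name detect_utf16_bom are invented for this example. Note that the
caller must have the first two bytes of the file in hand, and that
there are three possible outcomes, each of which needs its own code
path and its own test cases.

```c
#include <stddef.h>

/* Possible results of inspecting the first two bytes of a file.
 * These names are hypothetical, chosen only for this illustration. */
enum byte_order { ORDER_UNKNOWN, ORDER_BIG_ENDIAN, ORDER_LITTLE_ENDIAN };

/* Look for the byte order mark U+FEFF at the start of the buffer. */
static enum byte_order detect_utf16_bom(const unsigned char *buf, size_t len)
{
    if (len < 2)
        return ORDER_UNKNOWN;            /* too short to hold a BOM */
    if (buf[0] == 0xFE && buf[1] == 0xFF)
        return ORDER_BIG_ENDIAN;         /* FE FF: big-endian */
    if (buf[0] == 0xFF && buf[1] == 0xFE)
        return ORDER_LITTLE_ENDIAN;      /* FF FE: little-endian */
    return ORDER_UNKNOWN;                /* no BOM: caller must guess */
}
```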

Even if you don't use the Pentium BSWAP instruction, Intel's processors
are extremely efficient at unaligned memory accesses, such that even the
naive implementation of the byte swap performs *remarkably* well.

Naive, unoptimized byte swapping for a 5 MB Word document at load/save
time would cost you significantly less than 80 milliseconds.
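
By "naive" I mean something like the following sketch, which swaps each
pair of bytes in a buffer of 16-bit code units one byte at a time. The
name swap16_buffer is made up for this example, and the code is only an
illustration of the idea, not a tuned routine.

```c
#include <stddef.h>

/* Swap the two bytes of every 16-bit unit in buf, in place.
 * A trailing odd byte, if any, is left untouched.
 * swap16_buffer is a hypothetical name used only for this sketch. */
static void swap16_buffer(unsigned char *buf, size_t len)
{
    size_t i;

    for (i = 0; i + 1 < len; i += 2) {
        unsigned char tmp = buf[i];
        buf[i] = buf[i + 1];
        buf[i + 1] = tmp;
    }
}
```

For scale: 80 milliseconds spread over 5 MB is a budget of roughly
15 ns per byte, which is very generous for a loop this simple.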

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>


