Re: UTF-8 can be used for more than it is given credit ( Re: UTF-7 - is it dead? )

From: Hans Aberg ([email protected])
Date: Mon Jun 05 2006 - 05:05:57 CDT

Next message: Erkki Kolehmainen: "Re: are Unicode codes somehow specified in official national linguistic literature ? (worldwide)"

Previous message: Philippe Verdy: "Re: UTF-8 can be used for more than it is given credit ( Re: UTF-7 - is it dead? )"
In reply to: Philippe Verdy: "Re: UTF-8 can be used for more than it is given credit ( Re: UTF-7 - is it dead? )"
Next in thread: saqqara: "Re: UTF-7 - is it dead?"

On 5 Jun 2006, at 11:33, Philippe Verdy wrote:

> I don't presume which encoding is better for all.

This is what I too said.

> Application and networking tuning (and interoperability) is as much
> important! UTF-8 is excellent for interoperability in heterogeneous
> environment, and is supported by the most important number of
> protocols. UTF-32 is not, and it wastes space except for local
> handling of small quantities of texts, which is otherwise
> represented and stored or transmitted differently, only because
> it's more convenient for interoperation.

This is also what I said: UTF-32 may be favored internally in a
program for the sake of alignment and speed. UTF-8 is fine for
text-to-text communications.

> But don't forget databases. They are stored on disks, and disk
> access is always too slow. What you read from disk will end into
> memory and will swap to disk. If you can't handle the strict
> natively in memory exactly the way it is stored, the swapping to
> disk will require more disk space.

There I said that if data compression is a major objective, do not
rely on a character encoding to do the job, but seek out more
efficient compression methods.
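
As a rough illustration of this point, the sketch below compares the raw
and zlib-compressed sizes of the same text in UTF-8 and UTF-32. It is not
a measurement from this thread: the sample string and the helper name
encoded_sizes are made up for the example, and zlib merely stands in for
"a more efficient compression method".

```python
import zlib

def encoded_sizes(text):
    """Return {encoding: (raw_bytes, zlib_bytes)} for UTF-8 and UTF-32.

    Hypothetical helper for illustration only.
    """
    sizes = {}
    for name in ("utf-8", "utf-32-le"):  # "-le" avoids a leading BOM
        raw = text.encode(name)
        sizes[name] = (len(raw), len(zlib.compress(raw, 9)))
    return sizes

# Made-up sample: mixed ASCII, Latin-1, Greek and CJK, repeated.
sample = "Unicode text: \u00e5\u00e4\u00f6 \u0394 \u4e2d\u6587 " * 200

for name, (raw, packed) in encoded_sizes(sample).items():
    print(f"{name}: raw={raw} bytes, zlib={packed} bytes")
```

How large the gap is after compression depends heavily on the text, so
real data should be measured rather than assumed.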

> From: "Hans Aberg" <[email protected]>
>> And here Moore's law comes into play again, as RAM becomes
>> increasingly cheap.
>
> Moore's law has nothing to do here. Even though RAM is getting
> lower per megabyte, the modern programs use more memory and handle
> more data.

My focus was on why the OP felt that "cache misses" excluded UTF-32
in favor of UTF-8. I think this is a problem only if you have too
little RAM in your computer, and Moore's law says that enough will
soon be available. If you have too little RAM on a virtual-memory-
based computer, nothing will really help any of your programs run
except getting enough RAM, as the faster parts of the computer will
spend their time waiting for page swaps to occur.

If you have enough RAM, UTF-32 should be faster than UTF-8
internally in a program, as no alignments need to be computed. But
only proper profiling of each given program can really tell.
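
To make the difference concrete, here is a minimal sketch of fetching the
n-th code point from each encoding. The function names are hypothetical,
the input is assumed to be valid UTF-8 (or little-endian UTF-32 without a
BOM), and it only shows the shape of the work involved, not its speed.

```python
import struct

def utf8_seq_len(lead):
    """Length in bytes of the UTF-8 sequence starting with this lead byte."""
    if lead < 0x80:
        return 1
    if lead >> 5 == 0b110:
        return 2
    if lead >> 4 == 0b1110:
        return 3
    if lead >> 3 == 0b11110:
        return 4
    raise ValueError("not a UTF-8 lead byte")

def utf8_code_point_at(data, index):
    """Walk the byte string from the start: cost grows with index."""
    pos = 0
    for _ in range(index):
        pos += utf8_seq_len(data[pos])
    n = utf8_seq_len(data[pos])
    return ord(data[pos:pos + n].decode("utf-8"))

def utf32_code_point_at(data, index):
    """Fixed-width units: the byte offset is simply 4 * index."""
    return struct.unpack_from("<I", data, 4 * index)[0]

text = "a\u00e5\u4e2d\U0001F600z"
u8, u32 = text.encode("utf-8"), text.encode("utf-32-le")
print(hex(utf8_code_point_at(u8, 3)), hex(utf32_code_point_at(u32, 3)))
```

Whether the fixed-width lookup wins in practice depends on access
patterns and memory traffic, which is why profiling remains the real
test.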

Hans Aberg

This archive was generated by hypermail 2.1.5 : Mon Jun 05 2006 - 05:41:37 CDT