RE: UTF-8 vs UTF-16 as processing code

From: Jones, Bob
Date: Fri Jun 16 2000 - 15:02:09 EDT

I have the same question. And if you do go with UTF-8 for processing, how
does that work with Windows NT/2000? Is it even possible to have input come
in as UTF-8? If you compile with Unicode turned on, strings seem to
automatically be UCS-2.
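
To make the question concrete, here is a minimal sketch of converting at the
boundary: input arrives as UTF-8 bytes, gets turned into 16-bit code units
for internal use, and is turned back into UTF-8 on output. It is written in
Python only to keep it short. The function names are made up for
illustration, and it says nothing about what the NT/2000 APIs do themselves.
On Windows the corresponding Win32 call would, I believe, be
MultiByteToWideChar with CP_UTF8.

```python
import struct


def utf8_to_utf16_units(data):
    """Decode UTF-8 bytes into a list of 16-bit UTF-16 code units.

    Hypothetical helper: raises UnicodeDecodeError on malformed input.
    """
    text = data.decode("utf-8")
    raw = text.encode("utf-16-le")  # little-endian, no byte order mark
    return list(struct.unpack("<%dH" % (len(raw) // 2), raw))


def utf16_units_to_utf8(units):
    """Encode a list of 16-bit UTF-16 code units back into UTF-8 bytes."""
    raw = struct.pack("<%dH" % len(units), *units)
    return raw.decode("utf-16-le").encode("utf-8")


if __name__ == "__main__":
    incoming = b"A\xc3\xa9"  # "A" followed by U+00E9, as UTF-8 bytes
    units = utf8_to_utf16_units(incoming)
    print([hex(u) for u in units])
    print(utf16_units_to_utf8(units) == incoming)
```

The point is only that the external encoding and the internal one can differ,
as long as every input and output path goes through a conversion like this.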


-----Original Message-----
From: []
Sent: Friday, June 16, 2000 11:26 AM
To: Unicode List
Subject: UTF-8 vs UTF-16 as processing code

Hi everybody,

I'm wondering if there are any analyses comparing UTF-8 with UTF-16 for
use as a processing code. UCS-2 has often been considered a good
representation to use internally inside a program because of its "fixed
width" properties (assuming that you can somehow deal with combining
marks, etc.), but UTF-16 clearly isn't fixed width, especially now that
Unicode and 10646 are about to actually assign characters beyond U+FFFF.
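
As a small illustration of the width issue, the sketch below counts how many
16-bit code units and how many UTF-8 bytes a single character takes. The
sample code points are my own picks, chosen to cover each UTF-8 length plus
one value beyond U+FFFF. The helper names are hypothetical.

```python
def utf16_code_units(text):
    """Number of 16-bit code units needed to encode text as UTF-16."""
    return len(text.encode("utf-16-le")) // 2


def utf8_bytes(text):
    """Number of bytes needed to encode text as UTF-8."""
    return len(text.encode("utf-8"))


# Sample code points (illustrative choices, not from any standard list).
samples = [
    ("U+0041", "\u0041"),
    ("U+00E9", "\u00e9"),
    ("U+4E2D", "\u4e2d"),
    ("U+10400", "\U00010400"),  # beyond U+FFFF: needs a surrogate pair
]

if __name__ == "__main__":
    for label, ch in samples:
        print(label, "UTF-16 units:", utf16_code_units(ch),
              "UTF-8 bytes:", utf8_bytes(ch))
```

Everything up to U+FFFF takes one UTF-16 code unit, while the last sample
takes two, so code that indexes by code unit can land in the middle of a
character in either encoding.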

The kind of analysis I have in mind is one that lists various pros and
cons for each representation. I had a quick look at the Unicode 3.0
book, but I haven't read all of it yet. Does anybody have any pointers
to such analyses, e.g. URLs, books, etc.?


