Re: UTF-8 vs UTF-16 as processing code

Date: Fri Jun 16 2000 - 16:53:30 EDT

We currently have an application that is distributed world-wide. Our
Australian office needed CJK capability, so they chose to make the
application use UTF-8. At the last possible point in the code before
display, text was converted to UCS-2. The application is client-server,
with the RDBMS engine from Sybase, which supports UTF-8 (to a degree), so
there was some impetus to use UTF-8.

When we began to look at the requirements for Arabic, Greek, Hebrew,
Cyrillic, and CJK, and at the support provided within NT/2000, we decided
to rework the application to convert from UTF-8 to UCS-2 at the point in
the code closest to the database. All of the code beyond that point,
including display and file I/O, uses UCS-2. It is working well and appears
to be a better solution than dealing with UTF-8 internally.

From: Erik van der Poel
To: "Unicode List" <>
Subject: UTF-8 vs UTF-16 as processing code
Sent: 11:25 AM

Hi everybody,

I'm wondering if there are any analyses comparing UTF-8 with UTF-16 for
use as a processing code. UCS-2 has often been considered a good
representation to use internally inside a program because of its "fixed
width" properties (assuming that you can somehow deal with combining
marks, etc), but UTF-16 clearly isn't fixed width, especially now that
Unicode and 10646 are about to actually assign characters beyond U+FFFF.
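To make the "not fixed width" point concrete: UTF-16 represents each code point beyond U+FFFF as a pair of 16-bit surrogate code units, so code-unit counts stop matching character counts. The arithmetic, per the Unicode Standard's UTF-16 definition, looks like this (Python used only as a convenient notation):

```python
def to_surrogates(cp: int) -> tuple[int, int]:
    """Encode a supplementary code point as a UTF-16 surrogate pair."""
    assert 0x10000 <= cp <= 0x10FFFF
    v = cp - 0x10000            # 20-bit value
    high = 0xD800 + (v >> 10)   # top 10 bits -> high (leading) surrogate
    low = 0xDC00 + (v & 0x3FF)  # bottom 10 bits -> low (trailing) surrogate
    return high, low

# U+10000, the first supplementary code point:
print([hex(u) for u in to_surrogates(0x10000)])  # ['0xd800', '0xdc00']
```

So a program that indexes 16-bit units directly can land in the middle of such a pair, which is exactly the fixed-width assumption UCS-2 code tends to bake in.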

The kind of analysis I have in mind is one that lists various pros and
cons for each representation. I had a quick look at the Unicode 3.0
book, but I haven't read all of it yet. Does anybody have any pointers
to such analyses, e.g. URLs, books, etc?



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:04 EDT