RE: NCR encode/decode, vs Unicode approach

From: Addison Phillips (addison@yahoo-inc.com)
Date: Mon Jun 19 2006 - 15:01:41 CDT

In reply to: Huo, Henry: "NCR encode/decode, vs Unicode approach"

If you switch to a Unicode implementation, you are likely to encounter far
fewer issues than you will when processing legacy (i.e., non-Unicode)
strings that carry an additional layer of encoding. Using a native Unicode
encoding in the database, for example, lets you use the actual characters
you are storing in your SQL queries and when indexing entries. Strings
containing NCRs have to be processed in various ways, each requiring quite
complex code to detect and handle the NCRs. And you've already run into
variations and problems in encoding support going down this path.
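
To illustrate the extra handling that NCR-laden strings need, here is a
minimal Python sketch. The names NCR_PATTERN and decode_ncrs are
hypothetical and not taken from any system discussed in this thread. It
decodes decimal and hexadecimal references, and it shows that a plain
substring search on the stored form does not find the characters a user
would actually type.

```python
import re

# Matches decimal (&#26085;) and hexadecimal (&#x65E5;) numeric character references.
NCR_PATTERN = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")


def decode_ncrs(text: str) -> str:
    """Replace numeric character references with the characters they name."""

    def replace(match: re.Match) -> str:
        hex_digits, dec_digits = match.groups()
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
        # Leave out-of-range or surrogate references untouched rather than guess.
        if code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
            return match.group(0)
        return chr(code_point)

    return NCR_PATTERN.sub(replace, text)


stored = "&#26085;&#26412;&#35486; text"   # what a legacy column might hold
decoded = decode_ncrs(stored)

print(decoded)
print("日本" in stored)    # searching the stored form misses the characters
print("日本" in decoded)   # searching works only after decoding
```

Every query, comparison, length check, or index over such data needs a step
like this first, which is the overhead a native Unicode column avoids.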

By contrast, changing your Web server to host pages using the UTF-8
encoding, recoding the pages, and possibly including a UTF-8 <meta> tag in
the header is the work of an afternoon. Migrating your database and fixing
server-side code might still be an appreciable project (it depends on how
internationalized your code is). However, when you are done, you'll actually
be done. And nearly any problem you encounter switching to UTF-8 would apply
equally to a combination of legacy encodings and NCRs; the difference is
that the UTF-8 code is much easier to write.
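
As a rough sketch of the recoding step, assuming the existing pages are
stored as CP850 bytes, the conversion in Python is a decode followed by an
encode. The function name recode_cp850_to_utf8 and the META_TAG constant are
illustrative only; the tag shown is the usual HTML 4 style charset
declaration.

```python
# Charset declaration to place in the <head> of each recoded page.
META_TAG = '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">'


def recode_cp850_to_utf8(raw: bytes) -> bytes:
    """Reinterpret CP850-encoded bytes as text and re-encode them as UTF-8."""
    text = raw.decode("cp850")     # bytes -> Unicode string
    return text.encode("utf-8")    # Unicode string -> UTF-8 bytes


legacy_bytes = "Grüße".encode("cp850")        # stand-in for a legacy page body
utf8_bytes = recode_cp850_to_utf8(legacy_bytes)

print(utf8_bytes.decode("utf-8"))
print(META_TAG)
```

The server's Content-Type header should also announce UTF-8, since a charset
in the HTTP header takes precedence over the <meta> tag.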

Regards,

Addison

Addison Phillips
Internationalization Architect - Yahoo! Inc.

Internationalization is an architecture.
It is not a feature.

_____

From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org] On
Behalf Of Huo, Henry
Sent: Monday, June 19, 2006 06:26
To: 'unicode@unicode.org'
Subject: NCR encode/decode, vs Unicode approach

We are evaluating our legacy systems and would like to get advice from you
gurus on the best approach to supporting multilingual web products.

Currently, the legacy web applications are running on WebSphere 5 and Sybase
12.5, which is set up with CP850 for varchar and char.

The Web front end does NCR encoding/decoding (&#nnnnn;) for double-byte
characters, e.g. Japanese and Chinese characters, and no encoding/decoding
for US-ASCII input.

We are currently working on a plan to support all kinds of languages,
including English, German (umlauts), Korean, Chinese, Japanese, etc. Could
you please advise on the best approach? If we convert the Sybase database to
use unichar/univarchar, then we need to change all of the legacy apps to use
UTF-8 encoding/decoding, and the effort is huge. If we would like to keep
the current CP850 char/varchar on the Sybase database side, should we
encode/decode with NCRs (&#nnnnn;) in all Web applications handling
different languages across different countries? Will NCR encoding/decoding
support all languages without issues? We have already noticed some issues,
like invalid characters "??" in the database.
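
One plausible source of the "??" characters, offered as an assumption rather
than a diagnosis of this particular system, is text reaching a CP850
conversion without first being NCR-encoded. The Python sketch below uses the
standard codec error handlers to contrast the lossy result with the NCR
form. The variable names are illustrative.

```python
text = "日本"

# Characters outside CP850 are replaced with "?" and cannot be recovered.
lossy = text.encode("cp850", errors="replace")

# The same characters written as decimal NCRs survive the CP850 round trip.
as_ncr = text.encode("cp850", errors="xmlcharrefreplace")

# German umlauts exist in CP850, so they are stored directly as one byte each.
umlaut = "ü".encode("cp850", errors="xmlcharrefreplace")

print(lossy)
print(as_ncr)
print(umlaut)
```

Once a character has been stored as "?", the original cannot be restored
from the database alone.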

Thank you so much for your help; any input is highly appreciated.

With best regards,

- Henry


This archive was generated by hypermail 2.1.5 : Mon Jun 19 2006 - 15:07:46 CDT