Re: Entering data in unicode

Date: Tue Oct 03 2000 - 17:16:03 EDT


1. "Unicode" is not always 16 bits. There are several
encodings. The encoding used on the web is usually UTF-8, which is a
multi-byte encoding built from 8-bit units. You should NOT send UTF-16/UCS-2
(that's the 16-bit variety of Unicode) to a browser, because none of the major
browsers will understand it (actually, that's a simplification...)

2. If you set the charset for a page to a specific encoding, say UTF-8,
then that's the encoding you get back from the browser, unless the user
manually changes the encoding using their View menu (which renders the
display illegible unless the page is in English).
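
To make point 1 concrete, here is a small Python sketch (the sample
characters are my own choice, not anything from the original question)
comparing how many bytes each character takes in UTF-8 and in UTF-16:

```python
# Three sample characters: 'A' (U+0041), e-acute (U+00E9), and the kanji U+65E5.
text = "A\u00e9\u65e5"

# UTF-8 uses a variable number of 8-bit units per character.
utf8_lengths = [len(ch.encode("utf-8")) for ch in text]

# UTF-16 (big-endian, no byte order mark) uses 16-bit units.
utf16_lengths = [len(ch.encode("utf-16-be")) for ch in text]

print(utf8_lengths)   # [1, 2, 3]
print(utf16_lengths)  # [2, 2, 2]
```

So testing input "for the 16 bits" tells you nothing when the page is
served as UTF-8.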

You don't have to do anything. The browser handles the text conversion
from the user's input character set to Unicode for you.

This is also how Japanese users, for example, can use a web site like
Yahoo Japan (which is encoded as EUC-JP, whereas most Japanese PCs and
Macs use the character set commonly referred to as Shift-JIS... ).
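
As a rough illustration of that conversion (this only mimics the idea in
Python; it is not what any particular browser does internally, and the
sample word is my own), the same text has different bytes in the two
encodings:

```python
# The word "nihongo" (Japanese language), written with Unicode escapes.
typed = "\u65e5\u672c\u8a9e"

# Bytes as the user's local system might hold them (Shift-JIS).
local_bytes = typed.encode("shift_jis")

# Bytes as they would be submitted to a page served as EUC-JP.
submitted = local_bytes.decode("shift_jis").encode("euc_jp")

print(local_bytes.hex())
print(submitted.hex())

# The server, knowing the page charset, recovers the original text.
recovered = submitted.decode("euc_jp")
```

The byte sequences differ, but the text that comes out the other end is
the same.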

In other words: you don't have to do anything. The browser and operating
system do it all for you. The user will never be aware that their input is
being converted to Unicode unless they look at the source of the HTML page
and see the META tag. All you have to do is pick up the results on the
server side.
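
Here is a minimal sketch of "picking up the results" in Python, assuming
the page was served as UTF-8 and the form arrives as an ordinary
URL-encoded body. The function name parse_form, the field names, and the
sample body are all hypothetical:

```python
from urllib.parse import parse_qs


def parse_form(body: str, charset: str = "utf-8") -> dict:
    """Decode a URL-encoded form body using the charset the page was served in.

    errors="strict" makes a wrong charset fail loudly instead of silently
    producing replacement characters.
    """
    return parse_qs(body, encoding=charset, errors="strict")


# Hypothetical body as a browser would send it from a UTF-8 page.
body = "name=%E6%97%A5%E6%9C%AC%E8%AA%9E&city=Z%C3%BCrich"
fields = parse_form(body)
print(fields)
```

Your own server environment will have its own way of doing this; the
point is only that you decode with the charset you declared for the page.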

Note that, unless you are using a Unicode encoding, you will have to
change your character parsing algorithm for each and every character set
(and thus language) that you intend to support. And you won't be able to
store that data in the same database with data created in another language
(with some obvious exceptions to that rule). Unicode solves a whole bunch
of problems on the server side.
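
A small Python sketch of why this matters (the byte value and the three
legacy character sets are just examples I picked): the same stored byte
means a different character in each legacy character set, while Unicode
text from several languages can sit side by side in one encoding.

```python
# One byte, three legacy character sets, three different characters.
raw = b"\xe9"
as_latin1 = raw.decode("latin-1")
as_koi8r = raw.decode("koi8_r")
as_cp1251 = raw.decode("cp1251")
print(as_latin1, as_koi8r, as_cp1251)

# With Unicode, mixed-language values share one encoding and round-trip intact.
values = ["Z\u00fcrich", "\u041c\u043e\u0441\u043a\u0432\u0430", "\u65e5\u672c\u8a9e"]
stored = [v.encode("utf-8") for v in values]
restored = [b.decode("utf-8") for b in stored]
```

Without the character set recorded alongside each value, the legacy
bytes are ambiguous; the UTF-8 bytes are not.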

I urge you to get a copy of the "Unicode 3.0" book and Ken Lunde's
excellent "CJKV Information Processing", both of which explain goodly
chunks of this. And check out the internationalization section of the website.

Best Regards,


Addison P. Phillips Principal Consultant
Inter-Locale LLC
Los Gatos, CA, USA

+1 408.210.3569 (mobile) +1 408.904.4762 (fax)
Globalization Engineering & Consulting Services

On Tue, 3 Oct 2000, George Zeigler wrote:

> Hello,
> I would like to understand something. If I do have a site in unicode, how
> do I get people to enter data in unicode? We can test for the 16 bits, but I
> would not know how to instruct someone to enter data specifically in unicode
> character set.
> George

This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:14 EDT