Re: U+xxxx, U-xxxxxx, and the basics

From: Addison Phillips [GSC] (addison@globalsight.com)
Date: Mon Mar 06 2000 - 11:47:08 EST


UCS-2 is bogus because it isn't UTF-16. New implementations should not use
UCS-2, since UTF-16 is a superset that adds surrogate pairs for characters
beyond U+FFFF. Supporting only UCS-2 will mean that your implementation breaks
when Unicode 3+ characters become official and get used (which will happen
quickly because there are a bunch of additional Han characters outside the
BMP). In other words: when mentioning and describing UCS-2, deprecate its use
clearly so that newcomers understand that they *need to* support UTF-16.
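
To make the difference concrete, here is a minimal Python sketch of the
surrogate mechanism. The function names are mine, purely for illustration,
and are not from any standard or library. It shows that a BMP character is
one 16-bit code unit in both UCS-2 and UTF-16, while a character beyond
U+FFFF becomes two code units that a UCS-2-only implementation would misread
as two separate values.

```python
def encode_utf16_units(code_point: int) -> list[int]:
    """Return the 16-bit UTF-16 code units for one Unicode scalar value.

    Hypothetical helper for illustration only.
    """
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point out of Unicode range")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points are not scalar values")
    if code_point <= 0xFFFF:
        # BMP character: identical in UCS-2 and UTF-16.
        return [code_point]
    # Supplementary character: split the 20-bit offset into two halves.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # high (leading) surrogate
    low = 0xDC00 + (offset & 0x3FF)   # low (trailing) surrogate
    return [high, low]


def decode_utf16_pair(high: int, low: int) -> int:
    """Recombine a high/low surrogate pair into one scalar value."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


if __name__ == "__main__":
    for cp in (0x0041, 0x4E00, 0x20000):
        units = encode_utf16_units(cp)
        print(f"U+{cp:04X} ->", " ".join(f"{u:04X}" for u in units))
```

A UCS-2-only implementation sees each surrogate as an unassigned 16-bit
value of its own, which is exactly where it breaks once such characters show
up in real data.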

UTF-16 is also a requirement because by now there are a number of significant
UCS-2 implementations that need to support additional characters.
Re-architecting them is not an option; providing a mechanism like UTF-16 to
make them conformant is. Oh, I forgot, we should replumb them all to use
UTF-32 ;-)......

Addison

Addison P. Phillips
Senior Globalization Consultant
Global Sight Corporation
mailto:addison@globalsight.com
================================
101 Metro Drive, Suite 750
San Jose, California 95110
(+1) 408.350.3600 - Telephone
(+1) 408.350.3601 - Fax
http://www.globalsight.com
================================

----- Original Message -----
From: Dan Oscarsson <Dan.Oscarsson@trab.se>
To: Unicode List <unicode@unicode.org>
Sent: Monday, March 06, 2000 7:09 AM
Subject: Re: U+xxxx, U-xxxxxx, and the basics

> John Cowan wrote:
> >
> >> The ISO standard also defines a 16-bit encoding form called UCS-2, in which
> >> a 16-bit code value in the code space 0x0..0xFFFF directly corresponds to an
> >> identical scalar value, but this form is, of course, inherently limited to
> >> representing only the first 65,536 scalar values.
> >
> >UCS-2 is bogus and shouldn't be explained before UTF-16, which has been the
> >real deal since Unicode 2.0.
>
> Why is it bogus?
> I see UTF-16 as really bogus. UTF-16 is there (I guess) because Unicode
> suddenly realised that 16 bits were not enough and instead of
> going for full UCS tried to cram as much as possible into 16 bits by
> an encoding like UTF-8.
>
> Dan
>


