"AUFDERHEIDE HARRY R. (app1hra)" wrote:
> 1. Is the UTF-8's character set equal to the Latin-1 (ASCII) Code Page's?
No. UTF-8 and UTF-16 support the exact same repertoire of 41,000+ characters,
a superset of essentially every character set now in use.
> If not, what are the differences?
Latin-1 uses a single byte per character and encodes 256 characters.
UTF-8 uses 1 to 4 bytes per character, depending on the character, and encodes
all of Unicode's repertoire. Since all characters in the ASCII repertoire
use a single byte, UTF-8 is upward compatible with ASCII, but *not* with Latin-1:
the Latin-1 characters above 0x7F take two bytes each in UTF-8.
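
To make the byte counts concrete, here is a small Python sketch. The helper
name encoded_sizes and the sample characters are my own choices for
illustration, not part of any standard.

```python
def encoded_sizes(ch):
    """Return (utf8_bytes, utf16_bytes, latin1_bytes or None) for one character."""
    try:
        latin1 = len(ch.encode("latin-1"))
    except UnicodeEncodeError:
        latin1 = None  # outside Latin-1's 256-character repertoire
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-be")), latin1

# Sample characters (arbitrary picks): ASCII letter, Latin-1 letter,
# a CJK ideograph, and a character beyond U+FFFF.
for ch in ["A", "\u00e9", "\u4e2d", "\U00010400"]:
    print(f"U+{ord(ch):04X}", encoded_sizes(ch))

# ASCII text is byte-for-byte identical in UTF-8; Latin-1 text is not.
assert "A".encode("ascii") == "A".encode("utf-8")
assert "\u00e9".encode("latin-1") != "\u00e9".encode("utf-8")
```

The second assertion is the incompatibility in question: U+00E9 is the single
byte E9 in Latin-1 but the two bytes C3 A9 in UTF-8.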
> Under the assumption that it is substantially the same; I don't see
> it solving our problems
> as we are currently processing more characters than this can
> support. It certainly doesn't
> appear a solution for handling Chinese, Japanese, etc.
> This leads me to the UTF-16 format with its double byte capability.
In UTF-16, essentially all characters are supported in 2 bytes each. Some
not-yet-assigned characters will require two consecutive 2-byte codes.
These special codes ("surrogates") are assigned from a range that does not
conflict with normal characters.
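
As a rough sketch of how a surrogate pair is formed, assuming the standard
UTF-16 arithmetic (the function name to_surrogates and the sample code point
are mine, chosen only for illustration):

```python
def to_surrogates(code_point):
    """Split a code point above U+FFFF into a (high, low) surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only U+10000..U+10FFFF are encoded with surrogates")
    offset = code_point - 0x10000        # leaves a 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits -> D800..DBFF
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits -> DC00..DFFF
    return high, low

high, low = to_surrogates(0x10400)
print(hex(high), hex(low))

# Cross-check against Python's own UTF-16 encoder.
expected = high.to_bytes(2, "big") + low.to_bytes(2, "big")
assert "\U00010400".encode("utf-16-be") == expected
```

Both halves fall in D800..DFFF, a range in which no ordinary character is
assigned, so a 2-byte code can always be recognized as a normal character, a
high surrogate, or a low surrogate.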
> What about "C" languages?
There are excellent libraries freely available for C/C++. Java has built-in
Unicode support.
> What else should we be aware of?
Lots, see http://www.unicode.org
Schlingt dreifach einen Kreis um dies! || John Cowan <firstname.lastname@example.org>
Schliesst euer Aug vor heiliger Schau, || http://www.reutershealth.com
Denn er genoss vom Honig-Tau,          || http://www.ccil.org/~cowan
Und trank die Milch vom Paradies.            -- Coleridge (tr. Politzer)
This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:04 EDT