Re: PC code pages and national character sets

From: David Starner (dstarner98@aasaa.ofe.org)
Date: Mon Apr 30 2001 - 20:54:11 EDT


On Tue, May 01, 2001 at 07:49:27AM +0900, TOYOSHIMA,Masayuki wrote:
>
> > Is there a list of all PC code pages, national character sets,
> > and/or mail encodings, that are in real use today somewhere on the web?
>
> maybe not `all', but try
> http://kanji.zinbun.kyoto-u.ac.jp/~yasuoka/CJK.html

I didn't find that link helpful at all. It seems to be an arbitrary
set of ISO-2022/Unicode character tables with no real reference to
the web. It includes the 7-bit ISO-646 sets which I would be surprised
to find anywhere on the web, and is missing many Microsoft code pages
and specialized Russian code pages that probably make up half of the
most-used character sets on the Web.

What does "real use" mean? The set of usable character sets is
unbounded and hence the set of sets people use is very varied.
My suggestion is to look at what the main web browsers support. For
example, Netscape 4.7 supports:

ISO-8859-1/2/5/7/9/15, Windows-1250/1251/1253, UTF-7/8, KOI8-R,
IBM 866, Shift_JIS, EUC-JP, Big5, EUC-TW, GB2312 and some
autodetected Korean and Japanese sets.

Netscape 4.7 sucks in this regard.
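
To make concrete why the supported list matters, here is a minimal
sketch in Python (an illustration only, not anything the browsers
themselves do). It takes the four Cyrillic sets from the Netscape list
and shows that the same text becomes different bytes in each, so a page
read with the wrong set turns to garbage. The helper names encode_all
and decode_as are made up for this example, and the codec labels are
the ones Python's standard library happens to accept.

```python
# Cyrillic charsets from the Netscape 4.7 list, written as labels
# that Python's standard codecs recognize.
CYRILLIC = ["koi8-r", "windows-1251", "ibm866", "iso-8859-5"]


def encode_all(text, charsets=CYRILLIC):
    """Encode the same text once per charset (hypothetical helper)."""
    return {name: text.encode(name) for name in charsets}


def decode_as(data, charset):
    """Decode bytes under a given charset, replacing undecodable bytes."""
    return data.decode(charset, errors="replace")


if __name__ == "__main__":
    sample = "Привет"
    encoded = encode_all(sample)
    # Same text, four different byte sequences.
    for name, raw in encoded.items():
        print(name, raw.hex())
    # Reading KOI8-R bytes as Windows-1251 yields unrelated letters.
    print(decode_as(encoded["koi8-r"], "windows-1251"))
```

A browser that lacks one of these sets, or guesses the wrong one, is in
the position of the last line: valid bytes, wrong letters.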

Mozilla has a much wider collection of character sets, including
many of the remaining ISO-8859-*, more of the Windows sets, and some
Macintosh sets, plus some miscellaneous others (e.g. VISCII, KOI8-U).
So does Konqueror, the only one of the three that claims to support
UTF-16. (IE hasn't been ported to my platform.) Of the Unicode encodings,
I'd say UTF-8 is by far the most common, though there are probably
some CJK pages in UTF-16.

-- 
David Starner - dstarner98@aasaa.ofe.org
Pointless website: http://dvdeug.dhis.org
"I don't care if Bill personally has my name and reads my email and 
laughs at me. In fact, I'd be rather honored." - Joseph_Greg


