Date: Mon Nov 03 2003 - 19:48:02 EST
At 5:36 pm +0100 2/11/03, Lars Marius Garshol wrote:
>> True. However, some software will ignore hyphens in charset names in
>> order to make bad encoding declarations like "utf8" work properly. Web
>> browsers are one example of this.
First of all, software which ignores the hyphens in one charset name does not
necessarily ALSO ignore the hyphens in a different charset name. For example, in
Mozilla/Netscape, I built in the
which maps both "iso-8859-1" and "iso88591" to "ISO-8859-1". But we didn't map
"eucjp" to "euc-jp". You may THINK it is an "ignore hyphens" rule, but it is really
just extra table entries.
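
To make the difference concrete, here is a minimal sketch in Python. It is not the actual Mozilla code. The table CHARSET_ALIASES and both function names are hypothetical, and the "euc-jp" entry is an assumption added so that the valid name resolves. It contrasts an explicit alias table with a generic "ignore hyphens" rule:

```python
# Hypothetical alias table; entries mirror the examples in this message.
CHARSET_ALIASES = {
    "iso-8859-1": "ISO-8859-1",
    "iso88591": "ISO-8859-1",   # extra entry added on purpose
    "euc-jp": "EUC-JP",         # assumed entry; note there is no "eucjp"
}


def resolve_by_table(name):
    """Return the canonical name only if the alias is listed explicitly."""
    return CHARSET_ALIASES.get(name.strip().lower())


def resolve_by_stripping(name):
    """A generic 'ignore hyphens' rule, shown only for contrast."""
    key = name.strip().lower().replace("-", "")
    for alias, canonical in CHARSET_ALIASES.items():
        if alias.replace("-", "") == key:
            return canonical
    return None


print(resolve_by_table("iso88591"))    # ISO-8859-1
print(resolve_by_table("eucjp"))       # None
print(resolve_by_stripping("eucjp"))   # EUC-JP
```

With the table approach, "iso88591" works only because someone added that entry, and "eucjp" fails because nobody did.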
Second, even if one piece of software supports that, or several do, that
does not mean it is a valid charset name. It only means the software accepts
that name in addition to the valid charset names.
John Delacour (JD@BD8.COM) wrote:
>Yes. If there were any sort of consistency in the way charset are
>“officially” named, it might be reasonable to stick to the letter
>of the law, but there is not,
There IS consistency in the way charsets are "officially" named, which is
whatever is listed under
>and the use of “utf8” (either case)
>is so commonly used and allowed for in all sorts of programs (cf.
>Encode.pm) that it would seem sensible to accept it.
We are talking about charset values for Internet protocols here. That is a
special, narrow field of charset names. The values used by Internet protocols are
defined by a well-defined process, RFC 2278 - IANA Charset Registration
Procedures (http://www.faqs.org/rfcs/rfc2278.html),
and have no direct relationship with the charset names used by programming
languages or operating systems. Programming languages and operating systems can
choose whatever names they want to use for charsets, whether or not they adopt
what is registered with IANA. But that does not mean IANA will or should take
those names automatically. IANA will take those names if someone submits them
through the RFC 2278 process and they fit the criteria stated in RFC 2278.
There are some good reasons for this. We don't want browsers or other Internet
software to support every charset name that any software produces. We want to
reduce the supported list to a finite set in a common place that all vendors can
refer to. A particular Perl programmer can choose to use a particular charset
name for Perl. That is perfectly fine for his/her Perl. But he/she should not
expect INTERNET developers to follow his/her usage unless someone bothers to go
through the INTERNET way: RFC 2278.
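
The same split can be seen in other languages. As a rough illustration in Python (not Perl), the standard codecs module accepts "utf8" as one of its own local aliases. The set REGISTERED_EXCERPT and the helper names are hypothetical. The set is a hand-copied excerpt, not the full IANA list:

```python
import codecs

# Hypothetical, hand-copied excerpt of registered names, lowercased
# because charset names are compared case-insensitively.
REGISTERED_EXCERPT = {"utf-8", "iso-8859-1", "euc-jp"}


def python_accepts(label):
    """True if Python's own codec registry knows this label."""
    try:
        codecs.lookup(label)
        return True
    except LookupError:
        return False


def is_registered(label):
    """True if the label appears in the registry excerpt above."""
    return label.lower() in REGISTERED_EXCERPT


for label in ("utf-8", "utf8"):
    print(label, python_accepts(label), is_registered(label))
# utf-8 True True
# utf8 True False
```

A language accepting a name locally says nothing about whether that name may be sent in an Internet protocol.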
>If “l1” is
>acceptable for “ISO-8859-1”, as it is, though it is not in
>Apple’s TEC listing, then “utf8” etc. ought to be fairly
"L1" is accepted because it is a valid charset name listed in
Name: ISO_8859-1:1987 [RFC1345,KXS2]
Source: ECMA registry
Alias: ISO-8859-1 (preferred MIME name)
Alias: csISOLatin1
"utf8" is not a valid charset name simply because it is not
listed under http://www.iana.org/assignments/character-sets
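
A validity check of this kind could be sketched as follows, assuming the registry text uses the "Name:" / "Alias:" layout shown above. The function registered_names is hypothetical, and RECORD contains only the lines quoted in this message, not the complete registry entry:

```python
# Only the lines quoted above; the real registry entry is longer.
RECORD = """\
Name: ISO_8859-1:1987 [RFC1345,KXS2]
Source: ECMA registry
Alias: ISO-8859-1 (preferred MIME name)
Alias: csISOLatin1
"""


def registered_names(record):
    """Collect the Name and Alias values of a record, lowercased."""
    names = set()
    for line in record.splitlines():
        field, _, value = line.partition(":")
        if field.strip() in ("Name", "Alias") and value.strip():
            # Keep the first token; drop notes such as "(preferred MIME name)".
            names.add(value.split()[0].lower())
    return names


names = registered_names(RECORD)
print("iso-8859-1" in names)   # True
print("utf8" in names)         # False
```

The rule is simply membership: a name is valid if it appears as a Name or Alias in the registry, and not otherwise.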
Frank Yung-Fong Tang
System Architect, Iñtërnâtiônàl Dèvélôpmeñt, AOL Intèrâçtívë Sërviçes
AIM:yungfongta mailto:email@example.com Tel:650-937-2913
Yahoo! Msg: frankyungfongtan
John 3:16 "For God so loved the world that he gave his one and only Son, that
whoever believes in him shall not perish but have eternal life."