Re: charset=utf8 and Mac mailers

Date: Mon Nov 03 2003 - 19:48:02 EST


     At 5:36 pm +0100 2/11/03, Lars Marius Garshol wrote:
    >> True. However, some software will ignore hyphens in charset names in
    >> order to make bad encoding declarations like "utf8" work properly. Web
    >> browsers are one example of this.
    First of all, software which ignores the hyphens in one charset name does not
    necessarily ALSO ignore the hyphens in a different charset name. For example, in
    Mozilla/Netscape, I built in the alias table
    which maps both "iso-8859-1" and "iso88591" to "ISO-8859-1". But we didn't map
    "eucjp" to "euc-jp". You may THINK it is "ignoring hyphens", but it is really
    just extra table entries.
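
    A minimal sketch of that difference, in Python. The table contents and the
    function name are hypothetical and are not Mozilla's actual data or code;
    the point is only that each accepted spelling is an explicit entry, not
    the result of stripping hyphens.

```python
# Hypothetical alias table for illustration; not Mozilla's real table.
ALIASES = {
    "iso-8859-1": "ISO-8859-1",
    "iso88591": "ISO-8859-1",  # explicit extra entry for the hyphenless form
    "euc-jp": "EUC-JP",        # no "eucjp" entry, so that spelling is unknown
}


def resolve_charset(label):
    """Return the canonical name for a label, or None if it is not in the table."""
    return ALIASES.get(label.strip().lower())


print(resolve_charset("iso88591"))
print(resolve_charset("eucjp"))
```

    With this table, "iso88591" resolves because someone added it, while
    "eucjp" does not, even though it differs from "euc-jp" only by a hyphen.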
    Second, even if one piece of software, or several, supports that, it
    does not mean it is a valid charset name. It only means the software accepts
    that name in addition to the valid charset names.
    John Delacour (JD@BD8.COM) wrote:

    >Yes. If there were any sort of consistency in the way charset are
    >“officially” named, it might be reasonable to stick to the letter
    >of the law, but there is not,
    There IS consistency in the way charsets are "officially" named: the official
    names are whatever is listed in the IANA charset registry.
    >and the use of “utf8” (either case)
    >is so commonly used and allowed for in all sorts of programs (cf.
    > that it would seem sensible to accept it.
    We are talking about charset values for Internet protocols here. That is a
    special, narrow field of charset naming. The values used by Internet protocols
    are defined by a well-defined process, RFC 2278 (IANA Charset Registration
    Procedures), and have no direct relationship with the charset names used by
    programming languages or operating systems. Programming languages and operating
    systems can choose whatever names they want for charsets, whether or not they
    adopt what is registered with IANA. But that does not mean IANA will or should
    take those names automatically. IANA will take those names if someone submits
    them through the RFC 2278 process and they fit the criteria stated in RFC 2278.
    There are good reasons for this. We don't want browsers or other Internet
    software to have to support every charset name that any software produces. We
    want to reduce the supported list to a finite set in a common place that all
    vendors can refer to. A particular Perl programmer can choose to use a
    particular charset name for Perl. That is perfectly fine for his/her Perl. But
    he/she should not expect INTERNET developers to follow his/her usage unless
    someone bothers to go through the INTERNET way: RFC 2278.
    >If “l1” is
    >acceptable for “ISO-8859-1”, as it is, though it is not in
    >Apple’s TEC listing, then “utf8” etc. ought to be fairly
    >predictable anomalies.
    "L1" is accepted because it is a valid charset name listed in

    Name: ISO_8859-1:1987 [RFC1345,KXS2]
    MIBenum: 4
    Source: ECMA registry
    Alias: iso-ir-100
    Alias: ISO_8859-1
    Alias: ISO-8859-1 (preferred MIME name)
    Alias: latin1
    Alias: l1
    Alias: IBM819
    Alias: CP819
    Alias: csISOLatin1

    "utf8" is not a valid charset name simply because it is not
    listed in the IANA charset registry.
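
    As a rough illustration of "valid means registered", the check can be
    sketched in Python. The ISO-8859-1 names are copied from the registry entry
    quoted in this message; including only the primary name "UTF-8" for UTF-8,
    matching names case-insensitively, and the function name are assumptions
    made for this sketch.

```python
# Names and aliases copied from the ISO_8859-1:1987 entry quoted in this message.
LATIN1_NAMES = [
    "ISO_8859-1:1987", "iso-ir-100", "ISO_8859-1", "ISO-8859-1",
    "latin1", "l1", "IBM819", "CP819", "csISOLatin1",
]

# Assumption for this sketch: UTF-8 is represented only by its primary name.
REGISTERED = {name.lower() for name in LATIN1_NAMES + ["UTF-8"]}


def is_registered(label):
    """True if the label matches a registered name or alias, ignoring case."""
    return label.lower() in REGISTERED


print(is_registered("L1"))
print(is_registered("utf8"))
```

    Under these assumptions "l1" passes because it appears as an alias, and
    "utf8" fails because no entry lists it, however common the spelling is.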

    Frank Yung-Fong Tang
    System Architect, Iñtërnâtiônàl Dèvélôpmeñt, AOL Intèrâçtívë Sërviçes
    AIM:yungfongta Tel:650-937-2913
    Yahoo! Msg: frankyungfongtan

    John 3:16 "For God so loved the world that he gave his one and only Son, that
    whoever believes in him shall not perish but have eternal life.

    Does your software display Thai language text correctly for Thailand users?
    -> Basic Conceptof Thai Language linked from Frank Tang's
    Iñtërnâtiônàlizætiøn Secrets
    Want to translate your English text to something Thailand users can
    understand ?
    -> Try English-to-Thai machine translation at
