Re: GSM and Unicode

From: YTang0648@aol.com
Date: Wed Nov 05 2003 - 13:37:16 EST

Next message: YTang0648@aol.com: "Re: UTF8 and COntrol Characters"

Previous message: Guy Schockaert: "unsubscribe LISTS Fwd: Ecartis command results:"
Maybe in reply to: YTang0648@aol.com: "GSM and Unicode"
Next in thread: Philippe Verdy: "Re: [OT] HTML charset declarations (was: GSM and Unicode)"
Reply: Philippe Verdy: "Re: [OT] HTML charset declarations (was: GSM and Unicode)"

In a message dated 11/5/2003 2:59:00 AM Pacific Standard Time,
verdy_p@wanadoo.fr writes:
From: "John Delacour" <JD@BD8.COM>

> At 3:48 pm -0500 4/11/03, YTang0648@aol.com wrote:
> > In a message dated 11/4/2003 12:27:04 PM Pacific Standard Time,
> > verdy_p@wanadoo.fr writes:
> >
> >
> > GSM charsets are mostly from MES-1,etc
>
> This styled message contained (thanks to Microsoft) this line in the head:
>
> > <META charset=UTF-8 http-equiv=Content-Type content="text/html;
> > charset=utf-8">
>
> So far as I can tell, this is gibberish and ought to be
>
> > <META http-equiv="Content-Type" content="text/html; charset=utf-8">
>
> My browser seems to agree with me, but I await correction.

That's normal: in HTML 4 and earlier (but not in XML or XHTML), attribute
values are accepted without quotes, within some limits. Also, the letter case
of attribute names (like element names) is not significant, and in both HTML
and XML the order of attributes is never significant.
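
As a rough illustration of those three points, here is a minimal sketch using
Python's standard html.parser module. It is a lenient tokenizer, not an HTML 4
validator, and the names MetaCollector and meta_attrs are made up for this
example. Both META lines from the quoted message come out with the same
'http-equiv' and 'content' values; the first one just carries an extra
'charset' attribute.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect the attributes of every <meta> element as a dict."""

    def __init__(self):
        super().__init__()
        self.metas = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases element and attribute names, so letter case
        # in the markup does not matter here.
        if tag == "meta":
            self.metas.append(dict(attrs))


def meta_attrs(markup):
    """Return one attribute dict per <meta> element found in markup."""
    collector = MetaCollector()
    collector.feed(markup)
    collector.close()
    return collector.metas


# The line found in the styled message (unquoted values, extra attribute).
outlook = ('<META charset=UTF-8 http-equiv=Content-Type '
           'content="text/html; charset=utf-8">')
# The form John Delacour expected.
expected = '<META http-equiv="Content-Type" content="text/html; charset=utf-8">'

for markup in (outlook, expected):
    print(meta_attrs(markup)[0])
```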

So the only strange thing in this header is the presence of the
'charset=UTF-8' extra attribute in the meta element. I don't know for which
browser or mail reader this is included, as it is normally set within the
value of the 'content' attribute when the 'http-equiv' attribute is set to
"Content-Type" (case not significant for this value). It is extremely
probable that this nonstandard 'charset' attribute is
ignored, so the value specified for the standard 'content' attribute
takes precedence.

I think the reason we see pages that have <meta charset=""> is that the old
charset detection code we put into Netscape 2.0, way back in early 1996, is
very "loose". That code is not built into the parser; it sits in a pre-parsing
STREAM filter. It is a simple sniffer, for performance reasons, and it cannot
live in the parser because you need to detect and convert the charset before
handing the data to the parser. ISO-2022-JP is the reason: it is a stateful
encoding whose double-byte characters use bytes in the ASCII range, so the
parser cannot safely look for tags until the data has been converted. Because
that old meta charset detection code is so loose, pages that have those tags
will ALSO work gracefully with Netscape 2.0 through 4.x. (I believe we were
forced to make it work in Netscape 7 and Mozilla later because of that. MS
probably does the same thing in IE for the same reason.) Although Netscape
never published any document advertising it, people somehow found that it is
shorter than the "right way", and it worked with the major browsers of that
time, so some people started to use it. I believe that is where it all comes
from. You can still find that old code in
http://lxr.mozilla.org/classic/source/lib/libi18n/metatag.c
That code has not been used by the new Mozilla code base since 1999, nor by
Netscape 6 or 7.
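
To make the idea of a "loose" pre-parsing sniffer concrete, here is a small
Python sketch. It is not the Netscape code from metatag.c, only an
illustration of the approach described above; the names SNIFF_LIMIT,
sniff_charset and decode_for_parser, the 1024-byte window and the iso-8859-1
fallback are all assumptions made for this example. Because the pattern
accepts any "charset=" inside a meta tag, the nonstandard stand-alone
attribute and the standard 'content' form are both picked up.

```python
import re

# Assumption for this sketch: only the first bytes of the stream are inspected.
SNIFF_LIMIT = 1024

# Loose on purpose: any "charset=" inside a <meta ...> tag matches, whether it
# is a stand-alone attribute or part of the content="..." value.
META_CHARSET = re.compile(
    rb"<meta\b[^>]*?charset\s*=\s*[\"']?([\w.:-]+)",
    re.IGNORECASE,
)


def sniff_charset(head, default="iso-8859-1"):
    """Return the first charset named in a <meta> tag, lowercased."""
    match = META_CHARSET.search(head[:SNIFF_LIMIT])
    return match.group(1).decode("ascii").lower() if match else default


def decode_for_parser(raw, default="iso-8859-1"):
    """Convert raw bytes to text before any HTML parser sees them."""
    charset = sniff_charset(raw, default)
    try:
        return raw.decode(charset, errors="replace")
    except LookupError:
        # Unknown charset label: fall back to the assumed default.
        return raw.decode(default, errors="replace")


short_form = b"<meta charset=iso-2022-jp><p>hello</p>"
right_way = (b'<meta http-equiv="Content-Type" '
             b'content="text/html; charset=iso-2022-jp">')
print(sniff_charset(short_form), sniff_charset(right_way))
```

The sniffing runs on raw bytes, before decoding, which is the ordering the
ISO-2022-JP case requires.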

==================================
Frank Yung-Fong Tang
System Architect, Iñtërnâtiônàl Dèvélôpmeñt, AOL Intèrâçtívë Sërviçes
AIM:yungfongta mailto:ytang0648@aol.com Tel:650-937-2913
Yahoo! Msg: frankyungfongtan

John 3:16 "For God so loved the world that he gave his one and only Son, that
whoever believes in him shall not perish but have eternal life."

Does your software display Thai language text correctly for Thailand users?
-> Basic Concept of Thai Language linked from Frank Tang's
Iñtërnâtiônàlizætiøn Secrets
Want to translate your English text to something Thailand users can
understand ?
-> Try English-to-Thai machine translation at
http://c3po.links.nectec.or.th/parsit/


This archive was generated by hypermail 2.1.5 : Wed Nov 05 2003 - 14:33:17 EST