Robert A Rosenberg wrote:
> Does the rest of the industry ALSO use the appearance of characters in
> x81-x9F codepoint range as an indication to tag as windows-1252? Use of
> CHARACTER in this range (on a Windows Machine [as opposed to a
> should be enough to trigger this tagging since the code is NOT ISO-8859-1
> (no matter what MS claims).
I think most do. The only problems I've encountered are with data from MS.
The usual behavior is to accept the claim that the document is encoded as
8859-1, and then assume that it's really 1252 if any C1 characters are
encountered.
Conversely, outbound data uses 8859-1, or 8859-15 if the Euro's present -
I've never heard of anyone using the extra French and Finnish characters,
but I suppose they would trigger it too. If the text contains characters
from 1252's C1 range that preclude use of an 8859 part, then the software
drops into UTF-8 or uses windows-1252, depending on configuration parameters.
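To make the two rules above concrete, here is a rough Python sketch of the
behavior as I understand it. It is not taken from any particular product:
the function names and the prefer_utf8 switch are invented for illustration,
and the switch merely stands in for whatever "configuration parameters" a
given implementation exposes.

```python
LATIN1_NAMES = {"iso-8859-1", "iso8859-1", "latin1", "latin-1"}


def effective_inbound_charset(data: bytes, declared: str) -> str:
    """Accept the declared charset, but treat a claimed 8859-1 as
    windows-1252 if any byte falls in the C1 range 0x80-0x9F."""
    if declared.lower() in LATIN1_NAMES and any(0x80 <= b <= 0x9F for b in data):
        return "windows-1252"
    return declared


def choose_outbound_charset(text: str, prefer_utf8: bool = True) -> str:
    """Pick the narrowest label that can represent the text.

    prefer_utf8 is a hypothetical configuration switch: when an 8859 part
    will not do, it selects between UTF-8 and windows-1252."""
    # Try 8859-1 first, then 8859-15 (Euro and the extra French/Finnish letters).
    for charset in ("iso-8859-1", "iso-8859-15"):
        try:
            text.encode(charset)
            return charset
        except UnicodeEncodeError:
            continue
    if not prefer_utf8:
        try:
            text.encode("windows-1252")
            return "windows-1252"
        except UnicodeEncodeError:
            pass
    return "utf-8"
```

With this sketch, plain accented Latin text goes out as iso-8859-1, text with
a Euro sign as iso-8859-15, and text with curly quotes as UTF-8 or
windows-1252 depending on the switch.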
While the purists object to the use of ANY charset which contains graphics
in C1, I think it's perfectly acceptable, as long as the recipient has a
fighting chance of interpreting it. What's supremely frustrating is the
fact that the data claims it's 8859-1 when it's really 1252.
A related issue is the encoding of half-width kana in ISO-2022-JP, which
does not define such an encoding. Of course, we've learned to live with
this behavior, but it would be nice if MS actually tagged the data as
something else when doing something non-standard.
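For what it's worth, a recipient can at least spot this case. The sketch
below assumes the half-width kana arrive via the JIS X 0201 katakana
designation (ESC ( I), which is the form I would expect but which the
ISO-2022-JP definition in RFC 1468 does not include. The function name is
made up for illustration, and the check only looks at escape sequences, not
at stray 8-bit or shift bytes.

```python
import re
from typing import List

# The four escape sequences RFC 1468 permits in ISO-2022-JP:
# ASCII, JIS X 0201 Roman, JIS C 6226-1978, JIS X 0208-1983.
ALLOWED_ESCAPES = {b"\x1b(B", b"\x1b(J", b"\x1b$@", b"\x1b$B"}

# ESC, one or more intermediate bytes (0x20-0x2F), then a final byte (0x30-0x7E).
ESCAPE_RE = re.compile(rb"\x1b[\x20-\x2f]+[\x30-\x7e]")


def nonstandard_escapes(data: bytes) -> List[bytes]:
    """Return every escape sequence in data that ISO-2022-JP does not define."""
    return [seq for seq in ESCAPE_RE.findall(data) if seq not in ALLOWED_ESCAPES]
```

A non-empty result means the data is not really ISO-2022-JP, whatever the
label says, and the recipient can fall back to a more permissive decoder.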
This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:58 EDT