Re: Unicode conference papers

From: Jukka K. Korpela (jkorpela@cs.tut.fi)
Date: Thu Nov 23 2006 - 02:17:19 CST

Next message: Elliotte Harold: "Re: Fwd: Creative commons' license symbols"

Previous message: James Kass: "RE: Normalization in Bengali"
In reply to: Martin Duerst: "Re: Unicode conference papers"
Next in thread: Martin Duerst: "Re: Unicode conference papers"
Reply: Martin Duerst: "Re: Unicode conference papers"

On Wed, 22 Nov 2006, Martin Duerst wrote:

>> Text encoded as UTF-8, then reinterpreted using an 8-bit encoding (often
>> Latin-1 or Windows-1252), and then re-encoded incorrectly as UTF-8 for
>> a second time.
>
> Yes. The W3C site has quite a lot of these, too, even if they are
> fortunately usually limited to single characters such as the copyright
> sign. Here's an example:
> http://www.w3.org/2001/Annotea/User/Papers.html

That page is a somewhat different case. More than the copyright sign is
wrong there: the registered sign and two occurrences of e with acute
(in the name "José") are affected, too. Moreover, the page says
   <?xml version="1.0" encoding="iso-8859-1"?>
_and_
   <meta http-equiv="content-type"
   content="application/xhtml+xml; charset=UTF-8" />
but what really matters is the HTTP header
   Content-Type: text/html; charset=iso-8859-1

If you manually change the encoding used by a browser to UTF-8, the é's
become right and the two other non-ASCII characters become a little less
obscured by extra characters before them. There _is_ a "double UTF-8"
involved, too, but the primary problem is that the declared encoding
is not the one actually used on the page.
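
To illustrate the two effects separately, here is a minimal Python sketch. It
does not use the real bytes of the W3C page (those are not reproduced in this
message); it assumes, for illustration only, that the name was encoded as
UTF-8 once and the copyright sign twice, which would produce the behaviour
described above. All variable names are made up for the example.

```python
# Assumption: "José" was encoded as UTF-8 once (correct bytes).
name_bytes = "José".encode("utf-8")

# Assumption: "©" went through "double UTF-8": encoded as UTF-8,
# misread as ISO-8859-1, then encoded as UTF-8 a second time.
sign_bytes = "©".encode("utf-8").decode("iso-8859-1").encode("utf-8")

page_bytes = name_bytes + b" " + sign_bytes

# What a browser shows when it trusts the declared charset=iso-8859-1.
as_declared = page_bytes.decode("iso-8859-1")

# What it shows after manually switching the browser to UTF-8:
# the é is right, the © still has one extra character in front of it.
as_utf8 = page_bytes.decode("utf-8")

# Undoing the double encoding of the sign: reverse the wrong step once.
sign_repaired = sign_bytes.decode("utf-8").encode("iso-8859-1").decode("utf-8")

print(repr(as_declared))
print(repr(as_utf8))
print(repr(sign_repaired))
```

Under these assumptions, switching to UTF-8 fixes the singly encoded text
and removes only one layer of garbling from the doubly encoded sign.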

-- 
Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/


This archive was generated by hypermail 2.1.5 : Thu Nov 23 2006 - 02:19:26 CST