From: Addison Phillips (firstname.lastname@example.org)
Date: Thu Jun 22 2006 - 09:46:27 CDT
> and it's perfectly possible (and in fact easy) to convert
> between the natural JavaScript encoding, as seen in
> string.length, string.charCodeAt(), or string.indexOf(...),
> and UTF-8.
attempt to un-mojibake it... except please note what I said:
> There is usually something (else) wrong when a developer is
That is, yes, one can attempt to fix one's data that way. It does rely on
the content being interpreted as ISO 8859-1, though, and if there are bytes
in the 0x80 through 0x9F range, a lot of user-agents are going to interpret
the content as windows-1252, even if it is labelled as 8859-1.
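To make that caveat concrete, here is a minimal sketch of the repair, assuming a runtime that provides the standard TextDecoder (current browsers and Node.js do). The helper name unMojibakeLatin1 is hypothetical. It only succeeds when every code unit is 0xFF or below, which is exactly the "interpreted as 8859-1" condition; a windows-1252 interpretation maps bytes 0x80 through 0x9F to code points above 0xFF, so the helper gives up and returns null.

```javascript
// Hypothetical helper: repair a string whose UTF-8 bytes were wrongly
// decoded as ISO 8859-1, where each byte became one code unit <= 0xFF.
function unMojibakeLatin1(garbled) {
  const bytes = new Uint8Array(garbled.length);
  for (let i = 0; i < garbled.length; i++) {
    const unit = garbled.charCodeAt(i);
    if (unit > 0xff) {
      // Not a Latin-1 code unit. The data was probably decoded as
      // windows-1252, or it was never mojibake in the first place.
      return null;
    }
    bytes[i] = unit;
  }
  try {
    // fatal: true makes invalid UTF-8 throw instead of yielding U+FFFD.
    return new TextDecoder("utf-8", { fatal: true }).decode(bytes);
  } catch (e) {
    return null;
  }
}

// UTF-8 bytes C3 A9 read as 8859-1 give U+00C3 U+00A9; repair yields U+00E9.
console.log(unMojibakeLatin1("\u00c3\u00a9") === "\u00e9"); // true

// UTF-8 bytes E2 82 AC read as windows-1252 give U+00E2 U+201A U+00AC.
// U+201A is above 0xFF, so the Latin-1 assumption fails.
console.log(unMojibakeLatin1("\u00e2\u201a\u00ac")); // null
```

A null result does not say what went wrong, only that the 8859-1 assumption did not hold for this input.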
> used to generate documents, or some responses to servers
> handling only the UTF-8 encoding in some specific protocol
> (for example when you need to compute binary signatures, or
> the encoded length in some part of this protocol).
Uh... why not assemble a String containing the text and then set the
Content-Type of the document to the desired encoding (UTF-8 in this case)?
Assembling documents in UTF-8 via manual conversion is not necessary. And it
is prone to error.
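As a rough illustration of that approach, the sketch below builds the text as an ordinary String and leaves the UTF-8 conversion to the platform. It assumes a runtime with the standard TextEncoder; the function name encodeBody and the shape of the returned object are hypothetical and not tied to any particular HTTP library. The encoded length falls out of the byte array, so no hand-written surrogate handling is needed.

```javascript
// Hypothetical helper: encode an assembled String as UTF-8 and describe it
// with matching headers. TextEncoder always produces UTF-8.
function encodeBody(text) {
  const body = new TextEncoder().encode(text);
  return {
    headers: {
      "Content-Type": "text/plain; charset=utf-8",
      // Length in bytes, which differs from text.length (UTF-16 code units).
      "Content-Length": String(body.length),
    },
    body,
  };
}

// U+00EF takes 2 bytes in UTF-8; U+1D11E is a surrogate pair in the String
// (2 code units) and 4 bytes in UTF-8.
const text = "na\u00efve \u{1D11E}";
const message = encodeBody(text);

console.log(text.length);                       // 8
console.log(message.headers["Content-Length"]); // "11"
```

The same byte array could be fed to a signature routine, which covers the "binary signatures" and "encoded length" cases without converting code units by hand.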
> will be preserved on output when it is sent to a stream using
> a charset not completely covering the UCS.
> encoding of strings to another encoding is performed by the
> stream object, according to its settings properties, for
> example an HTTP or MIME message object where you can set the
> charset used for encoding/decoding their stream).
Yes. That's exactly what I said. Hence: what are you writing an
Internationalization Architect - Yahoo! Inc.
Internationalization is an architecture.
It is not a feature.
> -----Original Message-----
> From: Philippe Verdy [mailto:email@example.com]
> Sent: jeudi 22 juin 2006 06:16
> To: Addison Phillips; firstname.lastname@example.org
> Subject: Re: Surrogate pairs and UTF-8