From: Philippe Verdy (email@example.com)
Date: Thu Jun 22 2006 - 08:16:23 CDT
----- Original Message -----
From: "Addison Phillips" <firstname.lastname@example.org>
Sent: Thursday, June 22, 2006 5:33 AM
Subject: RE: Surrogate pairs and UTF-8
> I'm disturbed by something here.
> Pavils wrote:
>> I am a developer who needs to write a UTF-8 encoder and decoder in
> units. Thus U+10000 is represented by the surrogate pair 0xD800 0xDC00. The
> String class treats these as two "characters" in a String object (in methods
> such as charCodeAt() or indexOf()).
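To make the point about code units concrete, here is a minimal sketch in JavaScript. The variable names are mine and purely illustrative; the only String members used are length and charCodeAt(). It shows that U+10000 occupies two positions in a String, and how the two surrogates recombine into the code point.

```javascript
// U+10000 written as its UTF-16 surrogate pair
var s = "\uD800\uDC00";

var unitCount = s.length;     // counts 16-bit code units, not characters
var high = s.charCodeAt(0);   // high (lead) surrogate
var low = s.charCodeAt(1);    // low (trail) surrogate

// Recombine the pair into the supplementary code point
var codePoint = 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);

console.log(unitCount, high.toString(16), low.toString(16), codePoint.toString(16));
// prints: 2 d800 dc00 10000
```

Any code that indexes into a String therefore has to treat a high surrogate followed by a low surrogate as one character if it wants to count or slice by code point.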
> documents, headers, and other text sources that you are manipulating is
> is paying attention to HTTP headers and what the browser thinks the encoding
> of the JS source file or the document being read or written is. The
> exception to this is when generating URIs from strings, for which there are
> a variety of escape methods (escape, unescape, encodeURI,
> encodeURIComponent, etc.). What I'm getting at here is: there is no data
> type or methods for manipulating bytes or character encodings. There is no
> aware of to write a UTF-8 encoder or decoder (i.e. code that converts a
> String to a UTF-8 byte sequence in an object or vice versa). There are
> plenty of ways to put Strings into a UTF-8 file (or read from a UTF-8 file).
> There is usually something (else) wrong when a developer is trying to do
> Pavils: what is it you are trying to do that you think requires you to
> encode or decode UTF-8?
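As an illustration of the exception mentioned above (the URI escape methods), here is a rough sketch, not a recommendation. encodeURIComponent() already emits percent-escaped UTF-8, and decodeURIComponent() reverses it, so the byte values can be recovered without hand-written bit manipulation. The helper names utf8Bytes and utf8Decode are hypothetical, introduced only for this example. Both built-ins throw a URIError on malformed input (an unpaired surrogate, or an invalid UTF-8 sequence).

```javascript
// Hypothetical helper: UTF-8 byte values of a String, via the built-in escaper
function utf8Bytes(str) {
  var escaped = encodeURIComponent(str);  // non-ASCII becomes %XX per UTF-8 byte
  var bytes = [];
  for (var i = 0; i < escaped.length; i++) {
    if (escaped.charAt(i) === "%") {
      // Two hex digits follow each percent sign
      bytes.push(parseInt(escaped.substring(i + 1, i + 3), 16));
      i += 2;
    } else {
      // Characters left unescaped are ASCII, so code unit == byte value
      bytes.push(escaped.charCodeAt(i));
    }
  }
  return bytes;
}

// Hypothetical helper: String from an array of UTF-8 byte values
function utf8Decode(bytes) {
  var escaped = "";
  for (var i = 0; i < bytes.length; i++) {
    var hex = bytes[i].toString(16);
    escaped += "%" + (hex.length < 2 ? "0" : "") + hex;
  }
  return decodeURIComponent(escaped);     // throws URIError on invalid UTF-8
}

var encoded = utf8Bytes("\uD800\uDC00");  // U+10000 as a surrogate pair
console.log(encoded.map(function (b) { return b.toString(16); }).join(" "));
// prints: f0 90 80 80
```

The surrogate pair goes in as two code units and comes out as one four-byte UTF-8 sequence, never as two three-byte sequences.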