Re: Surrogate pairs and UTF-8

From: Pavils Jurjans
Date: Thu Jun 22 2006 - 16:20:40 CDT

    Mike, I updated the code so that Firefox displays the results in the
    gray boxes. However, it behaves a bit oddly. Although the textbox shows only
    one character, if I place the cursor at the end of the text and press
    Backspace, it removes only the low surrogate rather than both units of the
    surrogate pair.
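
    To illustrate what I mean, here is a rough sketch of the behaviour I
    would expect from Backspace. The helper names (isHighSurrogate,
    isLowSurrogate, deleteLastCharacter) are my own, made up for this
    example, and U+1D11E is just a convenient test character outside the BMP:

```javascript
// U+1D11E MUSICAL SYMBOL G CLEF lies outside the BMP, so a JavaScript
// string stores it as two UTF-16 code units (a surrogate pair).
var clef = "\uD834\uDD1E";

// Hypothetical helpers: classify a single UTF-16 code unit.
function isHighSurrogate(unit) {
  return unit >= 0xD800 && unit <= 0xDBFF;
}

function isLowSurrogate(unit) {
  return unit >= 0xDC00 && unit <= 0xDFFF;
}

// Remove the last character of a string. If the string ends with a
// complete surrogate pair, both code units are removed together.
function deleteLastCharacter(text) {
  var n = text.length;
  if (n >= 2 &&
      isLowSurrogate(text.charCodeAt(n - 1)) &&
      isHighSurrogate(text.charCodeAt(n - 2))) {
    return text.substring(0, n - 2);
  }
  return text.substring(0, Math.max(0, n - 1));
}
```

    Dropping just the last code unit, as in text.substring(0, text.length - 1),
    leaves a lone high surrogate behind, which matches what I see in the
    textbox.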

    On 6/22/06, Edward Trager wrote:

    > ... Correct me if I am missing something:
    > AJAX frameworks presumably have no problem whatsoever transferring data
    > directly in UTF-8 format. UTF-8 is the default encoding for XML. So, once
    > the data get to the client, all one has to do is parse the UTF-8 strings
    > directly out of the XML (assuming AJAX based on XMLHttpRequest) and wrap
    > them inside of some XHTML tags for display. Where is the need to escape
    > strings in XML? UTF-8 can encode all Unicode points.

    The problem is that if you want to save string data in XML format, you
    can't just do [textNode.value = stringData] and assume that all the odd
    control characters will survive when the XML file is transferred in UTF-8.
    It's even worse with XML attribute values. So the string data needs
    escaping. At this point one has to decide which escaping to use. Any
    scheme will do, because the server end can just reverse it. However, since
    we are talking about client-side JavaScript here, it had better be a
    built-in function; otherwise large strings will take considerable time to
    process. Also, it's nice to stick to standards. This is where the
    wonderful function encodeURIComponent comes in. However, some older
    browsers don't support that function, so we need to simulate it. Hence
    the need to have JS-based UTF-8
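
    For what it's worth, a fallback along these lines could look roughly like
    the sketch below. It is only an illustration, not the code from my test
    page: the names UNESCAPED, hexByte and utf8PercentEncode are invented
    here, and I am assuming that throwing on a lone surrogate is acceptable,
    since the built-in encodeURIComponent also rejects such input.

```javascript
// Characters that encodeURIComponent leaves unescaped.
var UNESCAPED = /[A-Za-z0-9\-_.!~*'()]/;

// Format one byte (0..255) as a percent escape such as "%0A".
function hexByte(b) {
  var h = b.toString(16).toUpperCase();
  return "%" + (h.length < 2 ? "0" + h : h);
}

// Hypothetical stand-in for encodeURIComponent: combines surrogate pairs
// into a code point, encodes it as UTF-8, and percent-escapes each byte.
function utf8PercentEncode(text) {
  var out = [];
  for (var i = 0; i < text.length; i++) {
    var ch = text.charAt(i);
    if (UNESCAPED.test(ch)) {
      out.push(ch);
      continue;
    }
    var cp = text.charCodeAt(i);
    if (cp >= 0xD800 && cp <= 0xDBFF) {
      // High surrogate: it must be followed by a low surrogate.
      var low = text.charCodeAt(i + 1); // NaN when past the end
      if (!(low >= 0xDC00 && low <= 0xDFFF)) {
        throw new Error("Lone high surrogate at index " + i);
      }
      cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
      i++; // the low surrogate has been consumed
    } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
      throw new Error("Lone low surrogate at index " + i);
    }
    if (cp < 0x80) {
      out.push(hexByte(cp));
    } else if (cp < 0x800) {
      out.push(hexByte(0xC0 | (cp >> 6)),
               hexByte(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {
      out.push(hexByte(0xE0 | (cp >> 12)),
               hexByte(0x80 | ((cp >> 6) & 0x3F)),
               hexByte(0x80 | (cp & 0x3F)));
    } else {
      out.push(hexByte(0xF0 | (cp >> 18)),
               hexByte(0x80 | ((cp >> 12) & 0x3F)),
               hexByte(0x80 | ((cp >> 6) & 0x3F)),
               hexByte(0x80 | (cp & 0x3F)));
    }
  }
  return out.join("");
}
```

    A page could then pick the built-in when it exists and fall back
    otherwise, e.g. var enc = (typeof encodeURIComponent == "function") ?
    encodeURIComponent : utf8PercentEncode; the character-by-character loop
    will of course be much slower on large strings.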

