Re: Surrogate pairs and UTF-8

From: Pavils Jurjans (
Date: Thu Jun 22 2006 - 12:35:52 CDT


    OK, group, so here's the fruit; I am making it public for the benefit of all:

    The page contains both an encoder and a decoder in JavaScript. I believe the
    implementation is correct; however, I have not mass-tested it with all those
    fancy antique scripts.
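
    For readers who want to see what the decoding direction involves, here is
    a minimal sketch. It is not the code from the page; the name
    utf8PercentDecode and its helpers are made up for this illustration. It
    assumes the input is percent-encoded UTF-8, and it does not reject every
    malformed sequence (overlong three- and four-byte forms, for example,
    slip through). Code points above U+FFFF are turned back into a surrogate
    pair, because JavaScript strings are sequences of UTF-16 code units.

```javascript
// Hypothetical decoder sketch: percent-encoded UTF-8 -> JavaScript string.
function utf8PercentDecode(str) {
  var out = "";
  var i = 0;

  // Read one "%XX" escape at position i and return its byte value.
  function readByte() {
    var hex = str.substr(i + 1, 2);
    if (str.charAt(i) !== "%" || !/^[0-9A-Fa-f]{2}$/.test(hex)) {
      throw new Error("Malformed escape at index " + i);
    }
    i += 3;
    return parseInt(hex, 16);
  }

  // Read a continuation byte (10xxxxxx) and return its 6 payload bits.
  function readContinuation() {
    var b = readByte();
    if ((b & 0xC0) !== 0x80) {
      throw new Error("Bad continuation byte before index " + i);
    }
    return b & 0x3F;
  }

  while (i < str.length) {
    if (str.charAt(i) !== "%") {
      out += str.charAt(i); // literal character, copied through
      i++;
      continue;
    }
    var b = readByte();
    var cp;
    if (b < 0x80) {
      cp = b; // 1-byte sequence
    } else if (b >= 0xC2 && b <= 0xDF) {
      cp = (b & 0x1F) << 6; // 2-byte sequence
      cp |= readContinuation();
    } else if (b >= 0xE0 && b <= 0xEF) {
      cp = (b & 0x0F) << 12; // 3-byte sequence
      cp |= readContinuation() << 6;
      cp |= readContinuation();
    } else if (b >= 0xF0 && b <= 0xF4) {
      cp = (b & 0x07) << 18; // 4-byte sequence
      cp |= readContinuation() << 12;
      cp |= readContinuation() << 6;
      cp |= readContinuation();
    } else {
      throw new Error("Invalid UTF-8 lead byte before index " + i);
    }

    if (cp > 0xFFFF) {
      // Supplementary code point: split into high and low surrogates.
      cp -= 0x10000;
      out += String.fromCharCode(0xD800 + (cp >> 10), 0xDC00 + (cp & 0x3FF));
    } else {
      out += String.fromCharCode(cp);
    }
  }
  return out;
}
```

    For example, utf8PercentDecode("%F0%90%8C%B0") gives the two code units
    U+D800 U+DF30, which is the surrogate pair for U+10330.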

    As I already said, the need to do this in JavaScript comes from having to
    transfer data to and from the server in an AJAX-based framework, that is,
    to submit and receive complex data without a page refresh. It is natural to
    create XML-format packages for that data, and when it comes to transferring
    any kind of string, one needs to escape it somehow so that all code points
    pass through the XML format. I chose to follow the encoding that is provided
    by the function encodeURIComponent(), and I just needed to reimplement it in
    JavaScript to support older browsers. So I did that. To be honest, I cannot
    think of any alternative method that would allow full Unicode support
    for transferring string data together with other complex and typed data such
    as dates, booleans and regular expressions.
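
    The encoding direction can be sketched roughly as follows. This is a
    minimal illustration and not the code from the page mentioned earlier; the
    names utf8PercentEncode, hexByte and UNRESERVED are made up for this
    example. It assumes the goal is the same output as encodeURIComponent():
    the unreserved characters pass through, and everything else is converted
    to UTF-8 bytes written as %XX. A surrogate pair is first combined into one
    code point so that it becomes a single four-byte sequence. One deliberate
    difference: on an unpaired surrogate this sketch throws a plain Error,
    where the built-in throws a URIError.

```javascript
// Characters that encodeURIComponent() leaves unescaped.
var UNRESERVED =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_.!~*'()";

// Format one byte as "%XX" with upper-case hex digits.
function hexByte(b) {
  var s = b.toString(16).toUpperCase();
  return "%" + (s.length < 2 ? "0" + s : s);
}

// Hypothetical encoder sketch: JavaScript string -> percent-encoded UTF-8.
function utf8PercentEncode(str) {
  var out = "";
  for (var i = 0; i < str.length; i++) {
    var ch = str.charAt(i);
    if (UNRESERVED.indexOf(ch) !== -1) {
      out += ch; // safe character, copied through
      continue;
    }

    var cp = str.charCodeAt(i);
    if (cp >= 0xD800 && cp <= 0xDBFF) {
      // High surrogate: it must be followed by a low surrogate.
      var lo = i + 1 < str.length ? str.charCodeAt(i + 1) : 0;
      if (lo < 0xDC00 || lo > 0xDFFF) {
        throw new Error("Unpaired high surrogate at index " + i);
      }
      cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
      i++; // the low surrogate has been consumed
    } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
      throw new Error("Unpaired low surrogate at index " + i);
    }

    // Emit the UTF-8 bytes for the code point.
    if (cp < 0x80) {
      out += hexByte(cp);
    } else if (cp < 0x800) {
      out += hexByte(0xC0 | (cp >> 6)) +
             hexByte(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
      out += hexByte(0xE0 | (cp >> 12)) +
             hexByte(0x80 | ((cp >> 6) & 0x3F)) +
             hexByte(0x80 | (cp & 0x3F));
    } else {
      out += hexByte(0xF0 | (cp >> 18)) +
             hexByte(0x80 | ((cp >> 12) & 0x3F)) +
             hexByte(0x80 | ((cp >> 6) & 0x3F)) +
             hexByte(0x80 | (cp & 0x3F));
    }
  }
  return out;
}
```

    For example, utf8PercentEncode("a b") gives "a%20b", and the surrogate
    pair "\uD800\uDF30" (U+10330) gives "%F0%90%8C%B0".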

    Addison, you should think about JavaScript in a wider context than just the
    web. ECMAScript is supported in very many environments, and whenever it
    comes to creating files and/or transferring data to a server, some
    hand-coded encoding routines may come in handy.


    Pavils Jurjans
