Re: Handling of Surrogates

From: Doug Ewell
Date: Fri Apr 17 2009 - 18:12:52 CDT


    James Cloos <cloos at jhcloos dot com> wrote:

    > Sam> people seemed to prefer the familiarity of the Python style
    > Sam> (i.e. \u and \U).
    > Since you had mentioned python in the original post, I took a look at
    > their docs. In python, using \uXXXX\uXXXX for surrogates is explicitly
    > supported.
    > That said, I also prefer the perl style \x{X...} escape. With that,
    > one can still use \xXX for an octet, or with braces for a UCS char.

    If this is going to morph into a thread about what escape styles people
    *like* rather than how they are defined, it might be worth looking at
    RFC 5137 (BCP 137), "ASCII Escaping of Unicode Characters".

    This document examines many different styles and their advantages and
    disadvantages, and loosely categorizes them into "recommended" and
    "normally not recommended" camps. This might provide more insight into
    the problem and solutions than having lots of people chip in with their
    favorite syntax.
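
    For concreteness, here is a rough Python 3 sketch of the two styles
    mentioned in this thread: a surrogate pair written as \uXXXX\uXXXX, and
    the braced \x{X...} form. The function names are made up for this
    illustration, and this is not how Python, Perl, or RFC 5137 define
    their escapes; it only shows that both spellings can name the same
    code point.

```python
import re

# Matches either a high+low surrogate pair or a single \uXXXX escape.
_U_ESCAPE = re.compile(
    r"\\u([dD][89abAB][0-9a-fA-F]{2})\\u([dD][c-fC-F][0-9a-fA-F]{2})"
    r"|\\u([0-9a-fA-F]{4})"
)
# Matches the braced form with 1 to 6 hex digits.
_BRACE_ESCAPE = re.compile(r"\\x\{([0-9a-fA-F]{1,6})\}")


def combine_surrogates(high, low):
    """Map a UTF-16 surrogate pair to its supplementary code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def decode_u_escapes(text):
    """Decode \\uXXXX escapes, joining adjacent surrogate pairs."""
    def repl(match):
        if match.group(1) is not None:
            high = int(match.group(1), 16)
            low = int(match.group(2), 16)
            return chr(combine_surrogates(high, low))
        code_point = int(match.group(3), 16)
        if 0xD800 <= code_point <= 0xDFFF:
            # Assumption for this sketch: lone surrogates are rejected.
            raise ValueError("unpaired surrogate escape")
        return chr(code_point)
    return _U_ESCAPE.sub(repl, text)


def decode_brace_escapes(text):
    """Decode \\x{X...} escapes; chr() raises above U+10FFFF."""
    return _BRACE_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


if __name__ == "__main__":
    # U+1D11E MUSICAL SYMBOL G CLEF, written both ways.
    via_pair = decode_u_escapes(r"clef: \uD834\uDD1E")
    via_brace = decode_brace_escapes(r"clef: \x{1D11E}")
    print(via_pair == via_brace)
```

    The braced form needs no pairing logic at all, which is one of the
    trade-offs between the styles.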

    Doug Ewell  *  Thornton, Colorado, USA  *  RFC 4645  *  UTN #14
