Re: Handling of Surrogates

From: Mark Davis (mark.edward.davis@gmail.com)
Date: Fri Apr 17 2009 - 19:21:25 CDT

    As I recall, that document originated as a personal contribution with an
    eye to IETF protocols. It doesn't particularly reflect established usage,
    nor does it go into much depth or breadth. It's worth looking at, but
    I wouldn't take it as definitive.

    Mark

    On Fri, Apr 17, 2009 at 16:12, Doug Ewell <doug@ewellic.org> wrote:

    > James Cloos <cloos at jhcloos dot com> wrote:
    >
    >> Sam> people seemed to prefer the familiarity of the Python style
    >> Sam> (i.e. \u and \U).
    >>
    >> Since you had mentioned Python in the original post, I took a look at
    >> their docs. In Python, using \uXXXX\uXXXX for surrogates is explicitly
    >> supported.
    >>
    >> That said, I also prefer the Perl-style \x{X...} escape. With that,
    >> one can still use \xXX for an octet, or use braces for a UCS character.
    >>
    >
    > If this is going to morph into a thread about what escape styles people
    > *like* rather than how they are defined, it might be worth looking at RFC
    > 5137 (BCP 137), "ASCII Escaping of Unicode Characters":
    >
    > http://www.rfc-editor.org/rfc/rfc5137.txt
    >
    > This document examines many different styles and their advantages and
    > disadvantages, and loosely categorizes them into "recommended" and "normally
    > not recommended" camps. This might provide more insight into the problem
    > and solutions than having lots of people chip in with their favorite syntax.
    >
    >
    > --
    > Doug Ewell * Thornton, Colorado, USA * RFC 4645 * UTN #14
    > http://www.ewellic.org
    > http://www1.ietf.org/html.charters/ltru-charter.html
    > http://www.alvestrand.no/mailman/listinfo/ietf-languages
    >
    >
    >
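
For reference, the escape styles mentioned in the quoted thread can be compared with a small Python sketch. It splits a supplementary code point into its UTF-16 surrogate pair and prints the same character in the \uXXXX\uXXXX, \UXXXXXXXX, and \x{X...} styles. The function names are hypothetical and chosen only for illustration; the sketch does not claim to follow RFC 5137 or any particular language's parser.

```python
def to_surrogates(cp):
    """Split a supplementary code point (U+10000..U+10FFFF) into a UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point: %X" % cp)
    offset = cp - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits go into the high surrogate
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits go into the low surrogate
    return high, low


def from_surrogates(high, low):
    """Combine a high/low surrogate pair back into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def escape_u_pair(cp):
    """Hypothetical helper: \\uXXXX style, using a surrogate pair above U+FFFF."""
    if cp < 0x10000:
        return "\\u%04X" % cp
    return "\\u%04X\\u%04X" % to_surrogates(cp)


def escape_big_u(cp):
    """Hypothetical helper: \\UXXXXXXXX style, always eight hex digits."""
    return "\\U%08X" % cp


def escape_braced(cp):
    """Hypothetical helper: \\x{X...} style, as many hex digits as needed."""
    return "\\x{%X}" % cp


if __name__ == "__main__":
    cp = 0x1D11E  # MUSICAL SYMBOL G CLEF, a character outside the BMP
    print(escape_u_pair(cp))
    print(escape_big_u(cp))
    print(escape_braced(cp))
```

The braced form needs no surrogate arithmetic at all, which is one practical difference between the styles under discussion.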


