Handling of Surrogates

From: Sam Mason (sam@samason.me.uk)
Date: Thu Apr 16 2009 - 14:04:07 CDT


    Hi All,

    I've got myself into a discussion about the correct handling of surrogate
    pairs. The background is as follows: the Postgres database server[1]
    currently assumes that the SQL it receives is in some user-specified
    encoding, and it's been proposed that it would be nicer to be able to
    enter Unicode characters directly as escape codes, in a form similar
    to Python's, i.e. support would be added for:

      '\uxxxx'
    and
      '\Uxxxxxxxx'

    The currently proposed patch[2] specifically handles surrogate pairs
    in the input. For example, '\uD800\uDF02' and '\U00010302' would be
    considered valid and identical strings containing exactly one
    character (U+10302). I was wondering whether this should indeed be
    considered valid, or whether an error should be returned instead.
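
    For concreteness, the arithmetic that makes those two spellings
    equivalent can be sketched as below. This is only an illustration in
    Python, not code from the patch; the function name combine_surrogates
    is hypothetical, and raising an error on a lone or misordered
    surrogate is an assumption about one possible behaviour.

```python
def combine_surrogates(high: int, low: int) -> int:
    """Combine a UTF-16 surrogate pair into a single code point.

    Hypothetical helper for illustration; not taken from the Postgres patch.
    """
    # High (leading) surrogates occupy U+D800..U+DBFF.
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError(f"not a high surrogate: U+{high:04X}")
    # Low (trailing) surrogates occupy U+DC00..U+DFFF.
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError(f"not a low surrogate: U+{low:04X}")
    # Each surrogate carries 10 bits; the result is offset by 0x10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# The pair from the example maps to the same code point as '\U00010302'.
print(hex(combine_surrogates(0xD800, 0xDF02)))  # 0x10302
```

    Under this sketch a pair is accepted only when a high surrogate is
    immediately followed by a low one; anything else is rejected, which
    is the alternative behaviour I'm asking about.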

    -- 
      Sam  http://samason.me.uk/
     [1] http://www.postgresql.org/
     [2] http://archives.postgresql.org/pgsql-hackers/2009-04/msg00904.php
    

