RE: Handling of Surrogates

From: Murray Sargent (murrays@exchange.microsoft.com)
Date: Thu Apr 16 2009 - 15:21:44 CDT

Next message: Asmus Freytag: "Re: Handling of Surrogates"

Previous message: Peter Zilahy Ingerman, PhD: "Re: Localizable Sentences Experiment"
In reply to: Sam Mason: "Handling of Surrogates"
Next in thread: Bjoern Hoehrmann: "Re: Handling of Surrogates"

The best approach would be to use UTF-8, but if you're going to use an escape code, by all means use the longer (UTF-32) form \Uxxxxx or \Uxxxxxx (if need be) rather than a surrogate pair. The UTF-32 codes are listed directly in The Unicode Standard 5.0 and are considerably easier to read. The main benefit of UTF-16 is saving internal memory.

Murray

-----Original Message-----
From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org] On Behalf Of Sam Mason
Sent: Thursday, April 16, 2009 12:04 PM
To: unicode@unicode.org
Subject: Handling of Surrogates

Hi All,

I've got myself in a discussion about the correct handling of surrogate
pairs. The background is as follows: the Postgres database server[1]
currently assumes that the SQL it's receiving is in some user-specified
encoding, and it's been proposed that it would be nicer to be able to
enter Unicode characters directly in the form of escape codes in a
similar form to Python, i.e. support would be added for:

'\uxxxx'
and
'\Uxxxxxxxx'

The currently proposed patch[2] specifically handles surrogate pairs
in the input. For example '\uD800\uDF02' and '\U00010302' would be
considered to be valid and identical strings containing exactly one
character. I was wondering if this should indeed be considered valid or
if an error should be returned instead.
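
For reference, the arithmetic behind that equivalence can be sketched in
Python as below. This is only an illustration of the standard UTF-16
surrogate calculation, not the code from the proposed patch; the function
names combine_surrogates and split_to_surrogates are made up for this
example, and rejecting unpaired surrogates with an error is one possible
policy, assumed here rather than taken from the patch.

```python
# Hypothetical helpers illustrating UTF-16 surrogate arithmetic.
# Names and error policy are assumptions, not taken from the Postgres patch.

HIGH_START, HIGH_END = 0xD800, 0xDBFF   # high (leading) surrogates
LOW_START, LOW_END = 0xDC00, 0xDFFF     # low (trailing) surrogates


def combine_surrogates(high: int, low: int) -> int:
    """Return the code point encoded by a high/low surrogate pair."""
    if not HIGH_START <= high <= HIGH_END:
        raise ValueError(f"not a high surrogate: {high:#06x}")
    if not LOW_START <= low <= LOW_END:
        raise ValueError(f"not a low surrogate: {low:#06x}")
    # Each surrogate carries 10 bits; the result is offset by 0x10000.
    return 0x10000 + ((high - HIGH_START) << 10) + (low - LOW_START)


def split_to_surrogates(code_point: int) -> tuple[int, int]:
    """Return the (high, low) surrogate pair for a supplementary code point."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError(f"not a supplementary code point: {code_point:#x}")
    offset = code_point - 0x10000
    return HIGH_START + (offset >> 10), LOW_START + (offset & 0x3FF)


if __name__ == "__main__":
    cp = combine_surrogates(0xD800, 0xDF02)
    print(f"U+{cp:04X}")                       # the pair from the example above
    print(chr(cp) == "\U00010302")             # same single character
    print([f"{u:04X}" for u in split_to_surrogates(0x10302)])
```

With these helpers, the pair D800 DF02 and the long form 00010302 map to
the same single code point, while a lone D800 or a reversed pair raises
ValueError instead of being passed through.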

--
  Sam  http://samason.me.uk/
 [1] http://www.postgresql.org/
 [2] http://archives.postgresql.org/pgsql-hackers/2009-04/msg00904.php

