Re: surrogate at java's property file

From: David Hopwood (david.hopwood@zetnet.co.uk)
Date: Mon Oct 01 2001 - 23:54:20 EDT


-----BEGIN PGP SIGNED MESSAGE-----

Yung-Fong Tang wrote:
> Does anyone know how Java handles surrogate pairs in a property file?
>
> Java's property files use the \u encoding for non-ASCII characters,
> so U+00A5 is written \u00A5. I wonder whether anyone knows how it handles
> surrogate pairs.
>
> Is U+10000 (0xD800 0xDC00) encoded as "\u10000" or as "\ud800\udc00"? (I
> think it should be \u10000.) Or can they not be handled at all?

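"\ud800\udc00": two four-digit escapes, one per surrogate. A \u escape
takes exactly four hex digits, so "\u10000" would be read as U+1000
followed by the character '0'. Java 'char's are really UTF-16 code units
(that's what the converters implement; any documentation that says UCS-2
is out of date). It's up to applications to avoid splitting surrogate
pairs.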
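
A minimal sketch of the behaviour described above, for anyone who wants to
check it. The class name SurrogateProps and the helpers loadValue and
storeValue are made up for illustration. It also assumes a newer JDK than
the one current when this was written: String.codePointAt and
Character.isHighSurrogate arrived in Java 5, Properties.load(Reader) in
Java 6, and UncheckedIOException in Java 8.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical demo class; not part of any library.
class SurrogateProps {
    /** Loads a properties text and returns the value stored under "key". */
    static String loadValue(String text) {
        try {
            Properties p = new Properties();
            p.load(new StringReader(text));
            return p.getProperty("key");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Stores key=value through the escaping byte-stream writer and returns the text. */
    static String storeValue(String value) {
        try {
            Properties p = new Properties();
            p.setProperty("key", value);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            p.store(out, null); // null means no leading comment text
            return new String(out.toByteArray(), StandardCharsets.ISO_8859_1);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Two four-digit escapes, one per UTF-16 code unit.
        String pair = loadValue("key=\\ud800\\udc00");
        System.out.println(pair.length());                            // 2
        System.out.println(Integer.toHexString(pair.codePointAt(0))); // 10000

        // Five digits: the first four form U+1000, the fifth stays a literal '0'.
        String five = loadValue("key=\\u10000");
        System.out.println(Integer.toHexString(five.charAt(0)));      // 1000
        System.out.println(five.charAt(1));                           // 0

        // Splitting by char index leaves a lone high surrogate behind.
        String half = pair.substring(0, 1);
        System.out.println(Character.isHighSurrogate(half.charAt(0))); // true

        // Writing the pair back out produces two escapes again.
        System.out.println(storeValue(pair));
    }
}
```

The last call prints a timestamp comment line followed by the key with its
value written as two escapes, so a load/store round trip keeps the pair
intact as long as nothing in between cuts the string at a char index that
falls inside the pair.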

- --
David Hopwood <david.hopwood@zetnet.co.uk>

Home page & PGP public key: http://www.users.zetnet.co.uk/hopwood/
RSA 2048-bit; fingerprint 71 8E A6 23 0E D3 4C E5 0F 69 8C D4 FA 66 15 01
Nothing in this message is intended to be legally binding. If I revoke a
public key but refuse to specify why, it is because the private key has been
seized under the Regulation of Investigatory Powers Act; see www.fipr.org/rip

-----BEGIN PGP SIGNATURE-----
Version: 2.6.3i
Charset: noconv

iQEVAwUBO7k6QDkCAxeYt5gVAQFtlQgAoqz7pnC8RrGkxdlGPZhe7hQtrvRaaoMO
7VknOBo9PaJNoOD79OHtZb4yPLNWx2fMibXf+RRC9w2Fi+G/MynLuH4jHWG4VeEB
Lxzrtkm4XoM0zFJ/E7Hnz/jRZDMl3F6uWAphA4gulVjwgWXtP2dcJOFtcNqjoQh0
GJ6LFm9U94xVptAbOQEmEACZKlsBugfelHj2CO9LwvzuTLqB4O7Tg/MG0fr6MsM8
8k5AFFMIlGe3e7RQ/U14umUSL6c6ME0SJ8APfHw6yWPriwB+CN5v73NrK8TlDFD3
tD2Robl3om7m+eJPK3006revtGgDBh47Wsi2LPlBCae2ZBVZLxrPgA==
=/XOe
-----END PGP SIGNATURE-----


