Re: Support for non-BMP characters

From: Juanma Barranquero <lekktu_at_gmail.com>
Date: Wed, 25 Apr 2012 15:55:27 +0200

On Wed, Apr 25, 2012 at 10:31, David Starner <prosfilaes_at_gmail.com> wrote:

> While Ada 2005 added a UTF-32 string type it left the UCS-2
> string type as is.

Ada 2012 is adding (quoting from the ARM):

A.4.11 String Encoding

{AI05-0137-2} Facilities for encoding, decoding, and converting
strings in various character encoding schemes are provided by packages
Strings.UTF_Encoding, Strings.UTF_Encoding.Conversions,
Strings.UTF_Encoding.Strings, Strings.UTF_Encoding.Wide_Strings, and
Strings.UTF_Encoding.Wide_Wide_Strings.

[...]

{AI05-0137-2} {AI05-0262-1} The type Encoding_Scheme defines encoding
schemes. UTF_8 corresponds to the UTF-8 encoding scheme defined by
Annex D of ISO/IEC 10646. UTF_16BE corresponds to the UTF-16 encoding
scheme defined by Annex C of ISO/IEC 10646 in 8 bit, big-endian order;
and UTF_16LE corresponds to the UTF-16 encoding scheme in 8 bit,
little-endian order.

{AI05-0137-2} The subtype UTF_String is used to represent a String of
8-bit values containing a sequence of values encoded in one of three
ways (UTF-8, UTF-16BE, or UTF-16LE). The subtype UTF_8_String is used
to represent a String of 8-bit values containing a sequence of values
encoded in UTF-8. The subtype UTF_16_Wide_String is used to represent
a Wide_String of 16-bit values containing a sequence of values encoded
in UTF-16.

{AI05-0137-2} {AI05-0262-1} The BOM_8, BOM_16BE, BOM_16LE, and BOM_16
constants correspond to values used at the start of a string to
indicate the encoding.

etc.
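
To make the three schemes and the BOM constants concrete, here is a
rough sketch of the byte sequences involved for a non-BMP character.
It is Python rather than Ada, and it uses Python's own codecs module,
not the Ada packages quoted above. The helper scheme_from_bom is a
hypothetical name of mine and falls back to a caller-supplied default
when no BOM is present.

```python
import codecs

# U+1D11E MUSICAL SYMBOL G CLEF lies outside the BMP, so UTF-16
# needs a surrogate pair (D834 DD1E) to represent it.
ch = "\U0001D11E"

utf8 = ch.encode("utf-8")         # four 8-bit code units
utf16be = ch.encode("utf-16-be")  # surrogate pair, high byte first
utf16le = ch.encode("utf-16-le")  # surrogate pair, low byte first

# Byte order marks as defined by Python's codecs module.
BOMS = {
    "utf-8": codecs.BOM_UTF8,         # EF BB BF
    "utf-16-be": codecs.BOM_UTF16_BE,  # FE FF
    "utf-16-le": codecs.BOM_UTF16_LE,  # FF FE
}


def scheme_from_bom(data: bytes, default: str = "utf-8") -> str:
    """Hypothetical helper: guess the scheme from a leading BOM.

    Returns `default` when the data does not start with a known BOM.
    """
    for name, bom in BOMS.items():
        if data.startswith(bom):
            return name
    return default


if __name__ == "__main__":
    for name, encoded in (("utf-8", utf8),
                          ("utf-16-be", utf16be),
                          ("utf-16-le", utf16le)):
        print(name, encoded.hex(" "))
```

All three byte strings decode back to the same single code point, which
is the point of having the schemes live side by side in one 8-bit
string type.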

    Juanma
Received on Wed Apr 25 2012 - 08:58:18 CDT