Re: Three new Technical Notes posted - Ada UTF-16

From: Markus Scherer (markus.scherer@jtcsv.com)
Date: Fri Jan 23 2004 - 19:38:07 EST

Next message: Murray Sargent: "Does Java 1.5 support Unicode math alphanumerics as variable names?"

Previous message: D. Starner: "Re: Three new Technical Notes posted"
In reply to: D. Starner: "Re: Three new Technical Notes posted"
Next in thread: Jungshik Shin: "Python and Unicode (was Re: Three new Technical Notes posted)"

D. Starner wrote:
>> #12 UTF-16 for Processing
>
> This is incorrect in saying that Ada uses UTF-16. It supports
> UCS-2 only. The text of the standard says:
>
> The predefined type Wide_Character is a character type
> whose values correspond to the 65536 code positions of
> the ISO 10646 Basic Multilingual Plane (BMP). [...]
>
> which doesn't include surrogate code points. The next

True, but the situation is not much different from, or worse than, that of Java, for example. Once you
have 16-bit types and string literals, adding a few functions to deal with supplementary code points
is not hard. We did this for Java in ICU4J.
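
As a rough illustration of how small such an add-on layer can be, here is a minimal sketch in Java. The
class and method names are hypothetical and are not the ICU4J API; the sketch only assumes 16-bit char
strings and passes unpaired surrogates through unchanged.

```java
// Hypothetical helper class (not the ICU4J API): a few functions that add
// supplementary code point handling on top of strings of 16-bit code units.
final class Utf16Sketch {
    static boolean isLead(char c)  { return c >= 0xD800 && c <= 0xDBFF; }
    static boolean isTrail(char c) { return c >= 0xDC00 && c <= 0xDFFF; }

    // Number of 16-bit code units needed for one code point.
    static int charCount(int cp) { return cp >= 0x10000 ? 2 : 1; }

    // Code point that starts at index i. An unpaired surrogate is returned as is.
    static int codePointAt(CharSequence s, int i) {
        char lead = s.charAt(i);
        if (isLead(lead) && i + 1 < s.length()) {
            char trail = s.charAt(i + 1);
            if (isTrail(trail)) {
                return 0x10000 + ((lead - 0xD800) << 10) + (trail - 0xDC00);
            }
        }
        return lead;
    }

    // Counts code points rather than code units.
    static int countCodePoints(CharSequence s) {
        int n = 0;
        for (int i = 0; i < s.length(); i += charCount(codePointAt(s, i))) {
            n++;
        }
        return n;
    }

    // Appends one code point, as a surrogate pair if it is supplementary.
    static void append(StringBuilder sb, int cp) {
        if (cp < 0x10000) {
            sb.append((char) cp);
        } else {
            int v = cp - 0x10000;
            sb.append((char) (0xD800 + (v >> 10)));
            sb.append((char) (0xDC00 + (v & 0x3FF)));
        }
    }

    public static void main(String[] args) {
        String s = "a\uD835\uDC00b"; // 'a', U+1D400, 'b': four code units
        System.out.println(s.length());                                // 4
        System.out.println(countCodePoints(s));                        // 3
        System.out.println(Integer.toHexString(codePointAt(s, 1)));    // 1d400
    }
}
```

The existing 16-bit string type is left untouched; only code that needs whole code points has to call
the new functions.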

For a language, there is little difference between supporting UCS-2 and supporting UTF-16, because
where functions do not handle supplementary code points, they usually also do not handle Unicode
versions above 3.0 (supplementary characters were first assigned in Unicode 3.1) - so string case
mappings etc. are the same.

A language like that can be relatively easily upgraded to full UTF-16 handling by updating the
character and string function implementations, and adding a few new APIs - that is what Java is
doing. The upgrade is done naturally when the standard functions are extended to Unicode 3.1 or later.

As such, whether the strings contain UCS-2 or UTF-16 depends less on the language definition and
more on the functions that are used, and the version of the standard libraries.
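
To make this concrete, here is a small sketch that uses the code point methods of java.lang.String and
java.lang.Character as they exist in Java 5 and later. The wrapper class and its method names are
hypothetical. The same 16-bit string behaves as UCS-2 or as UTF-16 depending only on which functions
are called on it.

```java
// Hypothetical wrapper class: contrasts code-unit-wise and code-point-wise
// functions applied to the same 16-bit string.
final class UnitsVersusCodePoints {
    // UCS-2 view: every 16-bit unit counts as one character.
    static int unitCount(String s) {
        return s.length();
    }

    // UTF-16 view: a surrogate pair counts as one character.
    static int codePointCount(String s) {
        return s.codePointCount(0, s.length());
    }

    // UCS-2 view: looks at the first code unit only.
    static boolean startsWithLetterUnitwise(String s) {
        return Character.isLetter(s.charAt(0));
    }

    // UTF-16 view: looks at the first whole code point.
    static boolean startsWithLetterCodePointwise(String s) {
        return Character.isLetter(s.codePointAt(0));
    }

    public static void main(String[] args) {
        String s = "\uD835\uDC00"; // U+1D400 MATHEMATICAL BOLD CAPITAL A
        System.out.println(unitCount(s));                     // 2
        System.out.println(codePointCount(s));                // 1
        System.out.println(startsWithLetterUnitwise(s));      // false: a lone surrogate is not a letter
        System.out.println(startsWithLetterCodePointwise(s)); // true
    }
}
```

The string type and its contents are identical in both views; only the functions and the library
version behind them differ.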

> version of Ada will have 32-bit characters to fully
> support Unicode - the text of the proposal is here:
>
> <http://www.ada-auth.org/cgi-bin/cvsweb.cgi/AIs/AI-00285.TXT?rev=1.14>
>
> plus lengthy discussion on the issues.

Thank you very much for the link.

The proposal seems to be to continue to treat Wide strings as UCS-2, and to treat Wide_Wide strings
(a new type) as UTF-32. This would give Ada a total of three different native string types at the
language level. It would also mean that existing code that uses 16-bit strings would not benefit
from a library upgrade but would instead have to be rewritten to support supplementary code points.
This may in fact slow down such support.

There will be a presentation of the choices for Java (including UTF-32) at IUC 25.

Best regards,
markus

-- 
Opinions expressed here may not reflect my company's positions unless otherwise noted.


This archive was generated by hypermail 2.1.5 : Fri Jan 23 2004 - 21:22:41 EST