Re: Surrogate points

From: Hans Aberg (haberg@math.su.se)
Date: Mon Jan 31 2005 - 12:24:23 CST

Next message: Chris Jacobs: "Re: Arabic and HTML"

Previous message: Hans Aberg: "Re: Surrogate points"
Maybe in reply to: Hans Aberg: "Surrogate points"
Next in thread: Peter Kirk: "Re: Surrogate points"
Reply: Peter Kirk: "Re: Surrogate points"

At 11:09 +0000 2005/01/31, Peter Kirk wrote:
>Doug, I think you have missed Hans' point, ...

Right.

>...which is surely that if
>Unicode had been designed from the start as a 21-bit space or whatever,
>it is unlikely that this surrogate pair mechanism would have been used
>to encode characters beyond the first 64K, and there would not have
>been a need to reserve this large block of code points.
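For readers less familiar with the mechanism being discussed: standard UTF-16 maps a code point above U+FFFF to a pair of 16-bit units taken from 0xD800..0xDFFF, which is why those values cannot also be assigned to characters. The following is a minimal Python sketch of that standard rule only, not of any modified scheme; the function names are hypothetical and chosen for illustration.

```python
from typing import List


def encode_utf16_units(cp: int) -> List[int]:
    """Return the standard UTF-16 code units for one Unicode scalar value."""
    if not (0 <= cp <= 0x10FFFF) or 0xD800 <= cp <= 0xDFFF:
        # Surrogate values are reserved for the pairing mechanism itself.
        raise ValueError(f"not a Unicode scalar value: U+{cp:04X}")
    if cp < 0x10000:
        # First 64K (BMP): one 16-bit unit equal to the code point.
        return [cp]
    v = cp - 0x10000              # leaves a 20-bit value
    high = 0xD800 + (v >> 10)     # top 10 bits -> high surrogate D800..DBFF
    low = 0xDC00 + (v & 0x3FF)    # bottom 10 bits -> low surrogate DC00..DFFF
    return [high, low]


def decode_utf16_units(units: List[int]) -> List[int]:
    """Decode standard UTF-16 code units back into code points."""
    out: List[int] = []
    i = 0
    while i < len(units):
        u = units[i]
        if (0xD800 <= u <= 0xDBFF and i + 1 < len(units)
                and 0xDC00 <= units[i + 1] <= 0xDFFF):
            # A well-formed high/low pair combines back into one code point.
            out.append(0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00))
            i += 2
        elif 0xD800 <= u <= 0xDFFF:
            raise ValueError(f"unpaired surrogate 0x{u:04X}")
        else:
            out.append(u)
            i += 1
    return out
```

For example, U+1F600 encodes as the pair 0xD83D 0xDE00, and the highest code point, U+10FFFF, as 0xDBFF 0xDFFF: the 2048 surrogate values are exactly what makes the 21-bit range reachable from 16-bit units.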

Well, you are closing in. There is still no need to reserve the
"surrogate" and 0xFFFE-0xFFFF points, even given UTF-16: just
put them somewhere else in a modified UTF-16. Since nobody expects Unicode
character numbers to come anywhere near covering the whole UTF-16 range,
just put them somewhere that is expected to remain free.
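As background on the 0xFFFE-0xFFFF points mentioned above: one reason usually given for keeping U+FFFE out of use is byte-order detection. A UTF-16 stream often starts with the byte order mark U+FEFF, and read in the wrong byte order it appears as 0xFFFE; since U+FFFE is a noncharacter, a reader can treat that value as a byte-swapped mark. (U+FFFF is likewise a noncharacter.) A minimal Python sketch of that check; the function name and return strings are hypothetical.

```python
def detect_utf16_byte_order(data: bytes) -> str:
    """Guess a UTF-16 stream's byte order from its first code unit."""
    if len(data) < 2:
        raise ValueError("need at least one 16-bit code unit")
    first = (data[0] << 8) | data[1]  # first unit read as big-endian
    if first == 0xFEFF:
        # BOM U+FEFF seen in the correct order.
        return "big-endian"
    if first == 0xFFFE:
        # U+FFFE is a noncharacter, so this can only be a swapped BOM.
        return "little-endian"
    return "unknown (no BOM)"
```

Python's own `codecs.BOM_UTF16_BE` (`b"\xfe\xff"`) and `codecs.BOM_UTF16_LE` (`b"\xff\xfe"`) are the two byte sequences this check distinguishes.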

>So, Hans, all of this is theoretical as Doug has made clear. Even if we
>can all agree post facto on an improved encoding, there is far too much
>investment in UTF-16 for it ever to be changed. And UTF-16, which cannot
>be deprecated, requires these code points to be reserved. But there is
>no shortage of code points, so what's the problem?

A relatively minor change to UTF-16 would make that condition go
away. Current UTF-16 implementations would merely need to be aware that
these character numbers may be used, and be altered appropriately.
This is not an urgent change, as these character numbers will not be
filled very soon.

And UTF-16, even though heavily invested in, is in this respect no different
from ISO-Latin and the other older encodings. So it is still easy to introduce
a new encoding scheme that will eventually replace UTF-16.

The problem is that those replying do not want a change, not that a change
cannot be made. At the same time, the Unicode character set is so complicated
that a successor will eventually have to be developed. It is in dire need of
all the help it can get to be straightened out and simplified. All you are
saying is that this is not going to happen within the scope of the Unicode
Consortium, and that someone else, more qualified, should do it.

Hans Aberg


This archive was generated by hypermail 2.1.5 : Mon Jan 31 2005 - 12:26:27 CST