Re: Abstract character?

From: Doug Ewell (dewell@adelphia.net)
Date: Tue Jul 23 2002 - 01:05:12 EDT

Previous message: Lisa Moore: "Dublin Conference: Re: ISO/IEC 10646 versus Unicode"
In reply to: Mark Davis: "Re: Abstract character?"
Next in thread: David Hopwood: "Re: Abstract character?"
Next in thread: Lars Marius Garshol: "Re: Abstract character?"

Mark Davis <mark at macchiato dot com> wrote:

> The UTC has decided to make scalar value mean unambiguously the
> code points 0000..D7FF, E000..10FFFF, i.e., everything but surrogate
> code points. While surrogate code points cannot be represented in
> UTF-8 (as of Unicode 3.2), the UTC has not decided that the surrogate
> code points are illegal in all UTFs; notably, they are legal in
> UTF-16.

They are not legal in UTF-16 unless you believe that the two code points
(0xD800, 0xDC00) are fundamentally equivalent to the single code point
0x10000 -- that is, unless you believe Unicode *is* UTF-16.
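
To make the arithmetic behind that supposed equivalence concrete, here is a minimal Python sketch of how a surrogate pair is combined into a supplementary code point. The function name combine_surrogates is made up for illustration; it is not taken from the standard or from any library.

```python
def combine_surrogates(high: int, low: int) -> int:
    """Map a UTF-16 surrogate pair to the code point it represents.

    Hypothetical helper for illustration only.
    """
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError("not a high surrogate: %04X" % high)
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError("not a low surrogate: %04X" % low)
    # Each surrogate carries 10 bits; the pair is offset by 0x10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


print("%X" % combine_surrogates(0xD800, 0xDC00))  # 10000
```

The point is that this mapping is a property of the UTF-16 encoding form, not of the code points D800 and DC00 themselves.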

UTF-16 does not allow the representation of an unpaired surrogate 0xD800
followed by another, coincidental unpaired surrogate 0xDC00. (It maps
the two to U+10000.) Among the standard UTFs, only UTF-32 allows the
two to be treated as unpaired surrogates. In fact, before UTF-8 was
"tightened up" in 3.2, the only UTF that DID NOT permit these two
coincidental unpaired surrogates was UTF-16.

UTF-8:   D800 DC00 <==> ED A0 80 ED B0 80   (no longer legal)
UTF-32:  D800 DC00 <==> 0000D800 0000DC00
   - but -
UTF-16:  D800 DC00  ==> D800 DC00  ==> 10000
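
The same asymmetry can be reproduced with Python 3's built-in codecs, as a rough sketch. One assumption to flag: current Python rejects surrogate code points in all three UTFs by default, so the "surrogatepass" error handler is used here to switch those checks off and mimic the looser behavior described above. It is an illustration, not a statement of what any version of the standard requires.

```python
pair = "\ud800\udc00"  # two code points: U+D800 followed by U+DC00

# "surrogatepass" lets surrogate code points through the codecs.
utf8 = pair.encode("utf-8", "surrogatepass")
utf32 = pair.encode("utf-32-be", "surrogatepass")
utf16 = pair.encode("utf-16-be", "surrogatepass")

print(utf8.hex())   # eda080edb080
print(utf32.hex())  # 0000d8000000dc00
print(utf16.hex())  # d800dc00

# Decode each byte sequence again and count the code points that come back.
back8 = utf8.decode("utf-8", "surrogatepass")
back32 = utf32.decode("utf-32-be", "surrogatepass")
back16 = utf16.decode("utf-16-be", "surrogatepass")

print(len(back8), len(back32), len(back16))  # 2 2 1
```

UTF-8 and UTF-32 hand back the two original code points, while UTF-16 hands back the single code point U+10000, because the bytes D800 DC00 are indistinguishable from a well-formed surrogate pair.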

> Ken is pushing for this change; I believe it would be a very bad idea.
> (I think the reasons have already appeared on this list, so I am not
> trying to reopen the discussion; just state the current situation.)

I don't recall seeing the reasons conclusively discussed on this list;
I'd be happy to hear them again. I've been complaining about the
paragraph after D29 for two years now.

-Doug Ewell
Fullerton, California


This archive was generated by hypermail 2.1.2 : Mon Jul 22 2002 - 23:25:58 EDT