Re: Perception that Unicode is 16-bit (was: Re: Surrogate space in Unicode)

From: Antoine Leca
Date: Wed Feb 21 2001 - 09:35:03 EST

Cher Ken,

Kenneth Whistler wrote:
> This was also subjected to a major revision in the just-completed UTC meeting.
> These actions were taken to make it clear to everyone that use of a 32-bit
> encoding form is *not* inconsistent with a claim of compliance to the Unicode
> Standard, now that UTF-32 has been officially added as a sanctioned encoding
> form. From this date forward, no one should have to jump through hoops to
> explain how their 32-bit wide character implementations are and are not
> conformant to the Unicode Standard.

As I said before, I am very happy with this news. I hope we can now build
some usable things with the Unix community, for example based on
the __STDC_ISO_10646__ predefined symbol, or on any other mechanism if this
one proves inadequate.
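
To make the idea concrete, here is a minimal sketch of what relying on that
symbol could look like. C99 defines __STDC_ISO_10646__ when wchar_t values
are ISO/IEC 10646 code points, and WCHAR_MAX shows whether wchar_t is wide
enough for code points up to U+10FFFF. The two function names are
hypothetical, chosen only for this illustration.

```c
#include <stddef.h>
#include <wchar.h>   /* wchar_t, WCHAR_MAX */

/* Hypothetical helper: returns 1 if the implementation promises that
 * wchar_t values are ISO/IEC 10646 code points (C99 predefined macro). */
int wchar_is_iso10646(void)
{
#ifdef __STDC_ISO_10646__
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: returns 1 if wchar_t can represent every code
 * point up to U+10FFFF, i.e. it is usable as a UTF-32 code unit. */
int wchar_covers_all_planes(void)
{
    return WCHAR_MAX >= 0x10FFFF;
}
```

A program could call both helpers at start-up and fall back to its own
32-bit type when either returns 0.
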
> Antoine Leca said:
> > wrote:
> > >
> > > Eh? Unicode has no aversion to either a 32-bit encoding form (UTF-32 - see
> > > UTR#19 or PDUTR#27) or with C++.
> >
> > Read also TUS3.0, par. 5.2 on top of page 108...
> > As far as I know, neither UAX-29 nor PDUTR-27 has changed these words...
> >
> > That said, one can see it as an oversight that ought to be corrected.
> > As the guy who fought to introduce the widest use of ISO10646/Unicode
> > in C99, I will certainly welcome any change in this area! ;-)
> >
> All taken care of in the rewrite of section 5.2, based on the last
> UTC meeting's review of the text of PDUTR #27.

Again, good news.

> In general, folks, please calm down a little.

Well, since this remark appears just below mine, I feel I should make
a few points. First, I am (was, in fact) very calm. I participate
in this mailing list with the sole hope of providing input toward a better
standard, usable by as many people as possible and with as few
inconsistencies as possible. I felt that contributions here should be
welcome when they are technically compelling, even if they are badly phrased
(most writers here are not native English speakers) or even if the tone is
sometimes ironic (though it should not be offensive, do not misunderstand me).

This very sentence about 32-bit encoding not being suitable for use in
wchar_t has seemed to me from the beginning (I read it in TUS 1.0 IIRC, and
back then I found it _very_ strange, even though I had very limited knowledge
of Unicode at the time) like unnecessary bashing written by
some-unknown-promoter-of-Unicode toward some-other-unknown-category-of-people
(whether it is aimed at C or not I do not know; but I note that I never saw
Unicode representatives participating in the C99 process, even though it was
a good opportunity to promote Unicode/10646 as a first-class option for
encoding characters when using the C language).

The wchar_t stuff has been standardized since 1994, and it existed in spirit
for a couple of years before that. If the (now former) Unicode view (always
storing as 16-bit) had gained any ground in this battle, we would have known
it for a couple of years by now. It appears it failed. As a result, the
relevant paragraph is really a relic from the past that ought to be dropped
(and the UTC rightly just did so). So my comment arrived a _bit_ too late:
one day late in about five years. That is no reason to single me out as
someone irresponsible who needs to calm down.

> The text of PDUTR #27 is out-of-date

Today (after your post and Peter's) it is really out of date. I do not
believe it was out-of-date yesterday morning, unless there is a new
version that I missed, or unless I _should_ have known that I had no
right to mention it on this very list.

> -- it was a *Proposed Draft*, after all, for review by the UTC.

Does that mean that we "guests" are not expected to comment on
this proposal?
If so, what is the point of publishing such drafts and mentioning them on
the list?

> And the editorial committee has been working furiously to update
> the text for final posting.

OK, can I ask you to please calm down as well? Nobody is perfect, and
many contributors to this thread only made their points (sometimes
ironically) in order to improve the Standard. Most, if not all, would
also commend the quality of the UTC's work, even if they are sometimes
not fully satisfied with its resolutions. I am certainly the first
to do so. The real point is that I did not notice that a UTC meeting
was taking place this week, so it was a matter of minutes ;-) to have
things corrected. Had I known that, I would probably have written my
comment slightly differently (asking more loudly for this point to be
considered an oversight).

> I cannot promise that all issues will be resolved and all truth will
> be revealed in that document, but much of what has been discussed on
> this thread should become moot.

Good. So what is the point?


