The Unicode Consortium Discussion Forum


All times are UTC - 6 hours [ DST ]




 Post subject: UTF-32 conformantly bound to a wchar_t in C and C++
Posted: Sat Sep 29, 2012 7:37 am

Joined: Sat Aug 06, 2011 9:02 am
Posts: 43
In Chapter 2, General Structure, on page 28, under "Comparison of the Advantages of UTF-32, UTF-16, and UTF-8", one can read:

"On the face of it, UTF-32 would seem to be the obvious choice of Unicode encoding forms for an internal processing code because it is a fixed-width encoding form. It can be conformantly bound to the C and C++ wchar_t, which means that such programming languages may offer built-in support and ready-made string APIs that programmers can take advantage of."

I don't understand how a 32-bit code unit can be conformantly bound to a wchar_t in C and C++, since wchar_t is 16 bits wide (sizeof(wchar_t) = 2).


 Post subject: Re: UTF-32 conformantly bound to a wchar_t in C and C++
Posted: Wed Oct 03, 2012 10:53 am
Unicode Guru

Joined: Tue Dec 01, 2009 2:49 pm
Posts: 189
Where does it specify that wchar_t is 16 bits?


 Post subject: Re: UTF-32 conformantly bound to a wchar_t in C and C++
Posted: Thu Oct 04, 2012 7:57 pm

Joined: Sat Aug 06, 2011 9:02 am
Posts: 43
asmus,

Of course, I was referring to Microsoft compilers, where sizeof(wchar_t) = 2 bytes = 16 bits.

It seems clear to me that the statement I reproduced above from the Unicode Standard explicitly contradicts this passage:

"The width of wchar_t is compiler-specific and can be as small as 8 bits. Consequently, programs that need to be portable across any C or C++ compiler should not use wchar_t for storing Unicode text. The wchar_t type is intended for storing compiler-defined wide characters, which may be Unicode characters in some compilers."
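The width that passage talks about can be checked on any given compiler. A minimal sketch, assuming a C++11 compiler; the helper names wchar_bits and wchar_max_value are made up for illustration and are not part of any standard API:

```cpp
#include <climits>   // CHAR_BIT
#include <cstddef>   // std::size_t
#include <limits>    // std::numeric_limits

// Storage width of wchar_t in bits on the compiler that builds this file.
// sizeof counts bytes (chars), so multiply by CHAR_BIT to get bits.
constexpr std::size_t wchar_bits() {
    return sizeof(wchar_t) * CHAR_BIT;
}

// Largest value a wchar_t can hold, widened to unsigned long so it can be
// compared against code point values such as 0x10FFFF.
constexpr unsigned long wchar_max_value() {
    return static_cast<unsigned long>(std::numeric_limits<wchar_t>::max());
}
```

With the Microsoft compilers mentioned above, wchar_bits() comes out as 16; compilers that choose a 32-bit wchar_t report 32. Neither result is required by the C or C++ standards, which is the point of the quoted passage.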

If my understanding is not correct, please let me know why, so I can improve to meet your high standards.

Thanks


 Post subject: Re: UTF-32 conformantly bound to a wchar_t in C and C++
Posted: Thu Oct 04, 2012 8:33 pm
Unicode Guru

Joined: Tue Dec 01, 2009 2:49 pm
Posts: 189
There's a difference between "conformantly" and "portably".

Your second quote points out that binding UTF-32 to wchar_t is not portable. Nobody disagrees with that. That binding is also a decision by the compiler vendor, and not usually something the user of the compiler (C++ programmer) can configure.
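For code that does have to be portable, one way to avoid depending on the vendor's choice is a type whose width does not vary between compilers. A minimal sketch, assuming C++11, where char32_t, std::u32string and U"..." literals are available; count_code_points is a hypothetical helper name:

```cpp
#include <climits>   // CHAR_BIT
#include <cstddef>   // std::size_t
#include <string>    // std::u32string

// char32_t has the same size as std::uint_least32_t, so one element always
// has room for a UTF-32 code unit, whatever width wchar_t happens to have.
static_assert(sizeof(char32_t) * CHAR_BIT >= 32,
              "char32_t must have room for a UTF-32 code unit");

// In UTF-32 every code unit is exactly one code point, so the number of
// elements in the string is the number of code points.
std::size_t count_code_points(const std::u32string& text) {
    return text.size();
}
```

For example, count_code_points(U"a\U0001F600") is 2 on every conforming C++11 compiler. The same text held in a 16-bit wchar_t string would need a surrogate pair for the second character, so its element count would differ.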

The documentation in the Unicode Standard has to cover anybody using or implementing the standard, and that includes compiler vendors adding Unicode support to their compilers. And yes, they could bind their wchar_t support to UTF-32 and still be conformant to the C/C++ standards.
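Whether a particular vendor has made that binding can be tested at compile time. The sketch below relies on the __STDC_ISO_10646__ macro, which the C and C++ standards allow an implementation to define when wchar_t values are ISO/IEC 10646 code points. The absence of the macro proves nothing either way, and wchar_is_utf32 is a hypothetical name used only for this illustration:

```cpp
#include <limits>    // std::numeric_limits

// True only when the implementation itself promises that wchar_t holds
// ISO/IEC 10646 code points AND wchar_t can represent the largest one,
// U+10FFFF. Without the promise, assume nothing.
constexpr bool wchar_is_utf32() {
#ifdef __STDC_ISO_10646__
    return static_cast<unsigned long>(
               std::numeric_limits<wchar_t>::max()) >= 0x10FFFFul;
#else
    return false;  // the vendor makes no claim about wchar_t contents
#endif
}
```

A program that wants to store UTF-32 in wchar_t could guard that code with static_assert(wchar_is_utf32(), "...") and fail to build elsewhere, which makes the non-portability explicit rather than silent.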

However, I believe the passage you cite from the Unicode Standard does not recommend simply binding UTF-32 to wchar_t.
Quote:
On the face of it, UTF-32 would seem to be the obvious choice ...

(emphasis added).

That clearly implies that, despite first appearances, some other choices would most likely be better, so there's no actual contradiction between the text snippet you quote from chapter 2 and best implementation practices.

I appreciate your careful reading of the standard, but my take is that this is just an introductory sentence to a longer discussion and isn't as problematic as it may appear.

