Re: string vs. char [was Re: Java and Unicode]

From: Antoine Leca (Antoine.Leca@renault.fr)
Date: Mon Nov 20 2000 - 13:09:24 EST


Please keep in mind my side point:

> Antoine Leca wrote:
>
> > Please note that I left aside UTF-16, because it is not clear to me
> > whether 16 bits are adequate to encode UTF-16 in wchar_t (in other words,
> > whether wchar_t can be a multiwide encoding).

Marco Cimarosti wrote [with minor editing to keep it to the point]:
>
> > > wchar_t * _wcschr_32(const wchar_t * s, wint_t c);
> > > wchar_t * _wcsrchr_32(const wchar_t * s, wint_t c);
> > > size_t _wcrtomb_32(char * s, wint_t c, mbstate_t * mbs);
> >
> > What is the point?
>
> But if «c >= 0x1000[0]», then the character would be represented in «s» (a
> UTF-16 string) by a surrogate pair, and the function would thus return the
> address of the *high surrogate*.
>
> E.g., assuming that «s» is «{0x2190, 0xD800, 0xDC05, 0x2192, 0x0000}» and
> «c» is 0x10[0]05, both functions would return «&s[1]»: the address of the high
> surrogate 0xD800.

As I said, I am unsure whether UTF-16 is legal for wchar_t. If it is, I will
agree with you. But the main point of the people who say "UTF-16 is illegal for
wchar_t" is just this one: there are some cases that are not handled nicely by
the current API.
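
To make the case concrete, here is a minimal sketch of the behaviour Marco
describes. It is only an illustration under stated assumptions: the name
utf16_chr32 is hypothetical (it stands in for the proposed _wcschr_32), and it
uses uint16_t code units and a uint32_t code point instead of wchar_t and
wint_t, so that it does not depend on how wide wchar_t is on a given platform.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the proposed _wcschr_32: find the first
   occurrence of code point c in the zero-terminated UTF-16 string s.
   Assumption: s is well-formed UTF-16 held in 16-bit code units. */
const uint16_t *utf16_chr32(const uint16_t *s, uint32_t c)
{
    if (c > 0x10FFFF)
        return NULL;                    /* not a Unicode code point */

    if (c < 0x10000) {
        /* BMP character: a single code unit. Like wcschr, searching
           for 0 returns the address of the terminator. */
        for (;; ++s) {
            if (*s == c)
                return s;
            if (*s == 0)
                return NULL;
        }
    }

    /* Supplementary character: split it into a surrogate pair. */
    uint32_t v  = c - 0x10000;
    uint16_t hi = (uint16_t)(0xD800 + (v >> 10));
    uint16_t lo = (uint16_t)(0xDC00 + (v & 0x3FF));

    for (; *s != 0; ++s) {
        /* s[1] is safe to read: s[0] is non-zero here, so at worst
           s[1] is the terminator. */
        if (s[0] == hi && s[1] == lo)
            return s;                   /* address of the high surrogate */
    }
    return NULL;
}
```

With Marco's string {0x2190, 0xD800, 0xDC05, 0x2192, 0x0000} and c equal to
0x10005, this returns &s[1]. The caller then holds a pointer to a code unit
(0xD800) that is not equal to the c it searched for, which is the kind of case
the current one-unit-per-character API does not express.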

 
Antoine


