Re: 32'nd bit & UTF-8

From: Arcane Jill (arcanejill@ramonsky.com)
Date: Fri Jan 21 2005 - 02:42:51 CST

Next message: Antoine Leca: "Re: 32'nd bit & UTF-8"

Previous message: Kenneth Whistler: "Re: Subject: Re: 32'nd bit & UTF-8"
Maybe in reply to: Hans Aberg: "32'nd bit & UTF-8"
Next in thread: Clark Cox: "Re: 32'nd bit & UTF-8"
Maybe reply: Philippe VERDY: "Re: Re: 32'nd bit & UTF-8"
Reply: Clark Cox: "Re: 32'nd bit & UTF-8"

-----Original Message-----
From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org]On
Behalf Of Hans Aberg
Sent: 20 January 2005 20:47
To: Antoine Leca; unicode@unicode.org
Subject: Re: 32'nd bit & UTF-8

> That already seems to have happened with GNU GCC, which fixes wchar_t to
> 32-bits.

and Microsoft Visual C++, which fixes wchar_t to SIXTEEN bits.

The existence of wchar_t does not imply UTF-32. It does not imply UTF-16. It
does not even imply Unicode. It's just a type. I quote from a version of
stddef.h I found on the internet somewhere:

"wchar_t: Integer type whose range of values can represent distinct
wide-character codes for all members of the largest character set specified
among the locales supported by the compilation environment: the null character
has the code value 0 and each member of the portable character set has a code
value equal to its value when used as the lone character in an integer
character constant."

Width is not specified, nor is encoding, nor is character set.

gcc is but one platform. There are others.
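
To make the point concrete, here is a minimal sketch in C that asks the
implementation what it actually provides rather than assuming a width. The
function names (wchar_bits, wchar_holds_any_code_point, describe_wchar) are
made up for illustration; only CHAR_BIT, WCHAR_MAX and snprintf come from the
standard headers. Note that even a wide enough wchar_t tells you nothing about
the encoding or character set in use.

```c
#include <assert.h>
#include <limits.h>   /* CHAR_BIT */
#include <stddef.h>   /* wchar_t, size_t */
#include <stdint.h>   /* WCHAR_MAX */
#include <stdio.h>    /* snprintf */

/* Width of wchar_t in bits on this implementation. */
unsigned wchar_bits(void)
{
    return (unsigned)(sizeof(wchar_t) * CHAR_BIT);
}

/* Nonzero if one wchar_t is wide enough to hold any value up to 0x10FFFF.
   This is a statement about range only, not about encoding. */
int wchar_holds_any_code_point(void)
{
    return (unsigned long long)WCHAR_MAX >= 0x10FFFFULL;
}

/* Hypothetical helper: writes a one-line summary into out and returns
   what snprintf returns. */
int describe_wchar(char *out, size_t n)
{
    assert(out != NULL && n > 0);
    return snprintf(out, n, "wchar_t: %u bits, holds any code point: %s",
                    wchar_bits(),
                    wchar_holds_any_code_point() ? "yes" : "no");
}
```

Calling describe_wchar with a small buffer and printing the result should
report different widths on the two compilers mentioned above, which is the
whole point: portable code has to check, not assume.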

>> Hmmm... I don't recall that the Unicode Standard ever specifies that the
>> Byte Order Mark is *required* to be used anywhere for any purpose. Can you
>> point me to the place in the standard where this is stated?
>
> Several posters have claimed that, most recently Arcane Jill. Check with
> them. There is supposed to be a difference between a UTF-8 encoding not
> requiring a BOM and a UTF-8 process requiring it.

I am not a Unicode expert here. Like you, I'm a programmer. I mostly lurk here.
I've just lurked here long enough to pick up a few things, but folk like ...
well folks who actually get their names into Unicode documents ... /those/ guys
are the real experts, the people to whom to listen. But anyway, from TUS,
Chapter 15:

"Systems that use the byte order mark must recognize when an initial U+FEFF
signals the byte order. In those cases, it is not part of the textual content
and should be removed before processing, because otherwise it may be mistaken
for a legitimate zero width no-break space."

Of course, Chapter 15 (Special Areas and Format Characters) is not Chapter 3
(Conformance), so maybe it's not a conformance requirement as such? I don't
know enough to know the difference.
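
For what it's worth, the "remove before processing" step from that quote might
look like the following sketch in C for the UTF-8 case. It assumes the caller
has already decided that this stream is one in which an initial U+FEFF is a
signature; in UTF-8 the bytes EF BB BF carry no byte-order information, so
that decision belongs to the protocol, not to this function. The name
utf8_bom_length is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>   /* size_t */
#include <string.h>   /* memcmp */

/* Returns the number of leading bytes to skip: 3 if buf begins with the
   UTF-8 encoding of U+FEFF (EF BB BF), otherwise 0. Only an initial
   U+FEFF is considered; any later occurrence is left in the text. */
size_t utf8_bom_length(const unsigned char *buf, size_t len)
{
    static const unsigned char bom[3] = { 0xEF, 0xBB, 0xBF };
    assert(buf != NULL || len == 0);
    return (len >= 3 && memcmp(buf, bom, 3) == 0) ? 3 : 0;
}
```

A caller would advance its pointer by the returned count and shorten the
length by the same amount before handing the text to the rest of the process.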

Jill

This archive was generated by hypermail 2.1.5 : Fri Jan 21 2005 - 02:45:13 CST