Re: 32'nd bit & UTF-8

From: Hans Aberg (haberg@math.su.se)
Date: Thu Jan 20 2005 - 14:46:54 CST

Next message: Peter Constable: "RE: 32'nd bit & UTF-8"

Previous message: Hans Aberg: "Re: UTF-8 'BOM'"
In reply to: Antoine Leca: "Re: 32'nd bit & UTF-8"
Next in thread: Antoine Leca: "Re: 32'nd bit & UTF-8"
Maybe reply: Philippe VERDY: "Re: Re: 32'nd bit & UTF-8"
Maybe reply: Philippe VERDY: "Re: Re: 32'nd bit & UTF-8"
Reply: Antoine Leca: "Re: 32'nd bit & UTF-8"

On 2005/01/20 15:42, Antoine Leca at Antoine10646@Leca-Marti.org wrote:

>> There cannot be more than one _standard_ library, i.e., a library
>> as part of the issued C++ ISO/ANSI standard. :-)
>
> Even with this restriction: C++ on one side builds on top of the C standard,

Actually, C++ is a wholly independent language, but with a C-like syntax
and the requirement that C++ code can be linked with C code. How the two
hang together is a complex issue.

> so re-use the C notion of stream (stdio), which does have a wchar_t variant.
> And then on the other side we have the iostreams well-known in C++ folklore
> from Day 1, that I assume should also have a wchar_t facet.
> :-)

That already seems to have happened with GNU GCC, which fixes wchar_t at
32 bits.
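
For anyone who wants to check what their own compiler does, here is a small sketch. The helper names wchar_bits and wchar_holds_any_code_point are made up for the illustration; the only assumption is that a Unicode code point (U+0000..U+10FFFF) needs 21 bits.

```cpp
#include <climits>   // CHAR_BIT
#include <cstddef>   // std::size_t

// Hypothetical helper: the width of wchar_t in bits on this implementation.
// The language standard leaves this width implementation-defined.
constexpr std::size_t wchar_bits() {
    return sizeof(wchar_t) * CHAR_BIT;
}

// True if a single wchar_t is wide enough for any Unicode code point
// (U+0000..U+10FFFF, which needs 21 bits).
constexpr bool wchar_holds_any_code_point() {
    return wchar_bits() >= 21;
}
```

With a 32-bit wchar_t the second function returns true; with a 16-bit wchar_t it returns false, and characters outside the BMP then need more than one unit.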

>>> And I
>>> happen to know very well that the use of wchar_t streams (using
>>> the C meaning here, that is fwprintf etc.) is NOT widespreaded,
>>> for a lot of reasons.
>>
>> In the past it has been so. But GNU GCC has now settled on using
>> wchar_t as a 32-bit type. So that is probably where matters are
>> heading.
>
> You got me wrong. Perhaps it is the direction a particular implementation is
> heading. I am just saying USERS (programmers) are not there.

Those things are not widespread. But in the past, GNU has often proved to be
leading on new features. So it may yet come.

>>>> Portability does not mean that the program is expected to run
>>>> on different platforms without alterations, but merely tries
>>>> to lessen those needed changes.
>>>
>>> You are certainly free to define portability the way you want.
>>
>> This is how one defines portability in the context of C/C++.
>
> If by "one" you mean yourself, we are in agreement.
> Now, if you mean the general meaning, definitively no.

It is a quote from BS (Bjarne Stroustrup, the principal designer of C++)
somewhere, I think, but I do not remember where. Perhaps it is in his "DEC++"
("The Design and Evolution of C++"). Check it out in the C/C++ standards
newsgroups.

>> But when using a C/C++ compiler this is not so:
>
> Yes it is. The first step of the formal model of a C/C++ compiler (according
> to both ISO standards) is to map the physical source characters into an
> internal representation. So it is the job of the compiler vendor to actually
> ensure of the similarity when it comes to C/C++ sources.

The problem is that the underlying binary model differs from compiler to
compiler, and there is no easy way to find that out from within the
language. People usually assume that there will be a specific type of
padding, and often there is, but it need not be so. This is a topic for the
C/C++ newsgroups.
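
To make the padding point concrete, here is a minimal sketch. The struct Record and the two helpers are invented for the illustration; the numbers they return depend entirely on the compiler and its settings, which is the point.

```cpp
#include <cstddef>   // offsetof, std::size_t

// Hypothetical record, used only to illustrate padding.
struct Record {
    char tag;
    int  value;
};

// Bytes in Record that belong to neither member. The compiler chooses
// this amount; the language does not fix it.
constexpr std::size_t record_padding() {
    return sizeof(Record) - (sizeof(char) + sizeof(int));
}

// Where 'value' actually starts inside a Record on this compiler.
std::size_t value_offset() {
    return offsetof(Record, value);
}
```

A program that writes a Record to a file as raw bytes and reads it back under a different compiler is relying on both results being the same on both sides.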

> This is exactly the same as relying on the HTTP server and client to pass
> the HTML stream from the producer (the guy that wrote the page) to the user
> (the browser).

The HTTP protocol guarantees that the binary data comes out the same over
the network. If you take an HTTP tool written in C and compile it on
different platforms, then it may not come out right, because the C compilers
may use different underlying binary models.
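
One common way around this, sketched here under assumptions, is never to send the in-memory representation at all, but to spell out each octet by hand. The function names put_u32_be and get_u32_be are hypothetical, the choice of most-significant-octet-first is just one convention, and std::uint32_t is assumed to exist (the standard makes it optional).

```cpp
#include <array>
#include <cstdint>   // std::uint32_t (optional in the standard; assumed here)

// Encode a 32-bit value as four octets, most significant first, using only
// shifts and masks, so the result does not depend on the host byte order.
std::array<unsigned char, 4> put_u32_be(std::uint32_t v) {
    return {{
        static_cast<unsigned char>((v >> 24) & 0xFFu),
        static_cast<unsigned char>((v >> 16) & 0xFFu),
        static_cast<unsigned char>((v >> 8) & 0xFFu),
        static_cast<unsigned char>(v & 0xFFu)
    }};
}

// Decode the four octets produced by put_u32_be.
std::uint32_t get_u32_be(const std::array<unsigned char, 4>& b) {
    return (static_cast<std::uint32_t>(b[0]) << 24) |
           (static_cast<std::uint32_t>(b[1]) << 16) |
           (static_cast<std::uint32_t>(b[2]) << 8)  |
            static_cast<std::uint32_t>(b[3]);
}
```

The octet sequence is then fixed by the code rather than by the compiler's binary model, at the price of writing such functions for every type that crosses the wire.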

> And the C/C++ paradigm is to use textual data when communicating (which is
> the framework targeted by Unicode).

But only within the framework of each single compiler. In fact, sometimes
even different compilers on the same platform use different binary models,
or at least did in the past. Then special efforts are required when object
code from different compilers is to be linked together. It is a pain when
that happens, because the program just does not run properly, and one does
not know why.

> If you want more precise behaviour at
> binary level, you probably should consider at least Posix instead, or
> perhaps some ABI built on top of it.

Most POSIX software is written using C. So it does not help.

> And also restrict your low-level I/O to
> unsigned char, C (so C++) has definitive provisions to ensure what you want
> (or what you pretend to want) using them.

There is no guarantee that these will be 8-bit bytes.

And so on.
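
On the 8-bit point, the standard only promises that CHAR_BIT is at least 8. A short sketch of how a program can find out, and how it can keep a value within octet range regardless; the names bytes_are_octets and to_octet are made up for the illustration.

```cpp
#include <climits>   // CHAR_BIT

// True only if a byte on this implementation is exactly 8 bits wide.
// The standard guarantees CHAR_BIT >= 8, not CHAR_BIT == 8.
constexpr bool bytes_are_octets() {
    return CHAR_BIT == 8;
}

// Keep only the low 8 bits of an unsigned char, so that the result is a
// valid octet even where unsigned char is wider than 8 bits.
constexpr unsigned char to_octet(unsigned char c) {
    return static_cast<unsigned char>(c & 0xFFu);
}
```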

>> There was a guy, a few years ago, giving an example.
>                   ^^^^^^^^^^^^^^^
> My guess is that proper compiler's support for \u was missing then.

No, the support of \u... was appropriate according to the C++ standard,
because the C++ standard did not require anything special for it.

> However, it is not a good point against a feature.

The problem is not in having such features, but that they are not
sufficiently specific when putting requirements on the underlying binary
model. This then causes problems when working with Unicode, unless the
compiler writer has decided to fill in Unicode-friendly features in the
absence of the standard defining them.
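
As an illustration of a compiler writer filling in such a feature: C99 specifies a macro, __STDC_ISO_10646__, which an implementation may define to promise that wchar_t values are ISO 10646 code points (later C++ standards list it as well). A sketch of a check, with the function names invented here:

```cpp
// True if the implementation promises that wchar_t values are ISO 10646
// (Unicode) code points. Where the macro is absent, no such promise is made.
constexpr bool wchar_is_iso10646() {
#ifdef __STDC_ISO_10646__
    return true;
#else
    return false;
#endif
}

// The numeric value this compiler gives to L'\u00E9'. Only when
// wchar_is_iso10646() is true is it promised to be 0xE9.
constexpr unsigned long e_acute_value() {
    return static_cast<unsigned long>(L'\u00E9');
}
```

Without the macro, the \u notation still compiles, but the value that ends up in the wide character is whatever the implementation chooses.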

Hans Aberg


This archive was generated by hypermail 2.1.5 : Thu Jan 20 2005 - 14:57:35 CST