Re: 32'nd bit & UTF-8

From: Antoine Leca (
Date: Wed Jan 19 2005 - 08:33:06 CST


    On Tuesday, January 18th, 2005 18:52Z Hans Aberg wrote:

    I do not believe there is much here worth commenting on. I deleted most of
    the comments I had written, because they turned into a useless rant against
    what, after re-reading, I can only consider a troll.
    I will just correct plain mistakes.

    > Under C/C++ one can actually use, apart from byte streams, other
    > streams such as wchar_t.

    This could defeat the portability objectives of C/C++. Please re-read
    section 5.2 of The Unicode Standard (TUS) about this.
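
    As a rough illustration of why wchar_t is a portability hazard, here is a
    minimal C99 sketch that reports what the local implementation provides.
    The function names are made up for this example; only CHAR_BIT, WCHAR_MAX
    and wchar_t itself come from the standard headers.

```c
#include <limits.h>   /* CHAR_BIT */
#include <stddef.h>   /* wchar_t */
#include <stdint.h>   /* WCHAR_MAX (also available from <wchar.h>) */

/* Hypothetical helper: storage size of wchar_t, in bits, on this platform. */
unsigned wchar_storage_bits(void)
{
    return (unsigned)(sizeof(wchar_t) * CHAR_BIT);
}

/* Hypothetical helper: nonzero if a single wchar_t can represent every
 * value in the range 0..0x10FFFF. The C standard does not promise this. */
int wchar_holds_all_code_points(void)
{
    return WCHAR_MAX >= 0x10FFFF;
}
```

    A program that assumes the second function returns nonzero is relying on
    its ABI rather than on the language.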

    > Under C/C++, one will use a wchar_t which is always of exactly
    > 32-bit,


    > regardless of what internal word structure the CPU is using in
    > its memory bus.

    Worse. An ABI that requires an opaque type to have a fixed shape, whatever
    the underlying structure, completely misses the point.
    Fortunately POSIX does not do that.

    > Moreover, the latest edition of C, C99, has types that the
    > compiler can support where the sizes of the integral types are
    > indicated.

    Yes. But their existence is not mandatory (at least for the fixed-width
    ones, which I believe you are alluding to). Depending on them makes your
    program less portable.
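
    A minimal sketch of the difference, assuming a C99 <stdint.h>: the
    least-width type uint_least32_t must exist, while the exact-width uint32_t
    is optional and can be detected through its limit macro. The names
    code_unit and has_exact_uint32 are invented for this example.

```c
#include <limits.h>   /* CHAR_BIT */
#include <stdint.h>   /* uint_least32_t, UINT_LEAST32_MAX, UINT32_MAX */

/* Always available in C99: the smallest type with at least 32 bits.
 * "code_unit" is a hypothetical name used only in this sketch. */
typedef uint_least32_t code_unit;

/* Hypothetical helper: nonzero if the optional exact-width uint32_t is
 * provided here. UINT32_MAX is defined only when the type exists. */
int has_exact_uint32(void)
{
#ifdef UINT32_MAX
    return 1;
#else
    return 0;
#endif
}
```

    Code written against code_unit compiles on a 36-bit machine as well;
    code written against uint32_t might not.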

    You would have better luck with the proposed char32_t (TR 19769), which is
    intended for this use. But then you would discover that it can perfectly
    well be 36 or 64 bits wide.
    Then you could begin to understand the point: the size of the underlying
    type is irrelevant. The fact that it happens to be 32 bits on your box in
    this year 2005 is just one aspect of the problem. What is important is to
    support the range from 0 to 0x10FFFF (when it comes to Unicode). 32 bits
    are enough for that, and they are widespread, so some ABIs chose that
    width. But the domain of the type is to be restricted to 0 to 0x10FFFF,
    nothing else. And there is no point in trying to enlarge this domain, at
    least as long as you are dealing with Unicode, that is, with characters.
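
    To make the point concrete, here is a small sketch in which the check is
    on the value range and not on the width of the type. The macro and
    function names are hypothetical. The test covers only the 0 to 0x10FFFF
    domain discussed above and makes no further distinction inside it.

```c
#include <stdint.h>   /* uint_least32_t */

/* Upper bound of the Unicode code space (hypothetical macro name). */
#define UNICODE_MAX 0x10FFFFUL

/* Hypothetical helper: nonzero if v lies in the domain 0..0x10FFFF.
 * The parameter may be 32, 36 or 64 bits wide; the answer is the same. */
int is_code_point(uint_least32_t v)
{
    return v <= UNICODE_MAX;
}
```

    Any bits the storage type offers beyond those needed for 0x10FFFF are
    simply unused as far as Unicode is concerned.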

