Re: 32'nd bit & UTF-8

From: Hans Aberg
Date: Tue Jan 18 2005 - 18:09:33 CST

    On 2005/01/18 21:25, Jon Hanna wrote:

    >> Under C/C++, one will use a wchar_t which is always of exactly 32-bit,
    >> regardless what internal word structure the CPU is using in
    >> its memory bus.
    > wchar_t can be 7bits in size or more than 128bits.

    Whatever it can be in principle, modern platforms such as GNU have
    settled on a wchar_t of 32 bits. See

    >>> Not sure if I understand you correctly. What about 00 vs.
    >>> C0.80, E0.80.80, FE. etc.?
    >> I have added functions that admit creating regular expressions
    >> also for the overloaded UTF-BSS ("UTF-8") multibytes. This way, a
    >> lexer can provide
    > They aren't "overloaded", they are invalid.

    You probably mean that the overloaded UTF-BSS (or whatever the correct name
    is) multibytes are illegal under UTF-8.
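
    To make the distinction concrete, here is a minimal sketch in C of how a
    decoder can tell these forms apart. It is an illustration only, not the
    code of my lexer functions: the name decode_one and its signature are
    made up for this example, it handles only 1 to 4 byte sequences, and it
    does not check surrogates or the U+10FFFF limit. A sequence is flagged as
    "overloaded" (overlong) when its length is greater than the decoded value
    requires, so C0.80 and E0.80.80 both decode mechanically to 0 but are
    flagged, whereas a lone FE is rejected outright as a lead byte.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper for illustration: decode one sequence of 1 to 4 bytes.
   Returns the number of bytes consumed, or 0 if the input is malformed.
   On success, *cp receives the decoded value and *overlong is set to 1
   when a shorter sequence could have encoded the same value. */
static size_t decode_one(const unsigned char *s, size_t n,
                         uint32_t *cp, int *overlong)
{
    /* Smallest value that genuinely needs a sequence of each length. */
    static const uint32_t min_for_len[5] = {0, 0x0, 0x80, 0x800, 0x10000};
    size_t len, i;
    uint32_t v;

    *overlong = 0;
    if (n == 0)
        return 0;

    if (s[0] < 0x80)                { len = 1; v = s[0]; }
    else if ((s[0] & 0xE0) == 0xC0) { len = 2; v = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; v = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; v = s[0] & 0x07; }
    else
        return 0;   /* stray continuation byte, or a lead byte F8..FF such as FE */

    if (n < len)
        return 0;   /* truncated sequence */

    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;   /* not a continuation byte */
        v = (v << 6) | (uint32_t)(s[i] & 0x3F);
    }

    *overlong = v < min_for_len[len];
    *cp = v;
    return len;
}
```

    A strict UTF-8 reader would treat a set overlong flag as an error, while
    a lexer that wants to accept the overloaded forms could pass the value
    through and leave the decision to the caller.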

      Hans Aberg
