Re: Is it save to dig into comment contents of PropList.txt?

From: Philippe Verdy <verdy_p_at_wanadoo.fr>
Date: Thu, 7 Nov 2013 17:12:22 +0100

So I've been wrong about POSIX: using MS compilers, I had not realized that
they violate the POSIX specs on this point.
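
For anyone who wants to check a given C library, here is a minimal sketch that
scans all single-byte values for characters classified as both printable and
control, which the POSIX text quoted further down forbids. The helper names
count_print_cntrl_overlap and overlap_in_locale are mine and purely
illustrative; only isprint(), iscntrl() and setlocale() come from the
standard library.

```c
#include <ctype.h>
#include <locale.h>

/* Count the byte values that the current LC_CTYPE locale classifies as
 * both printable and control. Hypothetical helper, not a library call. */
int count_print_cntrl_overlap(void)
{
    int overlap = 0;
    for (int c = 0; c <= 255; c++) {
        /* c is always representable as unsigned char, as <ctype.h> requires */
        if (isprint(c) && iscntrl(c))
            overlap++;
    }
    return overlap;
}

/* Select a locale by name ("" means "take it from the environment") and
 * run the scan. Returns -1 if the locale cannot be set. */
int overlap_in_locale(const char *name)
{
    if (setlocale(LC_CTYPE, name) == NULL)
        return -1;
    return count_print_cntrl_overlap();
}
```

Calling overlap_in_locale("") tests whatever locale the environment selects.
A result of 0 only means that no overlap was found for single-byte values; it
says nothing about wide characters.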

However, how can the C/C++ standards adapt to this situation? After all,
POSIX is old, no longer maintained (it is violated on many systems), and not
strictly a standard for C and C++ themselves.

There are at least several profiles for "POSIX" locales:
- DOS, Windows and OS/2 as defined by Microsoft (and IBM), but also in
compilers for these systems from other vendors
- Unix and Linux (where gcc has been ported)
- VMS has its own specificities, ...

IBM is a bit smarter because it allows selecting the compatibility layers
for emulating various OS'es (including IBM versions of Unix, or other OS'es
made and maintained by IBM, so that their applications can be ported to
Windows; IBM versions of Unix also provide locales emulating DOS and
Windows).

I can't remember the tricky details about how to select them by a locale
ID, or with some other environment variables at compile time or run time.
Many programs in fact can't rely only on POSIX locales and provide their own
compatibility layer based on detection of the target OS on which the
program will run.
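
A rough sketch of what such a compatibility layer can look like, assuming
compile-time detection of the OS family is good enough. The names
compat_family and compat_isprint are hypothetical; _WIN32, __unix__ and
__APPLE__ are the macros commonly predefined by compilers for those targets.
On Windows the wrapper adds the POSIX rule that no control character is
printable; elsewhere it trusts the C library.

```c
#include <ctype.h>

/* Coarse OS-family detection at compile time (illustrative only). */
#if defined(_WIN32)
#  define COMPAT_FAMILY_NAME "windows"
#elif defined(__unix__) || defined(__APPLE__)
#  define COMPAT_FAMILY_NAME "unix"
#else
#  define COMPAT_FAMILY_NAME "unknown"
#endif

/* Report which family this translation unit was compiled for. */
const char *compat_family(void)
{
    return COMPAT_FAMILY_NAME;
}

/* isprint() replacement that returns 0 or 1 and never reports a control
 * character as printable. Hypothetical wrapper, not a library function. */
int compat_isprint(int c)
{
    unsigned char uc = (unsigned char)c;
#if defined(_WIN32)
    /* Enforce the POSIX rule explicitly on this family. */
    return isprint(uc) && !iscntrl(uc);
#else
    return isprint(uc) != 0;
#endif
}
```

A program would then call compat_isprint() everywhere instead of isprint(),
so that TAB is treated the same way on every target.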

In other words, these profiles in fact depend on the OS family.
Even libraries for gcc on Windows use the MS definitions of Windows
locales, and so interact correctly with other Windows programs (in fact,
programs compiled on Windows, even with gcc, never run in the POSIX locale,
which Microsoft implements only as an option, not installed by default, and
in practice not maintained, in the old "Unix compatibility POSIX subsystem"
package for NT).

I do think that these old, unmaintained POSIX properties should indeed
be replaced by better properties based on the Unicode standard (leave
POSIX in limbo now, it has never been portable), and that the C/C++
standards should evolve to use Unicode properties rather than POSIX
properties (**except** on systems running in **their** own locales, locally
called "POSIX", with their own specificities).
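
To make the idea concrete, here is a small sketch of classification functions
driven by Unicode properties instead of locale tables. It assumes that "cntrl"
is taken to be General_Category=Cc and that "space" is taken to be the
White_Space property from PropList.txt; both mappings are my assumption, not
something any standard mandates. The function names are hypothetical, and the
White_Space ranges should be checked against the PropList.txt version being
targeted.

```c
#include <stddef.h>
#include <stdint.h>

/* General_Category=Cc covers exactly U+0000..U+001F and U+007F..U+009F. */
int u_is_cc(uint32_t cp)
{
    return cp <= 0x1F || (cp >= 0x7F && cp <= 0x9F);
}

/* White_Space ranges (inclusive), transcribed by hand from PropList.txt. */
static const struct { uint32_t lo, hi; } white_space[] = {
    {0x0009, 0x000D}, {0x0020, 0x0020}, {0x0085, 0x0085},
    {0x00A0, 0x00A0}, {0x1680, 0x1680}, {0x2000, 0x200A},
    {0x2028, 0x2029}, {0x202F, 0x202F}, {0x205F, 0x205F},
    {0x3000, 0x3000},
};

/* Return 1 if cp falls in one of the White_Space ranges, else 0. */
int u_is_white_space(uint32_t cp)
{
    size_t n = sizeof white_space / sizeof white_space[0];
    for (size_t i = 0; i < n; i++) {
        if (cp >= white_space[i].lo && cp <= white_space[i].hi)
            return 1;
    }
    return 0;
}

/* POSIX rule carried over: nothing in class cntrl may be in class print.
 * This is only the exclusion test, not a full definition of "print". */
int u_print_allowed(uint32_t cp)
{
    return !u_is_cc(cp);
}
```

With these definitions U+0009 is white space and a control character, and is
therefore excluded from "print", which matches the POSIX text quoted below
rather than the MS behaviour.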

2013/11/6 Karl Williamson <public_at_khwilliamson.com>

> On 11/06/2013 03:43 AM, Steffen Daode Nurpmeso wrote:
>
>> Philippe Verdy <verdy_p_at_wanadoo.fr> wrote:
>> |2013/11/5 Steffen Daode <sdaoden_at_gmail.com>
>> |> (The problem i'm facing is that _PRINT and _GRAPH cannot be set
>> |> for some properties from PropList.txt, say, _PRINT can't be set
>> |> for U+0009, CHARACTER TABULATION (ht), since it's a Cc, but in
>> |
>> |TAB is "printable" (for the isprint() macro in standard C libraries) because
>> |it has a whitespace property, even if its general category is very weakly
>>
>> Nope according to POSIX, Vol. 1: Base Definitions, 7.3.1. LC_CTYPE ([1]):
>>
>> print
>> Define characters to be classified as printable characters,
>> including the <space>.
>>
>> In the POSIX locale, all characters in class graph shall be
>> included; no characters in class cntrl shall be included.
>>
>> In a locale definition file, characters specified for the
>> keywords upper, lower, alpha, digit, xdigit, punct, graph, and
>> the <space> are automatically included in this class. No
>> character specified for the keyword cntrl shall be specified.
>>
>> [1] <http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap07.html#tag_07_03_01>
>>
>> Verifiable under LC_ALL=en_GB.UTF-8 in Mac OS X Snow Leopard
>> (which admittedly uses very old Citrus data, i always wonder why all
>> those Gigabytes of «Software Update»s don't tweak that, not to
>> talk about GNU make 3.81 and all the other buggy or non-compliant
>> stuff, but that is a different story):
>>
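>> #define _XOPEN_SOURCE 700   /* some C libraries need this for wcwidth() */
>> #include <stdio.h>
>> #include <ctype.h>
>> #include <wchar.h>          /* declares wcwidth() */
>> #include <wctype.h>
>> int main(void) {
>>     printf("%d %d\n", isprint('\t'), wcwidth(L'\t'));
>>     return 0;
>> }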
>>
>> ?0[steffen_at_sherwood tmp]$ cc -o zt t.c && ./zt
>> 0 -1
>>
>> |The character mapping for the isprint() macro is defined by an expression
>> |based on existing Unicode properties. Most C libraries optimize this
>>
>> But i agree that POSIX has to move towards Unicode definitions,
>> and more byte- than bitwise.
>>
>> --steffen
>>
>>
> The only vendor I'm aware of that makes TAB a printable is Microsoft. Thus
> Philippe is wrong about this except for MS products.
>
> MS makes TAB also a control, violating the Posix standard by having it be
> both printable and a control. This is true in all locales I've seen under
> MS except the C locale. (MS also has other Posix violations, such as
> having isdigit() match superscript numbers.)
>
>
Received on Thu Nov 07 2013 - 10:15:26 CST
