Re: alpha, print, graph, blank, etc.

From: Mark Davis (mark.davis@jtcsv.com)
Date: Tue Apr 22 2003 - 10:30:26 EDT


    That's an interesting approach, and legal according to POSIX (upper and
    lower can overlap).

    Any other comments on the other open issues (see chart for details)?

    1. xdigit: there is a narrow interpretation (0..9, A..F, a..f), or a broad
    interpretation (Nd + A..F, a..f in both normal and fullwidth forms). We are
    leaning towards the broad interpretation, since it appears more consistent.

    2. cntrl: add \p{gc=Zl}, \p{gc=Zp}, and the most control-like of the Cf
    characters? Add other Cf's?

    3. graph: exclude some/all Cfs?
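
    To make the two xdigit readings in item 1 concrete, here is a minimal
    sketch in C. The function names are made up for illustration, and
    is_nd_sample() is only a stand-in that knows three Nd ranges; a real
    implementation would consult the full gc=Nd data from the Unicode
    Character Database.

```c
#include <stdbool.h>
#include <stdint.h>

/* Narrow interpretation: ASCII 0..9, A..F, a..f only. */
bool is_xdigit_narrow(uint32_t cp)
{
    return (cp >= '0' && cp <= '9') ||
           (cp >= 'A' && cp <= 'F') ||
           (cp >= 'a' && cp <= 'f');
}

/* Hypothetical stand-in for a gc=Nd lookup. It covers only three sample
 * ranges; the real Nd category contains many more digit blocks. */
static bool is_nd_sample(uint32_t cp)
{
    return (cp >= 0x0030 && cp <= 0x0039) ||  /* ASCII digits */
           (cp >= 0x0660 && cp <= 0x0669) ||  /* Arabic-Indic digits */
           (cp >= 0xFF10 && cp <= 0xFF19);    /* fullwidth digits */
}

/* Broad interpretation: any Nd digit, plus A..F and a..f in both their
 * normal and fullwidth forms. */
bool is_xdigit_broad(uint32_t cp)
{
    return is_nd_sample(cp) ||
           (cp >= 'A' && cp <= 'F') ||
           (cp >= 'a' && cp <= 'f') ||
           (cp >= 0xFF21 && cp <= 0xFF26) ||  /* fullwidth A..F */
           (cp >= 0xFF41 && cp <= 0xFF46);    /* fullwidth a..f */
}
```

    Under this sketch U+FF21 (fullwidth A) and U+0663 (Arabic-Indic digit
    three) are hex digits only in the broad reading, while ASCII input
    behaves the same in both.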

    Mark
    (مرقص بن داود)
    ________
    mark.davis@jtcsv.com
    IBM, MS 50-2/B11, 5600 Cottle Rd, SJ CA 95193
    (408) 256-3148
    fax: (408) 256-0799

    ----- Original Message -----
    From: "Marco Cimarosti" <marco.cimarosti@essetre.it>
    To: "'Mark Davis'" <mark.davis@jtcsv.com>; <unicore@unicode.org>;
    <unicode@unicode.org>
    Sent: Tuesday, April 22, 2003 04:33
    Subject: RE: alpha, print, graph, blank, etc.

    > Mark Davis wrote:
    > > The POSIX/C-style property names (punct, alpha, lower, upper,
    > > digit, xdigit, alnum, cntrl, graph, print, space, blank) are
    > > not well specified, and don't really map well to the broader
    > > types of characters available in Unicode/10646. For example,
    > > there is no provision for titlecase, [...]
    >
    > My 0.2 euros: IMHO, title-case letters should be treated as *both*
    > upper-case and lower-case. I.e., my suggestion is that:
    >
    > - is[w]lower() returns TRUE for both lower-case and title-case
    > letters;
    > - is[w]upper() returns TRUE for both upper-case and title-case
    > letters;
    > - is[w]alpha() returns TRUE for any Unicode letter (general category
    > L*).
    >
    > For applications unaware of the existence of "title-case" letters, this
    > saves the basic semantics of is[w]alpha() (namely, "Is it a letter?"), and
    > one of the most basic semantics of is[w]lower() and is[w]upper() (namely,
    > "Can this character be converted to lower/upper-case?").
    >
    > For applications aware of the existence of "title-case" letters, the
    > is[w]upper(), is[w]lower(), and is[w]alpha() can be used in combination to
    > determine the exact "case type" of any letter:
    >
    > if (iswalpha(c))
    > {
    >     if (iswupper(c) && iswlower(c))
    >     {
    >         printf("This is a title-case letter (Lt).\n");
    >     }
    >     else if (iswupper(c) && !iswlower(c))
    >     {
    >         printf("This is an upper-case letter (Lu).\n");
    >     }
    >     else if (!iswupper(c) && iswlower(c))
    >     {
    >         printf("This is a lower-case letter (Ll).\n");
    >     }
    >     else /* if (!iswupper(c) && !iswlower(c)) */
    >     {
    >         printf("This is a letter with no case distinctions (Lo or Lm).\n");
    >     }
    > }
    > else
    > {
    >     printf("This is not a letter.\n");
    > }
    >
    > Unfortunately, there is no corresponding trick to obtain a "to-title-case"
    > functionality, apart from a non-portable construct such as:
    >
    > c1 = towctrans(c2, wctrans("Title-case"));
    >
    > Anyway, converting to title case is something less fundamental than
    > upper/lower-casing, and it only makes sense at the string level.
    >
    > _ Marco
    >
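
    The overlap Marco proposes can be sketched as a set of predicates over
    the general category. Everything named here (gen_cat, sample_gc, the
    sketch_* functions) is hypothetical, and sample_gc() knows only ASCII
    letters plus four sample code points; it is not how any particular C
    library implements is[w]upper() or is[w]lower().

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified general categories; GC_LO stands in for both Lo and Lm. */
typedef enum { GC_LU, GC_LL, GC_LT, GC_LO, GC_OTHER } gen_cat;

/* Hypothetical category lookup covering a handful of sample characters. */
static gen_cat sample_gc(uint32_t cp)
{
    switch (cp) {
    case 0x01C4: return GC_LU;  /* LATIN CAPITAL LETTER DZ WITH CARON */
    case 0x01C5: return GC_LT;  /* LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON */
    case 0x01C6: return GC_LL;  /* LATIN SMALL LETTER DZ WITH CARON */
    case 0x05D0: return GC_LO;  /* HEBREW LETTER ALEF */
    default:
        if (cp >= 'A' && cp <= 'Z') return GC_LU;
        if (cp >= 'a' && cp <= 'z') return GC_LL;
        return GC_OTHER;
    }
}

/* Title-case letters count as upper-case... */
bool sketch_isupper(uint32_t cp)
{
    gen_cat g = sample_gc(cp);
    return g == GC_LU || g == GC_LT;
}

/* ...and also as lower-case, so the two classes overlap on Lt. */
bool sketch_islower(uint32_t cp)
{
    gen_cat g = sample_gc(cp);
    return g == GC_LL || g == GC_LT;
}

/* Any letter category counts as alphabetic. */
bool sketch_isalpha(uint32_t cp)
{
    return sample_gc(cp) != GC_OTHER;
}
```

    With these definitions U+01C5 is the only sample character for which
    both sketch_isupper() and sketch_islower() hold, which is exactly the
    test the quoted if/else chain uses to detect Lt.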


