Re: Display of Isolated Nonspacing Marks (was Re: Questions on ZWNBS...)

From: Mark Davis (mark.davis@jtcsv.com)
Date: Tue Aug 05 2003 - 18:09:36 EDT

    > << Zs, Zl, and Zp are considered format characters, but their
    > membership in the Z (separator) class takes precedence over their
    > membership in the Cf class, because the General Category assigns only
    > a single value to each character. >>

    Whenever you have a question about the status of a character, you need
    to look it up in the Unicode Character Database (UCD). You can do that
    either through the Unicode website or, if you want a more readable
    interface, with the ICU character browser, which formats that data.

    Look at space, U+0020.

    http://oss.software.ibm.com/cgi-bin/icu/ub/utf-8/?go=0020&ch.x=4&ch.y=7

    The general category is Space_Separator, *not* a format character.
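
    The same lookup can also be done programmatically. The following is
    only a minimal sketch, assuming Python's standard unicodedata module,
    which reflects whatever UCD version that Python build ships (not
    necessarily 4.0). The helper name and the sample list are illustrative
    assumptions, not part of the standard or of ICU.

```python
import unicodedata

# Hypothetical helper: return the two-letter General Category alias
# (e.g. "Zs" for Space_Separator, "Cf" for Format) for one character.
def general_category(ch: str) -> str:
    return unicodedata.category(ch)

# Illustrative sample: some separators and some format characters.
SAMPLES = ["\u0020", "\u00A0", "\u2028", "\u2029", "\u200D", "\uFEFF"]

if __name__ == "__main__":
    for ch in SAMPLES:
        # unicodedata.name() raises ValueError for unnamed characters,
        # so pass a default.
        name = unicodedata.name(ch, "<unnamed>")
        print(f"U+{ord(ch):04X} {general_category(ch)} {name}")
```

    Each character reports exactly one General Category value, so a
    separator such as U+0020 never shows up as Cf.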

    Now, the wording there could definitely be clearer, but the operative
    phrase is:

    > ...but their
    > membership in the Z (separator) class *takes precedence* over their
    > membership in the Cf class...

    So it would be clearer to say something like:

    In many ways the Zs, Zl, and Zp characters are similar to format
    characters, but because their general usage is significantly different,
    they are broken out into separate General Category values, as Separator
    characters.
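
    Read that way, the D13 definition quoted further down in this thread
    gives the expected answer for space. The following is only a rough
    sketch of that test, again assuming Python's unicodedata module. It
    approximates "does not graphically combine" by excluding the Mark
    categories (Mn, Mc, Me), which is my simplification and not the
    standard's wording, and the function name is hypothetical.

```python
import unicodedata

# Rough approximation of D13: a base character does not graphically
# combine with a preceding character and is neither a control (Cc)
# nor a format (Cf) character.
# Assumption: "graphically combines" is approximated by the Mark
# categories Mn, Mc, and Me. Unassigned code points (Cn) are not
# treated specially here.
def is_base_character(ch: str) -> bool:
    category = unicodedata.category(ch)
    if category.startswith("M"):
        return False
    return category not in ("Cc", "Cf")

if __name__ == "__main__":
    for ch in ["\u0020", "\u00A0", "\u0301", "\u200D", "\u000A"]:
        print(f"U+{ord(ch):04X} {unicodedata.category(ch)} "
              f"base={is_base_character(ch)}")
```

    Because space and NBSP are Zs rather than Cf, they pass this test,
    which is consistent with using them to exhibit a combining mark in
    isolation.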

    Mark
    __________________________________
    http://www.macchiato.com
    ► “Eppur si muove” ◄

    ----- Original Message -----
    From: "Peter Kirk" <peter.r.kirk@ntlworld.com>
    To: "Mark Davis" <mark.davis@jtcsv.com>
    Cc: "Unicode List" <unicode@unicode.org>
    Sent: Tuesday, August 05, 2003 14:50
    Subject: Re: Display of Isolated Nonspacing Marks (was Re: Questions on ZWNBS...)

    > On 05/08/2003 14:40, Mark Davis wrote:
    >
    > >Where did you get the notion that space is not a base character? And
    > >base characters include those that are not control or format
    > >characters. Space is neither one.
    > >
    > >The standard specifically states in a number of places that to exhibit
    > >a combining mark in isolation you use a space (or NBSP).
    > >
    > >Mark
    > >__________________________________
    > >http://www.macchiato.com
    > >► “Eppur si muove” ◄
    > >
    > >
    > >
    > I got this from the Unicode Standard 4.0, as quoted by Jim Allan:
    >
    > > In http://www.unicode.org/book/preview/ch03.pdf the space characters
    > > in general are given class Zs:
    > >
    > > << Zs, Zl, and Zp are considered format characters, but their
    > > membership in the Z (separator) class takes precedence over their
    > > membership in the Cf class, because the General Category assigns only
    > > a single value to each character. >>
    > >
    > > So the various space characters (class Zs) are also classified as
    > > format characters.
    > >
    > > From http://www.unicode.org/book/ch04.pdf:
    > >
    > > << _D13 Base character:_ a character that does not graphically
    > > combine with preceding character, and that is neither control nor a
    > > format character. >>
    > >
    > > Accordingly, by definition, spaces are not base characters.
    >
    >
    >
    > --
    > Peter Kirk
    > peter.r.kirk@ntlworld.com
    > http://web.onetel.net.uk/~peterkirk/
    >
    >
    >


