Re: Display of Isolated Nonspacing Marks (was Re: Questions on ZWNBS...)

From: Kenneth Whistler (kenw@sybase.com)
Date: Tue Aug 05 2003 - 19:59:06 EDT


    Ted Hopp asked:

    > I believe that reasonable people might reasonably conclude from factoids 1
    > and 2 that SPACE is indeed a format character.
    >
    > Reasonable, but evidently wrong. Explanation, please?

    I provided the text deconstruction in my last email, but to
    continue, the confusion arises from the strange nature of
    SPACE in the history of character encoding.

    SPACE has long been classified as a *graphic* character
    in character encodings. Certainly, in the general SC2
    character encoding context of ISO 2022, SPACE always shows
    up in the G0 set, with the other graphic characters, rather
    than among the control functions encoded in the C0 or C1 sets.

    But looked at from the legacy of device control, SPACE
    could just as well have been categorized as a control function:
    MOVE PRINT HEAD ONE UNIT RIGHT, comparable to BACKSPACE.

    And in the context of the Unicode Standard, people often
    loosely talk about space characters as being format
    characters, since they a) are more akin to punctuation than
    to normal letters, b) have no glyph associated with them,
    and c) impact line-breaking and other aspects of the formatting
    of characters in their vicinity.

    But the *formal* categorization of Unicode characters,
    defined by the UTC to help eliminate this kind of
    ambiguity in talk about the character types, is spelled
    out in Figure 2.5 of Unicode 4.0 now:

    http://www.unicode.org/book/preview/ch02.pdf

    and the *formal* meaning of "format control character"
    (Basic type = "Format") in Unicode is now any character
    with the General Category of {Cf, Zl, Zp}.

    The space characters are all lumped in with graphic characters.
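
    As a rough illustration of that formal test, here is a minimal
    Python sketch. The helper name is_format_control is made up for
    this example, and Python's unicodedata module reflects whatever
    Unicode version the interpreter ships with, not necessarily 4.0,
    so treat the printed categories as indicative only.

```python
import unicodedata

# Per the definition above: Basic type "Format" means a
# General Category of Cf, Zl, or Zp.
FORMAT_CATEGORIES = {"Cf", "Zl", "Zp"}


def is_format_control(ch: str) -> bool:
    """Hypothetical helper: True if ch falls in {Cf, Zl, Zp}."""
    return unicodedata.category(ch) in FORMAT_CATEGORIES


# SPACE, NO-BREAK SPACE, ZERO WIDTH JOINER, LINE SEPARATOR,
# PARAGRAPH SEPARATOR, ZERO WIDTH NO-BREAK SPACE
for ch in ["\u0020", "\u00A0", "\u200D", "\u2028", "\u2029", "\uFEFF"]:
    name = unicodedata.name(ch, "<unnamed>")
    cat = unicodedata.category(ch)
    print(f"U+{ord(ch):04X} {name}: {cat}, format={is_format_control(ch)}")
```

    SPACE and NO-BREAK SPACE report Zs, which is outside the set,
    so by this test they are not format control characters.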

    So while there are still some ambiguities to be worked out
    in the definition of "base character" in the Unicode Standard,
    neither the status of SPACE as a graphic character nor the
    recommendation of the standard that non-spacing marks be
    applied to SPACE as a means of showing them in isolation
    is in question.
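
    For concreteness, a small Python sketch of that recommendation:
    build the sequence <SPACE, non-spacing mark>. The function name
    isolated_mark is an assumption for this example, and how the
    resulting sequence actually looks depends on the renderer and font.

```python
import unicodedata

SPACE = "\u0020"


def isolated_mark(mark: str) -> str:
    """Hypothetical helper: return SPACE followed by a non-spacing mark."""
    if len(mark) != 1 or unicodedata.category(mark) != "Mn":
        raise ValueError("expected a single non-spacing mark (General Category Mn)")
    return SPACE + mark


# COMBINING ACUTE ACCENT shown in isolation on a SPACE base
seq = isolated_mark("\u0301")
print([f"U+{ord(c):04X}" for c in seq])
```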

    --Ken


