Re: Questions on ZWNBS - for line initial holam plus alef

From: Kenneth Whistler (
Date: Sun Aug 10 2003 - 18:27:26 EDT

    Peter Kirk said:

    > Tell Microsoft! (See Noah Levitt's posting.)


    > If this is indeed "The standard way to do what you want", then the
    > standard needs to make it clear that the sequence of <space, combining
    > mark> or <NBSP, combining mark> has the properties which I want, i.e. it
    > has the width of the combining mark alone, and not the full width of a
    > space,

    This is up to the implementation and the font, and is not something
    that the Unicode Standard should mandate, IMO. It steps over the
    bounds of plain text content.

    > and does not expand for justification,

    This is likewise an issue for the implementation. The Unicode Standard
    does not mandate how a typographic implementation must implement
    interword, intercharacter, or any other kind of justification.

    > is not a line breaking
    > opportunity,

    This, however, *is* specified. See UAX #14, in the section discussing
    CM (the line break class associated with combining marks):

    "If U+0020 SPACE is used as a base character, it is treated as
    AL instead of SP."

    What that means is that rather than sifting down through the line
    break rule determinations according to a lb=SP category, it is
    then handled as lb=AL, which puts it in the same class with
    ordinary letters for the purposes of determining a line break.

    Of course, a conformant Unicode implementation is not *required*
    to implement line-breaking as specified in UAX #14. But if it
    claims it is doing so, and does not handle SP+combining_mark
    combinations this way, then it is a nonconformant implementation
    of line-breaking.
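
    As a rough illustration of the SP-as-base rule quoted above, here
    is a minimal Python sketch. It is not an implementation of UAX #14:
    the function names are hypothetical, lb=CM is approximated by the
    general categories Mn, Mc and Me, and every class other than SP,
    AL-for-space and CM is collapsed into a placeholder "XX".

```python
import unicodedata

def is_combining_mark(ch):
    # Assumption: approximate lb=CM by general category, since the
    # standard library does not expose UAX #14 line break classes.
    return unicodedata.category(ch) in ("Mn", "Mc", "Me")

def resolve_classes(text):
    """Return (segment, class) pairs, attaching marks to their base.

    A U+0020 SPACE followed by at least one combining mark is
    reported as AL instead of SP, per the sentence quoted from UAX #14.
    """
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        j = i + 1
        # Absorb any combining marks that follow this character.
        while j < len(text) and is_combining_mark(text[j]):
            j += 1
        if ch == " ":
            cls = "AL" if j > i + 1 else "SP"
        elif is_combining_mark(ch):
            cls = "CM"  # mark with no preceding base (start of text)
        else:
            cls = "XX"  # placeholder: other classes are not modeled
        out.append((text[i:j], cls))
        i = j
    return out

# SPACE + U+05B9 HEBREW POINT HOLAM, then U+05D0 HEBREW LETTER ALEF
classes = resolve_classes("a \u05b9\u05d0")
plain = resolve_classes("a b")
```

    In this sketch the space carrying the holam comes back as AL, while
    the bare space in "a b" stays SP, so only the latter would be
    offered to the break rules as a space.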

    > does not in fact have any of the properties of a space.

    It does, in fact, have some of the properties of a space, since
    it is U+0020 SPACE, after all. But the important fact is that
    implementations are supposed to be implementing the semantics
    of the combining character sequence taking the SPACE as the base
    and any following *non*-spacing combining mark as applied to
    that base. If the implementations then result in inappropriate
    rendering or line-breaking for that sequence, that is, as Kent
    said, an issue to take up with the implementers.
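
    To make the "SPACE as the base" point concrete, a small Python
    sketch follows. It groups a string into a base character plus any
    following non-spacing marks. The helper name is hypothetical, and
    "non-spacing" is taken to mean general category Mn, which is my
    assumption for the purposes of the example.

```python
import unicodedata

def combining_sequences(text):
    """Split text into base + following non-spacing marks (gc=Mn)."""
    seqs = []
    for ch in text:
        if seqs and unicodedata.category(ch) == "Mn":
            seqs[-1] += ch      # mark applies to the preceding base
        else:
            seqs.append(ch)     # start a new sequence with this base
    return seqs

# Line-initial holam on a SPACE base, followed by alef
seqs = combining_sequences(" \u05b9\u05d0")
base = seqs[0][0]
base_name = unicodedata.name(base)
base_category = unicodedata.category(base)
```

    The first sequence is SPACE plus holam, and its base still reports
    the name SPACE and category Zs. The character keeps its identity;
    what changes is that it has to be handled as the base of a
    combining character sequence.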

    > I
    > expect to see such a clarification in the next edition of the Unicode
    > Standard.

    See above for the reasons why it is unlikely to be any more
    constrained by the standard than it already is.

    A point I keep trying to make, but which often gets overlooked
    by people trying to code Unicode mechanisms for dealing with
    edge cases, is that the design goal of the Unicode Standard is,
    and always has been, to represent *plain text content*. It
    cannot, and should not, IMO, deal with requirements for
    representing arbitrarily fine distinctions of typographical
    detail in all manuscripts and other documents in all writing
    systems of the world.

    Continuing to require that the Unicode Standard *must* specify
    some inherent mechanism for indicating the display width of
    combining character sequences clearly steps over the bounds
    of what is required to represent plain text content.

