Re: Printing and Displaying Dependent Vowels

From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Tue Mar 30 2004 - 11:15:05 EST

    At 04:28 PM 3/29/2004, Kenneth Whistler wrote:
    > > I will say again as I have said before - but the above (and what I
    > > snipped) is extra evidence for it - that what is broke ... is
    > > the rule that the isolated (generally spacing) form of a combining mark
    > > should be formed by SPACE or NBSP followed by the combining mark.
    >
    >This has been the *intent* of the standard since its inception in
    >1989.
    >
    > > There
    > > are many good reasons for not using SPACE for this, including default
    > > behaviour like inserting line breaks immediately after SPACE.
    >
    >Nope. UAX #14 specifies the following regarding SPACE followed by
    >combining marks:
    >
    >"If U+0020 SPACE is used as a base character, it is treated as AL
    >instead of SP."

    This is an unfortunate typo in UAX #14. The correct statement is:

    "If U+0020 SPACE is used as a base character, it is treated as ID
    instead of SP."

    See the description of these issues in the rules section of the UAX,
    which is quite explicit:
    LB 7a In all of the following rules, if a space is the base character for
    a combining mark, the space is changed to type ID. In other words, break
    before SP CM* in the same cases as one would break before an ID.

    Treat SP CM* as if it were ID

    As stated in [Unicode], Section 7.7 Combining Marks, combining characters
    are shown in isolation by applying them to either U+0020 SPACE (SP) or
    U+00A0 NO-BREAK SPACE (NBSP). The visual appearance is the same, but the
    line breaking result is different. Correspondingly, if there is no base,
    or if the base character is SP, CM* or SP CM* behave like ID.
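
    To make the quoted rule concrete, here is a minimal Python sketch of the
    class resolution it describes. It is not taken from UAX #14: the function
    names are hypothetical, "OTHER" stands in for every class not discussed
    here, and combining marks are approximated by the General Category values
    Mn, Mc and Me.

```python
import unicodedata

def is_combining_mark(ch):
    # Approximation of line breaking class CM: any Mark category.
    return unicodedata.category(ch) in ("Mn", "Mc", "Me")

def resolve_isolated_marks(text):
    """Split text into (segment, class) units, attaching marks to their base.

    SP CM*  -> ID   (the rule quoted above)
    CM*     -> ID   (marks with no base, e.g. at the start of the text)
    NBSP    -> GL   (with or without marks)
    SP      -> SP   (plain space, no marks)
    others  -> "OTHER" (placeholder, an assumption of this sketch)
    """
    units = []
    i = 0
    n = len(text)
    while i < n:
        ch = text[i]
        # Collect any combining marks that follow this character.
        j = i + 1
        while j < n and is_combining_mark(text[j]):
            j += 1
        has_marks = j > i + 1
        if is_combining_mark(ch):
            cls = "ID"
        elif ch == " ":
            cls = "ID" if has_marks else "SP"
        elif ch == "\u00a0":
            cls = "GL"
        else:
            cls = "OTHER"
        units.append((text[i:j], cls))
        i = j
    return units
```

    For example, with U+093F DEVANAGARI VOWEL SIGN I as the mark,
    resolve_isolated_marks("a \u093f") yields the letter as one unit and
    SPACE plus the vowel sign as a single ID unit, while the same mark on
    NBSP yields a single GL unit.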

    >This means that a combining character sequence of this type is treated
    >as a unit for the purposes of line breaking, and this overrides the
    >behavior otherwise of SPACE to be treated as a line break
    >opportunity.

    There's never a line break opportunity between a SPACE and a combining
    mark, but since SP is treated like an ID (ideographic line breaking
    class), there are break opportunities *before* the SP that will not be
    there if an NBSP is used.
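
    The difference can be illustrated with a deliberately reduced model of
    the pair rules. The sketch below is an assumption-laden simplification,
    not the UAX #14 algorithm: it knows only the classes AL, ID, GL and SP,
    expects the input to be already resolved (SP CM* given as ID, NBSP CM*
    given as GL), and the function name is hypothetical.

```python
def break_positions(classes):
    """Return each index i where a break is allowed before classes[i].

    Reduced rule set (an assumption of this sketch):
      - never break before or after GL (glue, e.g. NBSP)
      - never break before SP
      - never break between AL and AL
      - otherwise a break is allowed
    """
    positions = []
    for i in range(1, len(classes)):
        before, after = classes[i - 1], classes[i]
        if before == "GL" or after == "GL":
            continue  # glue prohibits breaks on both sides
        if after == "SP":
            continue  # no break before a plain space
        if before == "AL" and after == "AL":
            continue  # letters stay together
        positions.append(i)
    return positions

# Two letters followed by an isolated mark on SPACE (resolved to ID):
# a break is possible before the SPACE.
print(break_positions(["AL", "AL", "ID"]))

# The same mark on NBSP (class GL): no break opportunity at all.
print(break_positions(["AL", "AL", "GL"]))
```

    Under these assumptions the first call reports a break position before
    the SPACE-based unit, and the second reports none.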

    >Of course UAX #14 only spells out default behavior,
    >but then "default behaviour" is what was claimed just above.
    >
    > > Using NBSP rather than SPACE has several advantages, and has long been
    > > specified in Unicode, although not widely implemented. It is less likely
    > > to occur accidentally. But it has disadvantages, especially that it will
    > > always be a spacing character, whereas for display of isolated Indic
    > > vowels no extra spacing is required.
    >
    >NBSP is not a fixed-width space.

    Correct. Somewhere in the standard, we should point out that using a
    SPACE or NBSP as a base character does not require the space to have the
    same width as elsewhere in the text; its width can (and should) be
    adjusted to best serve this 'base character' function.

    A./


