Re: No Invisible Character - NBSP at the start of a word

From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Wed Nov 24 2004 - 19:27:54 CST


    At 04:53 PM 11/24/2004, Peter Kirk wrote:
    >On 24/11/2004 22:23, Peter Kirk wrote:
    >
    >>On 24/11/2004 22:00, Asmus Freytag wrote:
    >>
    >>>...
    >>>The sequence SPACE NBSP *does* not allow a break after the SPACE under
    >>>the line breaking rules we publish in UAX#14.

    I had tried to change "does not" into "*does*" and forgot to delete the word "not".

    >>>The common usage in HTML is to use one or more NBSP followed by SPACE
    >>>to mark a wider space that allows a break at the end. NBSPs are not
    >>>coalesced with other spaces.
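The following small Python sketch illustrates the point that NBSPs are not coalesced with other spaces. It assumes a simplified model of CSS `white-space: normal` that only collapses runs of HTML's ASCII whitespace (TAB, LF, FF, CR, SPACE) and ignores segment-break and line-edge handling. The function name `collapse_whitespace` is hypothetical.

```python
import re

# HTML's "ASCII whitespace": TAB, LF, FF, CR and SPACE. U+00A0 is not in this set.
ASCII_WHITESPACE_RUN = re.compile(r"[\t\n\f\r ]+")


def collapse_whitespace(text: str) -> str:
    """Collapse runs of ASCII whitespace to a single SPACE, leaving NBSP untouched.

    Simplified model of CSS `white-space: normal` (hypothetical helper).
    """
    return ASCII_WHITESPACE_RUN.sub(" ", text)


# Two NBSPs followed by an ordinary SPACE: the NBSPs survive collapsing.
wide = "end.\u00a0\u00a0 Next"
print(repr(collapse_whitespace(wide)))          # 'end.\xa0\xa0 Next'
print(repr(collapse_whitespace("a \t\n b")))    # 'a b'
```

Note that in Python `str` patterns `\s` also matches U+00A0, so a naive `re.sub(r"\s+", " ", ...)` would merge the NBSPs away. That is why the sketch spells out the ASCII set explicitly.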
    >>>
    >>>>In the Hebrew case, it is probably necessary to precede the NBSP with
    >>>>RLM to ensure that the NBSP and combining mark are taken with the rest
    >>>>of the word as right-to-left. Does this inserted RLM affect the
    >>>>situation with HTML, XML etc?
    >>>
    >>>
    >>>
    >>>You are always free to surround the NBSP with other format characters,
    >>>such as RLM or ZWSP, to tailor whatever behavior those format characters
    >>>affect.
    >>Thank you.
    >>
    >>What if I used the sequence <RLM, NBSP, combining mark>? Would I then get
    >>a break opportunity before the RLM, if it is preceded by SPACE?
    >>Presumably LRM could be used similarly if the same situation occurs in a
    >>left-to-right language.
    >I note that there is a relevant change being proposed to UAX #29 (public
    >review issue #51), in that NBSP is now to be treated as a letter for
    >determination of word and sentence boundaries. This certainly helps with
    >the use of NBSP as a carrier for spacing diacritics, as e.g. in Hebrew.
    >
    >Also the following clarification is being proposed for UAX #16 on line
    >breaking (public review issue #56):

    UTR#16 is UTF-EBCDIC; you must mean UAX#14.

    >>The preferred base character for showing combining marks in isolation is
    >>U+00A0 NO-BREAK SPACE. If a line break before or after the combining
    >>sequence is desired, U+200B ZERO WIDTH SPACE can be used. The use of
    >>U+0020 SPACE as a base character is deprecated.
    >
    >But this draft also states:
    >
    >>when NBSP follows SPACE, there is a break opportunity after the SPACE and
    >>NBSP will go as visible space onto the next line.
    >
    >This is different from what Asmus stated above: "The sequence SPACE NBSP
    >*does* not allow a break

    That was my editing mistake in composing my message to you. If you check
    the first sentence of http://www.unicode.org/report/tr14-16.html#GL you
    will see why it *does* allow the break.
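The following minimal Python sketch shows how pairwise line-break rules produce that result for SPACE NBSP, and also for the RLM and ZWSP variants discussed above. It assumes a tiny subset of line-break classes and uses the rule numbering of later UAX #14 revisions (LB7, LB8, LB9/LB10, LB12, LB12a, LB18, LB28), not the 2004 text cited above. The class table (including RLM as CM, as I understand the current LineBreak.txt data) and the function names `resolve_classes` and `break_opportunities` are illustrative assumptions. This is not a conforming implementation.

```python
# Assumed subset of line-break classes; every other character is treated as AL.
CLASSES = {
    "\u0020": "SP",  # SPACE
    "\u00A0": "GL",  # NO-BREAK SPACE ("glue")
    "\u200B": "ZW",  # ZERO WIDTH SPACE
    "\u200F": "CM",  # RIGHT-TO-LEFT MARK (assumed CM, per LineBreak.txt)
    "\u05B8": "CM",  # HEBREW POINT QAMATS, a combining mark
}


def resolve_classes(text):
    """Return (classes, absorbed), applying LB9/LB10 to combining marks."""
    classes, absorbed = [], []
    for ch in text:
        cls = CLASSES.get(ch, "AL")
        if cls == "CM" and classes and classes[-1] not in ("SP", "ZW"):
            classes.append(classes[-1])  # LB9: X CM* is treated as X
            absorbed.append(True)
            continue
        if cls == "CM":
            cls = "AL"  # LB10: a CM with no usable base is treated as AL
        classes.append(cls)
        absorbed.append(False)
    return classes, absorbed


def break_opportunities(text):
    """Return indices i (0 < i < len(text)) where a break before text[i] is allowed."""
    classes, absorbed = resolve_classes(text)
    breaks = []
    for i in range(1, len(text)):
        before, after = classes[i - 1], classes[i]
        if after in ("SP", "ZW"):
            continue  # LB7: do not break before SP or ZW
        j = i - 1
        while j >= 0 and classes[j] == "SP":
            j -= 1
        if j >= 0 and classes[j] == "ZW":
            breaks.append(i)  # LB8: ZW SP* ÷
            continue
        if absorbed[i]:
            continue  # LB9: do not break inside a combining sequence
        if before == "GL":
            continue  # LB12: GL ×
        if after == "GL" and before != "SP":
            continue  # LB12a: [^SP BA HY] × GL (BA and HY not modeled here)
        if before == "SP":
            breaks.append(i)  # LB18: SP ÷
            continue
        # The only remaining pair in this subset is AL AL, kept together by LB28.
    return breaks


print(break_opportunities("a \u00a0b"))              # [2]: after SPACE, before NBSP
print(break_opportunities("a\u00a0b"))               # []: NBSP glues both sides
print(break_opportunities("a \u200f\u00a0\u05b8b"))  # [2]: after SPACE, before RLM
print(break_opportunities("a\u200bb"))               # [2]: after ZWSP
```

Under these assumptions, the sequence <SPACE, RLM, NBSP, combining mark> still offers a break before the RLM. The NBSP plus mark stays glued to the material on both sides.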

    >after the SPACE". So is this actually a proposed change to the line
    >breaking rules? If so, it is one I support.
    >
    >--
    >Peter Kirk
    >peter@qaya.org (personal)
    >peterkirk@qaya.org (work)
    >http://www.qaya.org/
    >



    This archive was generated by hypermail 2.1.5 : Wed Nov 24 2004 - 19:34:20 CST