Re: No Invisible Character - NBSP at the start of a word

From: Asmus Freytag
Date: Wed Nov 24 2004 - 16:00:58 CST

    At 04:36 AM 11/24/2004, Peter Kirk wrote:
    >I understand that the proposed INVISIBLE CHARACTER was rejected at the
    >recent UTC meeting. I presume that the intention is that NBSP should be
    >used instead.

    At the moment, NBSP is the only sanctioned base character without 'ink'.

    >There are cases of words which start with spacing combining marks, for
    >which there are no separate Unicode characters. For example, there are
    >some unusual biblical Hebrew word forms (Ketiv consonants with Qere
    >vowels, the forms printed in Hebrew Bibles) which start with spacing
    >combining marks. For some examples (in fact this is intended to be an
    >exhaustive list of such words), see
    >, the "blended
    >forms" column of rows with the note "point before word".
    >This UTC decision leaves us in a situation in which such words need to be
    >represented in Unicode with NBSP and combining marks at the start of a
    >word. Does this lead to problems with HTML, XML etc? Are there cases in
    >which this word initial NBSP will be combined with a preceding word space,
    >and so the intended word spacing and break opportunity (before the NBSP)
    >may be lost?

    The sequence SPACE NBSP does *not* allow a break after the SPACE under the
    line breaking rules we publish in UAX#14.

    The common usage in HTML is to use one or more NBSPs followed by a SPACE
    to mark a wider space that still allows a break at the end. NBSPs are not
    coalesced with other spaces.
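
    As a rough illustration of the coalescing point, here is a minimal
    Python sketch. It assumes that HTML whitespace collapsing affects only
    the ASCII whitespace characters (space, tab, LF, FF, CR); the function
    name collapse_html_whitespace is hypothetical and this is not a full
    model of any browser's white-space processing.

```python
import re

NBSP = "\u00A0"  # NO-BREAK SPACE

# Assumption: only ASCII whitespace is collapsible; NBSP is deliberately
# absent from this character class.
_COLLAPSIBLE = re.compile(r"[ \t\n\f\r]+")


def collapse_html_whitespace(text: str) -> str:
    """Replace each run of collapsible whitespace with a single SPACE."""
    return _COLLAPSIBLE.sub(" ", text)


# Two NBSPs followed by SPACE: the "wider space" pattern described above.
wide_gap = "word" + NBSP + NBSP + " " + "next"

# A run of ordinary spaces and newlines shrinks to one SPACE...
print(repr(collapse_html_whitespace("word  \n  next")))
# ...while the NBSPs in the wide gap survive untouched.
print(collapse_html_whitespace(wide_gap) == wide_gap)
```

    In this sketch the SPACE NBSP sequence from the question passes through
    unchanged as well, since the single SPACE has nothing to merge with.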

    >In the Hebrew case, it is probably necessary to precede the NBSP with RLM
    >to ensure that the NBSP and combining mark are taken with the rest of the
    >word as right-to-left. Does this inserted RLM affect the situation with
    >HTML, XML etc?

    You are always free to surround the NBSP with other format characters, such
    as RLM or ZWSP, to tailor whatever behavior those format characters affect.
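
    To make the sequence under discussion concrete, the following Python
    sketch builds SPACE, RLM, NBSP, a combining mark and a Hebrew letter,
    then lists each character's general category and bidirectional class
    using the standard unicodedata module. The choice of HEBREW POINT HIRIQ
    and ALEF is only an assumed sample, not one of the forms from the list
    mentioned above, and the helper name describe is hypothetical.

```python
import unicodedata

SPACE = "\u0020"
RLM = "\u200F"    # RIGHT-TO-LEFT MARK
NBSP = "\u00A0"   # NO-BREAK SPACE, used as the base for the mark
HIRIQ = "\u05B4"  # HEBREW POINT HIRIQ (sample combining mark, assumed)
ALEF = "\u05D0"   # HEBREW LETTER ALEF (sample letter, assumed)


def describe(text):
    """Return (code point, general category, bidi class) for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.bidirectional(ch))
        for ch in text
    ]


# Word space, then RLM, then the NBSP carrying the word-initial mark.
word_start = SPACE + RLM + NBSP + HIRIQ + ALEF

for row in describe(word_start):
    print(row)
```

    NBSP on its own has bidirectional class CS rather than a strong
    right-to-left class, which is the reason the question proposes placing
    an RLM before it.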

