Re: No Invisible Character - NBSP at the start of a word

From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Wed Nov 24 2004 - 16:00:58 CST


    At 04:36 AM 11/24/2004, Peter Kirk wrote:
    >I understand that the proposed INVISIBLE CHARACTER was rejected at the
    >recent UTC meeting. I presume that the intention is that NBSP should be
    >used instead.

    At the moment, NBSP is the only sanctioned base character without 'ink'.

    >There are cases of words which start with spacing combining marks, for
    >which there are no separate Unicode characters. For example, there are
    >some unusual biblical Hebrew word forms (Ketiv consonants with Qere
    >vowels, the forms printed in Hebrew Bibles) which start with spacing
    >combining marks. For some examples (in fact this is intended to be an
    >exhaustive list of such words), see
    >http://www.qaya.org/academic/hebrew/Ketiv-Qere-difficult.pdf, the "blended
    >forms" column of rows with the note "point before word".
    >
    >This UTC decision leaves us in a situation in which such words need to be
    >represented in Unicode with NBSP and combining marks at the start of a
    >word. Does this lead to problems with HTML, XML etc? Are there cases in
    >which this word initial NBSP will be combined with a preceding word space,
    >and so the intended word spacing and break opportunity (before the NBSP)
    >may be lost?
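A minimal Python sketch of the representation described in the question: NBSP as the base character, followed by the word-initial mark and the rest of the word. The helper `word_with_leading_mark` is hypothetical, and the mark and letters used are illustrative stand-ins, not forms taken from the linked Ketiv/Qere list.

```python
import unicodedata

NBSP = "\u00A0"  # NO-BREAK SPACE: the only sanctioned inkless base character

def word_with_leading_mark(mark: str, rest: str) -> str:
    """Hypothetical helper: prefix NBSP so that a word-initial combining
    mark (a single character) has a base character to attach to."""
    # General Category Mn, Mc or Me; checking combining() alone would miss
    # spacing marks, many of which have canonical combining class 0.
    if not unicodedata.category(mark).startswith("M"):
        raise ValueError(f"U+{ord(mark):04X} is not a combining mark")
    return NBSP + mark + rest

# Illustrative only: HEBREW POINT HIRIQ + ALEF + LAMED,
# not one of the forms from the linked list.
word = word_with_leading_mark("\u05B4", "\u05D0\u05DC")
for ch in word:
    print(f"U+{ord(ch):04X} {unicodedata.category(ch)} {unicodedata.name(ch)}")
```

Each line of output lists a code point with its General Category, showing NBSP (Zs) serving as the base for the mark that follows it.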

    The sequence SPACE NBSP *does* not allow a break after the SPACE under the
    line breaking rules we publish in UAX#14.
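The pair behavior can be modeled with a deliberately tiny Python sketch. It assumes only three classes (SP for SPACE, GL for NBSP, AL for everything else) and encodes this ordering: no break before a space, no break before or after NBSP, otherwise a break after a space. It is not a full UAX #14 implementation, and later revisions of UAX #14 (rule LB12a) permit a break between a space and a following NBSP, so the sketch reflects the behavior described in this message rather than the current rules. All names are hypothetical.

```python
SP, GL, AL = "SP", "GL", "AL"  # tiny subset of the UAX #14 line-break classes

def lb_class(ch: str) -> str:
    if ch == " ":
        return SP
    if ch == "\u00A0":
        return GL
    return AL  # assumption: treat everything else as an ordinary letter

def break_allowed(before: str, after: str) -> bool:
    """Break opportunity between two adjacent characters under the
    simplified rules above (earlier rules take precedence)."""
    b, a = lb_class(before), lb_class(after)
    if a == SP:              # no break before a space
        return False
    if a == GL or b == GL:   # no break before or after NBSP
        return False
    if b == SP:              # otherwise, break after a space
        return True
    return False             # no break inside a run of letters

def break_positions(text: str) -> list:
    """Indices i where a line may break between text[i-1] and text[i]."""
    return [i for i in range(1, len(text)) if break_allowed(text[i - 1], text[i])]

print(break_positions("ab \u00A0cd"))
print(break_positions("ab cd"))
```

Under these assumptions, the SPACE NBSP sequence yields no break opportunity at all, while a plain space yields one right after it.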

    The common usage in HTML is to use one or more NBSPs followed by a SPACE to
    mark a wider space that still allows a break at the end. NBSPs are not
    coalesced with other spaces.
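A rough Python approximation of HTML's default white-space collapsing shows why NBSPs survive: only runs of ASCII whitespace (tab, LF, FF, CR, SPACE) are collapsed, and U+00A0 is not in that set. This is a simplification of CSS `white-space: normal` processing, not a browser's actual algorithm; `collapse_whitespace` is a hypothetical name.

```python
import re

# ASCII whitespace as HTML defines it; NBSP (U+00A0) is deliberately absent.
ASCII_WS_RUN = re.compile(r"[\t\n\f\r ]+")

def collapse_whitespace(text: str) -> str:
    """Replace each run of ASCII whitespace with a single SPACE,
    leaving NBSPs untouched."""
    return ASCII_WS_RUN.sub(" ", text)

wide = "word\u00A0\u00A0 next"  # two NBSPs then a SPACE: wider gap, break allowed at the end
plain = "word   next"           # three SPACEs collapse to one
print(repr(collapse_whitespace(wide)))
print(repr(collapse_whitespace(plain)))
```

The NBSP run passes through unchanged, while the run of ordinary spaces is reduced to one.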

    >In the Hebrew case, it is probably necessary to precede the NBSP with RLM
    >to ensure that the NBSP and combining mark are taken with the rest of the
    >word as right-to-left. Does this inserted RLM affect the situation with
    >HTML, XML etc?

    You are always free to surround the NBSP with other format characters, such
    as RLM or ZWSP, to tailor whatever behavior those format characters affect.
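As a sketch of that kind of tailoring, the hypothetical helper below optionally places RLM immediately before the NBSP, giving the bidi algorithm a strong right-to-left character next to the NBSP (which is not itself strongly directional), and optionally ZWSP before that as an explicit break opportunity. The ZWSP, RLM, NBSP ordering is an assumption for illustration, not a prescribed sequence.

```python
import unicodedata

ZWSP = "\u200B"  # ZERO WIDTH SPACE: explicit break opportunity
RLM = "\u200F"   # RIGHT-TO-LEFT MARK: invisible strong right-to-left character
NBSP = "\u00A0"  # NO-BREAK SPACE: inkless base for the mark

def mark_initial_word(mark: str, rest: str, *, rtl: bool = True,
                      allow_break: bool = False) -> str:
    """Hypothetical helper: NBSP + mark + rest, optionally preceded by
    ZWSP and/or RLM."""
    prefix = ""
    if allow_break:
        prefix += ZWSP
    if rtl:
        prefix += RLM
    return prefix + NBSP + mark + rest

# Illustrative Hebrew input: HEBREW POINT HIRIQ + ALEF.
for ch in mark_initial_word("\u05B4", "\u05D0", allow_break=True):
    print(f"U+{ord(ch):04X} {unicodedata.category(ch)} {unicodedata.name(ch)}")
```

Both ZWSP and RLM have General Category Cf, so neither adds visible ink; whether the ZWSP is needed depends on which line-breaking rules the renderer follows.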

    A./