Re: No Invisible Character - NBSP at the start of a word

From: Peter Kirk
Date: Fri Nov 26 2004 - 16:37:12 CST


    On 26/11/2004 21:27, Doug Ewell wrote:

    > ...
    >One useful litmus (or lackmus) test for this Hebrew example would be
    >whether the text in question is still legible, with its original
    >meaning, when reduced to plain text representable in today's Unicode.
    >If the special Ketiv/Qere handling is needed only because It Is The
    >Word, and This Is How It Was Written, then this is probably a
    >paleographic distinction and out of scope for plain text. If it
    >genuinely changes the spelling, that is another matter.
    Well, for a start we need to define what might be meant by "reduced to
    plain text". In this case there is simply no logical way to describe
    what is written as plain text plus markup. I suppose some kind of markup
    like <ketiv>KKKK</ketiv><qere>QqQqQ</qere> could be used (K = Ketiv base
    character, Q = Qere base character, q = Qere diacritical mark), and this
    would preserve the original meaning, but it would not show how the
    individual Ketiv base characters and Qere combining marks are
    graphically combined, i.e. it would not distinguish the written
    "blended" forms KqKqKK and KqKKqK, which are graphically distinct. And
    certainly if the markup were simply stripped from this the resulting
    form KKKKQqQqQ would not be legible.
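
To make the point concrete, here is a minimal sketch in Python using the placeholder letters from the paragraph above (K, Q and q are stand-ins, not real Hebrew characters, and the <ketiv>/<qere> tag names are only the hypothetical markup suggested there). It shows what stripping the markup yields, and that the two blended forms carry the same Ketiv letters and the same number of Qere marks, so the markup alone cannot tell them apart.

```python
import re

# Hypothetical markup from the message:
# K = Ketiv base character, Q = Qere base character, q = Qere diacritical mark.
marked_up = "<ketiv>KKKK</ketiv><qere>QqQqQ</qere>"


def strip_markup(text: str) -> str:
    """Remove the opening and closing <ketiv>/<qere> tags, keeping their content."""
    return re.sub(r"</?(?:ketiv|qere)>", "", text)


def base_letters(blended: str) -> str:
    """Return only the Ketiv base characters of a blended form."""
    return blended.replace("q", "")


stripped = strip_markup(marked_up)
print(stripped)  # KKKKQqQqQ

# Two graphically distinct blended forms from the message.
blended_a = "KqKqKK"
blended_b = "KqKKqK"

# Both reduce to the same Ketiv letters and the same count of Qere marks;
# only the placement of the marks differs, and the markup does not record it.
print(base_letters(blended_a) == base_letters(blended_b))  # True
print(blended_a.count("q") == blended_b.count("q"))        # True
print(blended_a == blended_b)                              # False
```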

    But fortunately this whole issue is a storm in a teacup. For Unicode
    does provide quite adequate ways of representing every known Ketiv and
    Qere blended form - since we sorted out the Yerushala(y)im issue more
    than a year ago. The only real problem comes when the Qere is longer
    than the Ketiv and the blended form looks something like qKqKqKq, so
    starting with a combining mark. It is well established that such a
    combining mark with a blank base character may be represented by NBSP
    followed by the combining mark (and the alternative with SPACE is now
    apparently deprecated). And it seems that the UTC in rejecting the
    INVISIBLE LETTER proposal, and in proposing instead certain changes to
    the properties of NBSP which are currently out for public review, has
    reaffirmed this usage.

    So I only raised this issue to clarify exactly how NBSP should be used
    in such cases. Although I have been rather confused by the responses I
    have received, I think the situation is clear as follows: NBSP may be
    used with a combining mark at the start of a word, but it should be
    preceded by ZWSP to ensure a break opportunity before the word (although
    this should become unnecessary if the proposed revision to UTR #14 is
    accepted), and also by RLM to ensure correct bidi behaviour.
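
As a rough illustration of that conclusion, the following Python sketch assembles such a word as a code point sequence. Several things here are assumptions made only for the example: the helper name word_with_leading_mark, the choice of HEBREW POINT HIRIQ as the leading mark and ALEF as the following letter, and the relative order of ZWSP and RLM (the text above says only that both precede NBSP).

```python
import unicodedata

ZWSP = "\u200B"  # ZERO WIDTH SPACE: break opportunity before the word
RLM = "\u200F"   # RIGHT-TO-LEFT MARK: keeps bidi behaviour right-to-left
NBSP = "\u00A0"  # NO-BREAK SPACE: blank base for the combining mark

# Assumed example characters, chosen only for illustration.
HIRIQ = "\u05B4"  # HEBREW POINT HIRIQ, a combining mark
ALEF = "\u05D0"   # HEBREW LETTER ALEF


def word_with_leading_mark(mark: str, rest: str) -> str:
    """Build a word whose first visible element is a combining mark.

    Hypothetical helper: ZWSP and RLM are placed before NBSP, in that
    (assumed) order, and the mark is then applied to the NBSP.
    """
    if unicodedata.combining(mark) == 0:
        raise ValueError("expected a combining mark")
    return ZWSP + RLM + NBSP + mark + rest


word = word_with_leading_mark(HIRIQ, ALEF)

# List the code points so the invisible characters can be inspected.
for ch in word:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

The ZWSP would simply be dropped from such a helper if the proposed revision to UTR #14 is accepted.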

    Please let me know if any of you disagree with this conclusion.

    Peter Kirk
