RE: No Invisible Character - NBSP at the start of a word

From: Jony Rosenne
Date: Mon Nov 29 2004 - 13:06:26 CST

    > -----Original Message-----
    > From:
    > [] On Behalf Of John Hudson
    > Sent: Sunday, November 28, 2004 2:55 AM
    > To: 'Unicode Mailing List'
    > Subject: Re: No Invisible Character - NBSP at the start of a word
    > Jony Rosenne wrote:
    > >>Jony, what do you think plain text is? Why should the
    > >>arrangement of text on a page as a
    > >>marginal note be considered any differently from text
    > >>anywhere else *in its encoding*? Are
    > >>you suggesting that Unicode is only relevant to ... what?
    > >>totally unformatted text in a
    > >>text editor?
    > > Basically, yes. Except for the control codes in Unicode -
    > > spaces, line feed, carriage return, etc.
    > > To indicate formatting one uses markup.
    > And markup is applied to what? Obviously, to text.


    > It seems to me that the primary purpose of the plain text
    > limitation in Unicode is to
    > maintain the character/glyph distinction, so that it is
    > clearly unnecessary to encode
    > display entities such as variant glyphs, ligatures, etc.
    > separately from the underlying
    > character codes that they visibly represent in various ways.

    I believe this is not the only purpose, but the purpose is not as important
    as is respecting the scope of Unicode.

    > On this basis, I think there
    > is a sound argument to be made against encoding an 'invisible
    > letter', if there is an
    > existing character -- such as NBSP -- that logically and
    > effectively serves the same
    > purpose in encoding a particular piece of text. But it *is* a
    > piece of text, however
    > malformed it might seem from normal lexicographic
    > understanding. It may not be a word. It
    > may, in fact, be two words merged into a unit. But it is most
    > certainly text.

    Sure it is text, but it is not plain text.

    Qere and Ketiv are not malformed. I don't think anyone disagrees that they
    are the juxtaposition of the letters of one word with the vowel points of
    another.

    That most cases can be visibly reproduced by Unicode is a hack, and is not a
    sufficient justification to extend Unicode to support cases that cannot be.

    There is the case of Yerushala(y)im, for which the plain text hack would
    require an invisible RTL letter to represent the omitted Yod, or to allow
    pointing an RLM. The CGJ hack may work too but it is based on a
    misunderstanding, as if the Lamed has two vowels.

    Also, these hacks foil searching and sorting, since neither the Qere nor the
    Ketiv words will be handled correctly.
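
    To make the searching point concrete, here is a minimal sketch in Python.
    The code point sequences are only one plausible way of spelling the merged
    Yerushala(y)im form, and the names KETIV, QERE, MERGED_NBSP, MERGED_CGJ and
    strip_marks are hypothetical, chosen for this illustration. It strips the
    points and checks which consonantal spelling a naive search would find.

```python
import unicodedata

# Consonantal spellings (assumed for illustration).
KETIV = "\u05D9\u05E8\u05D5\u05E9\u05DC\u05DD"        # yod resh vav shin lamed final-mem
QERE = "\u05D9\u05E8\u05D5\u05E9\u05DC\u05D9\u05DD"   # same, with yod before final-mem

PREFIX = (
    "\u05D9\u05B0"        # yod + sheva
    "\u05E8"              # resh
    "\u05D5\u05BC"        # vav + dagesh (shuruq)
    "\u05E9\u05C1\u05B8"  # shin + shin dot + qamats
    "\u05DC\u05B7"        # lamed + patah
)

# Hack 1: NBSP stands in for the omitted yod and carries the hiriq.
MERGED_NBSP = PREFIX + "\u00A0\u05B4" + "\u05DD"
# Hack 2: CGJ keeps patah and hiriq together on the lamed.
MERGED_CGJ = PREFIX + "\u034F\u05B4" + "\u05DD"


def strip_marks(text: str) -> str:
    """Drop nonspacing marks (general category Mn), which includes CGJ."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


for name, merged in (("NBSP", MERGED_NBSP), ("CGJ", MERGED_CGJ)):
    skeleton = strip_marks(merged)
    print(name, "finds Ketiv:", KETIV in skeleton, "finds Qere:", QERE in skeleton)
```

    Under these assumptions the NBSP form matches neither spelling, because the
    NBSP remains between the Lamed and the Mem, and the CGJ form matches only
    the Ketiv consonants. A real search would need special handling either way.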

    > The idea that the position of such text on a page -- as a
    > marginal note -- somehow demotes
    > it from being text, is particularly nonsensical.

    Promotes, not demotes.

    > But I'm now, as always, happy to hear alternate suggestions
    > as to how things might be
    > handled in either encoding or display. So if you think merged
    > Ketiv/Qere forms should be
    > handled by markup, perhaps you can explain how, so that I
    > might better understand. Thank you.

    This is the Unicode list, not a markup (SGML etc.) list. And I do not know
    too much about markup.


    > John Hudson
    > --
    > Tiro Typeworks
    > Vancouver, BC
    > Currently reading:
    > The Peasant of the Garonne, by Jacques Maritain
    > Art and faith, by Jacques Maritain & Jean Cocteau
    > Difficulties, by Ronald Knox & Arnold Lunn
