RE: No Invisible Character - NBSP at the start of a word

From: Peter Constable (petercon@microsoft.com)
Date: Mon Nov 29 2004 - 17:20:02 CST

    > From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org]
    > On Behalf Of Jony Rosenne

    > > But it *is* a piece of text, however malformed it might seem
    > > from normal lexicographic understanding. It may not be a word.
    > > It may, in fact, be two words merged into a unit. But it is most
    > > certainly text.
    >
    > Sure it is text, but it is not plain text.
    >
    > Qere and Ketiv are not malformed. I don't think anyone disagrees that
    > they are the juxtaposition of the letters of one word with the vowel
    > points of another.
    >
    > That most cases can be visibly reproduced by Unicode is a hack...

    Jony, where you and I differ in worldview, it seems to me, is that you
    view characters as encoding language, whereas I view characters as
    encoding letterforms. Put another way: for you, text is necessarily
    linguistic, whereas for me text is text, independent of linguistic
    interpretation. To make this concrete: the fact that a qere sequence
    involves the vowel points of word A rather than word B is
    linguistically interesting but irrelevant as far as encoding is
    concerned. If the displayed letterforms consist of a lamed with two
    vowel points, then the encoded character sequence should, IMO, be a
    lamed with two vowel points -- and I would not consider that a hack.

    > and is not a sufficient justification to extend Unicode to support
    > cases that cannot be reproduced.
    >
    > There is the case of Yerushala(y)im, for which the plain text hack
    > would require an invisible RTL letter to represent the omitted Yod, or
    > to allow pointing an RLM. The CGJ hack may work too but it is based on
    > a misunderstanding, as if the Lamed has two vowels.

    The only hackish thing about needing CGJ is that the combining classes
    for vowel points that occupy the same space relative to a base should
    never have been different from one another, but since we cannot revise
    that detail, we need to come up with another mechanism to deal with it.
    I agree that using CGJ is a hack, but not because the text involves one
    base letterform with two combining vowel points.
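
    To make the combining-class issue concrete, here is a minimal sketch
    using Python's standard unicodedata module. The choice of PATAH and
    HIRIQ as the two vowel points on the lamed is an illustrative
    assumption, as are the variable and function names. Because the two
    points have different non-zero combining classes, normalization
    reorders them; an intervening CGJ (combining class 0) blocks that
    reordering.

```python
import unicodedata

# Illustrative assumption: the two vowel points on the lamed are PATAH and HIRIQ.
LAMED = "\u05DC"  # HEBREW LETTER LAMED
PATAH = "\u05B7"  # HEBREW POINT PATAH
HIRIQ = "\u05B4"  # HEBREW POINT HIRIQ
CGJ = "\u034F"    # COMBINING GRAPHEME JOINER


def describe(text):
    """Return (character name, canonical combining class) for each code point."""
    return [(unicodedata.name(ch), unicodedata.combining(ch)) for ch in text]


# Lamed followed by two vowel points, in the order they are meant to be read.
plain = LAMED + PATAH + HIRIQ
# The same sequence with CGJ inserted between the two points.
with_cgj = LAMED + PATAH + CGJ + HIRIQ

# Canonical ordering sorts adjacent marks with non-zero combining classes,
# so the two points swap in the plain sequence but not across the CGJ.
normalized_plain = unicodedata.normalize("NFC", plain)
normalized_cgj = unicodedata.normalize("NFC", with_cgj)

for label, text in [
    ("plain", plain),
    ("plain, NFC", normalized_plain),
    ("with CGJ", with_cgj),
    ("with CGJ, NFC", normalized_cgj),
]:
    print(label, describe(text))
```

    Had the two points been assigned the same combining class, their
    relative order would survive normalization without any joiner.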

    > > But I'm now, as always, happy to hear alternate suggestions as to
    > > how things might be handled in either encoding or display. So if
    > > you think merged Ketiv/Qere forms should be handled by markup,
    > > perhaps you can explain how, so that I might better understand.
    > > Thank you.
    >
    > This is the Unicode list, not the markup - SGML etc. list. And I do
    > not know too much about markup.

    It's not a list dedicated to discussion of markup, but if people contend
    that a solution to a problem lies in something other than plain text,
    then it is germane to this list to have that alternative solution
    elaborated.

    Peter Constable


