Re: No Invisible Character - NBSP at the start of a word

From: Peter Kirk
Date: Mon Nov 29 2004 - 17:11:52 CST

    On 29/11/2004 19:06, Jony Rosenne wrote:

    > ...
    >Qere and Ketiv are not malformed. I don't think anyone disagrees that they
    >are the juxtaposition of the letters of one word with the vowel points of
    >That most cases can be visibly reproduced by Unicode is a hack, and is not a
    >sufficient justification to extend Unicode to support cases that cannot be

    I don't think there are in fact any cases which cannot be reproduced,
    since NBSP may be used to carry combining marks, and the CGJ mechanism
    has been approved by the UTC. So this discussion is rather pointless. If
    anyone knows of any cases which cannot be represented properly by
    current Unicode, please let us know, and then perhaps we can reopen the

    >There is the case of Yerushala(y)im, for which the plain text hack would
    >require an invisible RTL letter to represent the omitted Yod, or to allow
    >pointing an RLM. The CGJ hack may work too but it is based on a
    >misunderstanding, as if the Lamed has two vowels.

    Unicode represents text as written or printed, not pronunciation. Sure,
    Yerushala(y)im is pronounced with a yod which is not written. But this
    letter is not part of the written word form, not part of the spelling.
    It is like many cases in many languages where a letter is pronounced but
    not written. Irrespective of the pronunciation, the Lamed is *written*
    with two vowels. And so Unicode correctly encodes it with two vowels,
    and inserts the CGJ to prevent inappropriate reordering.
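
    The reordering in question can be checked with a short Python sketch.
    It assumes only the standard unicodedata module. The variable names are
    mine, and patah followed by hiriq is used as an illustration of a Lamed
    carrying two vowels, not as a quotation from any particular text.

```python
import unicodedata

LAMED = "\u05DC"  # HEBREW LETTER LAMED
PATAH = "\u05B7"  # HEBREW POINT PATAH, canonical combining class 17
HIRIQ = "\u05B4"  # HEBREW POINT HIRIQ, canonical combining class 14
CGJ = "\u034F"    # COMBINING GRAPHEME JOINER, canonical combining class 0

# Illustrative sequences: the two vowels in written order, without and with CGJ.
as_written = LAMED + PATAH + HIRIQ
with_cgj = LAMED + PATAH + CGJ + HIRIQ

def normalize(text):
    """Apply NFC, which includes canonical reordering of combining marks."""
    return unicodedata.normalize("NFC", text)

# Without CGJ the lower-class hiriq is moved in front of the patah.
reordered = normalize(as_written)

# With CGJ (class 0) between the vowels, the written order is left alone.
preserved = normalize(with_cgj)

for label, text in (("no CGJ", reordered), ("with CGJ", preserved)):
    print(label, [unicodedata.name(ch) for ch in text])
```

    The CGJ works here only because its combining class is 0, so canonical
    reordering does not move marks across it.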

    >Also, these hacks foil searching and sorting, since neither the Qere or the
    >Ketiv words will be handled correctly.

    True, it is not possible to search the text and sort on the Qere form,
    for the simple reason that this is not part of the plain text; its
    consonants appear only in the margin. The Qere form can be added to the
    text with markup indicating that it should be invisible, and in this
    way it can be searched and sorted.

    It is possible to search and sort on the Ketiv form, since this is
    unpointed, by setting the search or sort to use the base characters
    only. But this might require tailoring the collation to ignore NBSP.
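
    As a rough sketch of what "base characters only" could mean in
    practice, the Python below builds a match key by dropping combining
    marks, CGJ and NBSP. The function name base_letters and the sample
    string are hypothetical, and a real implementation would more likely
    tailor a proper collation than strip characters by hand.

```python
import unicodedata

NBSP = "\u00A0"  # NO-BREAK SPACE, used here as a carrier for combining marks
CGJ = "\u034F"   # COMBINING GRAPHEME JOINER

def base_letters(text):
    """Return text with combining marks, CGJ and NBSP removed.

    Assumption: every NBSP in the input is a mark carrier, so it is safe
    to ignore it for matching. Text that also uses NBSP as a real space
    would need a more careful rule.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(
        ch for ch in decomposed
        if ch not in (NBSP, CGJ)
        and not unicodedata.category(ch).startswith("M")
    )

# Made-up sequence, not a real Qere/Ketiv: bet + sheva, lamed,
# then an NBSP carrying a hiriq that has no consonant of its own.
sample = "\u05D1\u05B0\u05DC" + NBSP + "\u05B4"
key = base_letters(sample)  # bet + lamed only

# The same key function can drive a sort on the unpointed forms.
words = ["\u05DC\u05B7" + CGJ + "\u05B4", sample]
by_base = sorted(words, key=base_letters)
```

    Matching on such a key finds the Ketiv consonants whatever vowels have
    been placed on them.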

    Peter Kirk
