RE: No Invisible Character - NBSP at the start of a word

From: Jony Rosenne
Date: Tue Nov 30 2004 - 00:36:30 CST

    > -----Original Message-----
    > From: [] On Behalf Of Peter Constable
    > Sent: Tuesday, November 30, 2004 1:20 AM
    > To: Unicode Mailing List
    > Subject: RE: No Invisible Character - NBSP at the start of a word
    > > From: [] On Behalf Of Jony Rosenne

    > Jony, where you and I have had a different worldview is that, it seems
    > to me, you view characters as encoding language, and I view characters
    > as encoding letterforms; or, put another way, for you, text is
    > necessarily linguistic, whereas for me text is text, independent of
    > linguistic interpretation. To make this concrete, the fact that a qere
    > sequence involves the vowel points of word A rather than word B is
    > linguistically interesting, but irrelevant as far as encoding is
    > concerned. If the displayed letterforms consist of a lamed with two
    > vowel points, then the encoded character sequence IMO should be lamed
    > with two vowel points -- and I would not consider that a hack.

    When I look at the text, even with a magnifying glass, I do not see a Lamed
    with two points. The displayed form, from my point of view, is a Lamed with
    a single point and another point without a base character. The Hiriq is not
    under the Lamed, it is between the Lamed and the Mem. The linguistic
    approach is just the explanation; the displayed letterforms are quite clear.

    Even when I look at old Latin manuscripts, which I did once again when I
    visited the flea market in Milan a few months ago, they are not plain text
    and they cannot be faithfully reproduced in Unicode without markup. Although
    the nature of Hebrew manuscripts is different, I do not understand the
    desire to make Hebrew different, and I cannot accept it if it makes the
    computerized handling of Hebrew unnecessarily more complicated than it is.

    To make it very clear: The use of CGJ approved by the UTC is fine by me, and
    I have no objection to anyone using it, but it is not required for Hebrew,
    and we do not have a standard plain text solution for Qere and Ketiv and for
    Yerushala(y)im. Regarding the latter, the UTC discussion was based on a
    mistaken or incomplete presentation of the problem. Yes, for those who need two
    vowels for a single letter, CGJ would do it, but since this is not my
    question, CGJ is not the answer. The hack needed here is an invisible base
    character.
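
To make the two readings concrete, here is a minimal Python sketch of the
competing code point sequences for the end of Yerushala(y)im. It is an
illustration under assumptions, not an encoding endorsed in this thread: the
choice of Patah as the first vowel, the use of NO-BREAK SPACE as the stand-in
invisible base (taken from the subject line), and the helper names `describe`
and `survives_nfc` are all hypothetical.

```python
import unicodedata

# Code points used in the illustration (names as given by the Unicode database).
LAMED, FINAL_MEM = "\u05DC", "\u05DD"
PATAH, HIRIQ = "\u05B7", "\u05B4"
CGJ = "\u034F"   # COMBINING GRAPHEME JOINER
NBSP = "\u00A0"  # NO-BREAK SPACE, assumed here as the invisible base


def describe(text):
    """List each code point of text as 'U+XXXX NAME'."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


def survives_nfc(text):
    """True if NFC normalization leaves the sequence unchanged."""
    return unicodedata.normalize("NFC", text) == text


# Reading 1: two vowel points on the Lamed, kept in order by CGJ.
two_points_on_lamed = LAMED + PATAH + CGJ + HIRIQ + FINAL_MEM

# Reading 2: one point on the Lamed, the Hiriq on an invisible base
# between the Lamed and the Mem.
point_on_invisible_base = LAMED + PATAH + NBSP + HIRIQ + FINAL_MEM

# Without any separator the two points are reordered by normalization,
# because Hiriq has a lower canonical combining class than Patah.
bare = LAMED + PATAH + HIRIQ + FINAL_MEM

for label, seq in [
    ("two points on Lamed (CGJ)", two_points_on_lamed),
    ("point on invisible base (NBSP)", point_on_invisible_base),
    ("no separator", bare),
]:
    print(label, "| unchanged by NFC:", survives_nfc(seq))
    for line in describe(seq):
        print("   ", line)
```

Both separated sequences keep their order under NFC; they differ in what the
Hiriq attaches to, which is the point of disagreement here. How either one
renders depends on the font and the rendering engine.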

    If anyone wants to use CGJ or any other Unicode characters that are not
    included in the standard Hebrew subset (Unicode does not define subsets, but
    other bodies do and implementers necessarily have to) to encode Hebrew
    texts, they should do their users a favor and explain to them that they
    require specific implementations, operating systems and fonts.



    > Peter Constable
