RE: No Invisible Character - NBSP at the start of a word

From: Jony Rosenne
Date: Tue Dec 07 2004 - 01:52:55 CST

    In chapter 8, regarding Hebrew, the standard says:

    Positioning. Marks may combine with vowels and other points, and there are
    complex typographic rules for positioning these combinations.

    I understand that this sentence should be regarded as normative.

    Clause 4.3 uses the word "tend".

    Chapter 5 is a "guideline".

    Clause 5.13 describes a default behavior and says "This default behavior may
    be altered based on typographic preferences or on knowledge of the specific
    orthographic treatment to be given to multiple nonspacing marks in the
    context of a particular writing system."
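    As an aside not in the original message: the default that clause 5.13
    describes is defined in terms of canonical combining classes. A quick
    check with Python's standard unicodedata module (the selection of points
    and the names used as keys are just an illustrative subset) shows that
    the common Hebrew points each carry their own distinct class. In the
    standard's model, marks with different classes are assumed not to
    interact typographically.

```python
import unicodedata

# Illustrative subset of Hebrew points (code points from the Unicode Hebrew block).
HEBREW_POINTS = {
    "SHEVA": "\u05B0",
    "HIRIQ": "\u05B4",
    "PATAH": "\u05B7",
    "QAMATS": "\u05B8",
    "DAGESH": "\u05BC",
    "METEG": "\u05BD",
    "SHIN DOT": "\u05C1",
}


def combining_classes(points):
    """Map each point name to its canonical combining class (ccc)."""
    return {name: unicodedata.combining(ch) for name, ch in points.items()}


for name, ccc in combining_classes(HEBREW_POINTS).items():
    print(f"{name:<9} ccc={ccc}")
```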

    In view of all of these, I believe our members and implementers have
    misinterpreted the standard as it applies to the rendering of Hebrew
    combining marks: the traditional typographic conventions of Hebrew
    should be applied, rather than the Unicode default based on combining
    classes. As far as I remember, this default, as it applies to Hebrew,
    was intended to provide a general indication, not to replace the first
    quotation above.
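    For readers less familiar with the combining-class default: in
    normalization it takes the form of canonical ordering, where each run of
    adjacent marks with a nonzero combining class is stably sorted by class.
    The sketch below is a simplified version that assumes the input needs no
    canonical decomposition (full NFD decomposes first), checked against
    Python's unicodedata.normalize. The lamed + patah + hiriq input is a
    hypothetical example of two vowels on one consonant.

```python
import unicodedata


def canonical_reorder(s: str) -> str:
    """Simplified canonical ordering: within each run of characters whose
    combining class is nonzero, stably sort by combining class.
    Assumes the input needs no canonical decomposition."""
    out = []
    run = []  # current run of combining marks (ccc != 0)
    for ch in s:
        if unicodedata.combining(ch) == 0:
            # A starter ends the run; flush the sorted marks, then the starter.
            out.extend(sorted(run, key=unicodedata.combining))
            run = []
            out.append(ch)
        else:
            run.append(ch)
    out.extend(sorted(run, key=unicodedata.combining))
    return "".join(out)


LAMED, PATAH, HIRIQ = "\u05DC", "\u05B7", "\u05B4"

# Patah (ccc 17) typed before hiriq (ccc 14) on the same consonant.
typed = LAMED + PATAH + HIRIQ
reordered = canonical_reorder(typed)

print([f"U+{ord(c):04X}" for c in reordered])  # hiriq now precedes patah
print(reordered == unicodedata.normalize("NFD", typed))
```

    Because the two mark orders are canonically equivalent, any normalizing
    process may replace one with the other, so the stored order of such
    marks is not a reliable carrier of typographic intent.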

    Consequently, there is not, and cannot be, anything wrong with Unicode
    (at least in this respect), and it does support "ANY sequence of Hebrew
    vowels and consonants".

    I do maintain that in some cases the typographic process would require
    out-of-band assistance in determining the precise presentation desired,
    and that this falls outside the scope of plain text and Unicode.


    > -----Original Message-----
    > From:
    > [] On Behalf Of Dean Snyder
    > Sent: Monday, December 06, 2004 6:03 AM
    > To: Unicode List
    > Subject: Re: No Invisible Character - NBSP at the start of a word
    > Mark E. Shoulson wrote at 7:20 PM on Saturday, December 4, 2004:
    >
    > > I would say that pointing one text with the vowels of another,
    > > without regard for discrepancies in character-count, constitutes an
    > > abuse of the Hebrew orthography, and shouldn't be considered "normal"
    > > usage that must be supported.
    >
    > Calling ketiv/qere spellings orthographic abuse, abnormal, and not
    > worthy of support in Unicode is based on reasoning backwards from the
    > faulty Unicode model for encoded Hebrew, rather than forwards from the
    > Hebrew script to an encoding model.
    >
    > From an encoding point of view, ketiv/qere is NOTHING MORE than
    > arbitrary sequences of Hebrew vowels and consonants, and just as
    > Unicode supports ANY sequence of Latin vowels and consonants, it should
    > have, from the very beginning, supported ANY sequence of Hebrew vowels
    > and consonants. The problem lies not in the script, the problem lies in
    > the inadequate encoding model adopted for it - and it needs to be
    > fixed. ALL of the Hebrew script must be supported; anything less is
    > simply unacceptable.
    >
    > As I said similarly elsewhere, this must be supported in plan tixt -
    > ketiv = "plain text", qere = "all scripts". As I have just demonstrated,
    > this is trivial in Latin; it should also be trivial in Hebrew.
    >
    > Respectfully,
    > Dean A. Snyder
    > Assistant Research Scholar
    > Manager, Digital Hammurabi Project
    > Computer Science Department
    > Whiting School of Engineering
    > 218C New Engineering Building
    > 3400 North Charles Street
    > Johns Hopkins University
    > Baltimore, Maryland, USA 21218
    > office: 410 516-6850
    > cell: 717 817-4897