From: Karljürgen Feuerherm (cuneiform@rogers.com)
Date: Sat Jul 26 2003 - 09:13:58 EDT
I believe this to be the wrong outlook.
The real situation is that the real text has no Yod--deliberately so from a
Masoretic standpoint. No invisible Yod should be inserted to 'emend' the
text.
(Note that I am not making a pietistic argument; I'm not the least bit
pietistic, though I suspect there are Biblical scholars who would take that
view. I'm simply making a text faithfulness argument.)
K
----- Original Message -----
From: "Jony Rosenne" <rosennej@qsm.co.il>
To: <unicode@unicode.org>
Sent: Saturday, July 26, 2003 2:24 AM
Subject: RE: Yerushala(y)im - or Biblical Hebrew
> This explanation makes me unhappy with CGJ.
>
> Ken says: "The important things are that it is a) invisible, b) a combining
> mark, and c) has combining class zero".
>
> And: "There is no need for an invisible base character here".
>
> On the contrary, to represent the text we do need an invisible base
> character for the Hiriq, representing the unwritten Yod.
>
> Another possibility is to encode the Yod with a complex text (in the sense of
> non-plain-text) control saying that the Yod is invisible.
>
> I think it is important, whatever solution is chosen, to represent the real
> situation, rather than just a sequence of codes that happens to be able to
> produce the desired visual output.
>
> Jony
>
> > -----Original Message-----
> > From: unicode-bounce@unicode.org
> > [mailto:unicode-bounce@unicode.org] On Behalf Of Kenneth Whistler
> > Sent: Saturday, July 26, 2003 2:40 AM
> > To: ted@newslate.com
> > Cc: unicode@unicode.org; kenw@sybase.com
> > Subject: Re: Yerushala(y)im - or Biblical Hebrew
> >
> >
> > Ted continued:
> >
> > > If I recall correctly, the suggestion for using CGJ for yerushala(y)im
> > > was to encode it as: <...lamed, patah, cgj, hiriq, final mem>. Also, I
> > > seem to recall that this gave some people heartburn because CGJ was
> > > not intended to join two combining characters. What if this case were
> > > encoded as: <...lamed, patah, cgj, zwnbs, hiriq, final mem>? (Please
> > > forgive me if this is what had been proposed all along.)
> > >
> > > As I understand it from reading the description of CGJ (and ignoring
> > > for the moment that zwnbs has no visible glyph and is general category
> > > Cf), this is exactly what CGJ was designed for: treat the two base
> > > characters on either side of the CGJ as a single grapheme for the
> > > purpose of placing combining characters. This approach uses zero width
> > > no-break space to represent the "missing letter" interpretation of the
> > > two vowels pointed out by Jony Rosenne. Normalization wouldn't destroy
> > > the ordering of the vowels, and Hebrew-aware software could be written
> > > to do all this more-or-less transparently and automatically.
> >
> > Hmm. Some further clarifications are in order, since the
> > documentation for both of these characters has not quite
> > caught up to the UTC decisions regarding them. A lot of work
> > went into the Unicode 4.0 documentation on these, and the
> > Unicode 4.0 chapters will be posted online very soon -- at
> > which point it would be helpful if everyone concerned about
> > this issue takes the time to read the latest on these
> > characters in particular.
> >
> > First, about ZWNBS (U+FEFF). Because of the confusing overlap
> > of functionality of U+FEFF as the BOM (byte order mark) in
> > the Unicode encoding schemes and as what its name, ZERO WIDTH
> > NO-BREAK SPACE implies, the UTC (as of Unicode 3.2)
> > standardized a separate character, U+2060 WORD JOINER. That
> > character is described in UAX #14, Line Breaking Properties:
> > http://www.unicode.org/reports/tr14/
> > U+2060 is "the preferred choice for an invisible character to keep
> > other characters together that would otherwise be split
> > across the line at a direct break." U+FEFF retains that
> > semantic, for backwards compatibility, but its preferred use
> > is as the byte order mark only.
> >
> > So whether or not a line break format control character is
> > relevant to the Biblical Hebrew vowel problem (and I don't
> > think it is, actually), one should be talking about use of
> > U+2060 WORD JOINER (WJ), rather than U+FEFF ZWNBS in any such
> > new context.
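A minimal sketch of the BOM/ZWNBS overlap described above, using Python's standard `utf-8-sig` codec, which drops a leading U+FEFF when decoding. It assumes a text that deliberately begins with U+FEFF meant as a zero width no-break space; the helper name `round_trip` is hypothetical and used only for illustration.

```python
# Sketch: a decoder that strips a byte order mark cannot tell a BOM from a
# ZWNBS that was meant as text, so a leading U+FEFF is silently lost.
ZWNBS = "\uFEFF"  # ZERO WIDTH NO-BREAK SPACE, also the BYTE ORDER MARK
WJ = "\u2060"     # WORD JOINER, added in Unicode 3.2


def round_trip(text: str) -> str:
    """Encode as UTF-8, then decode with BOM stripping ("utf-8-sig")."""
    return text.encode("utf-8").decode("utf-8-sig")


with_zwnbs = ZWNBS + "abc"
with_wj = WJ + "abc"

# U+FEFF is taken as a BOM and removed; U+2060 survives unchanged.
print(round_trip(with_zwnbs) == with_zwnbs)
print(round_trip(with_wj) == with_wj)
```

The first comparison is False and the second True, which is the practical reason to prefer U+2060 for the "keep together" semantic in new text.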
> >
> > Second, there is U+034F COMBINING GRAPHEME JOINER (CGJ)
> > itself. The impetus for encoding the CGJ at all was to have a
> > plain text means of distinguishing, for example, an "ie"
> > sequence that weights as two units for collation and an "ie"
> > sequence that weights as a single unit for collation.
> >
> > During the debate about such an addition, the entity was
> > called various things, but the moniker "GRAPHEME JOINER"
> > caught on in the committee and stuck. There was also debate
> > about an equal and opposite "GRAPHEME NON-JOINER", on the
> > principle that inserting a GNJ between, e.g., a "ch" weighted
> > as a unit, so as to force it to be treated as two units, would
> > be the more normal requirement in collation. However, the
> > committee did not develop consensus that that was a required
> > *character*, in part because insertion of *any* delimiting
> > character in that context could be taken as having that
> > effect or be tailored in collation to weight as desired to
> > distinguish it from the digraphic unit, for example.
> >
> > The "COMBINING" became part of the CGJ's name when it
> > became clear that the character should be given the
> > General Category Mn, making it a combining mark, rather
> > than General Category Cf to make it a format control.
> >
> > During this debate, high hopes were also placed on the
> > COMBINING GRAPHEME JOINER as being the magic bullet for all
> > kinds of things: it could "glue together" a pair of accents
> > so that they would render side-by-side instead of using the
> > default accent placement rules. It could also "glue together"
> > sequences of characters into a "grapheme cluster", so that
> > the grapheme cluster would become the target of an enclosing
> > combining mark -- that would resolve the problem of how to
> > get an enclosing circle to circle an arbitrary number, rather
> > than just a single digit, for example.
> >
> > In the end, however, the inconsistent and troubling
> > implications of this attempt at getting the Unicode
> > Standard further involved in the monkey business of trying
> > to be a glyph description language, rather than a character
> > encoding, caused many second thoughts. And the UTC formally
> > backed away from all those silver bullet aspects of CGJ. In
> > Unicode 4.0, CGJ has been stripped of all interpretation
> > except as an invisible mark which can be used to tailor
> > collation (and searching), so as to distinguish digraphic
> > units from sequences of the same characters.
> >
> > If you look at UAX #29, Text Boundaries, now, and in
> > particular, Section 3, Grapheme Cluster Boundaries, you will
> > see that CGJ has nothing to do with the definition of such
> > boundaries. While it has the Grapheme_Link property (as do
> > all the Indic viramas), Grapheme_Link is no longer even
> > mentioned in UAX #29, and Grapheme_Link is nowhere else used,
> > not even in a derived property.
> >
> > So the shorthand interpretation of CGJ currently is
> > "invisible target for collation tailoring of neighboring
> > characters into a digraphic unit." Even calling it by its
> > formal name, COMBINING GRAPHEME JOINER, immediately conjures
> > up the wrong connotations, so it is better to just use the
> > CGJ acronym and not spell it out. Or think of CGJ as standing
> > for "Collation kluGJe", if you wish. ;-)
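To make the "collation tailoring" role concrete, here is a toy sort key under a hypothetical tailoring in which <i, CGJ, e> is weighted as one digraphic unit sorting after "i", while a plain "ie" is weighted as two letters. The weights, the function name `sort_key`, and the restriction to lowercase ASCII are all assumptions for illustration; this is not the Unicode Collation Algorithm.

```python
# Toy sketch of a HYPOTHETICAL tailoring: <i, CGJ, e> sorts as a single
# unit placed between "i" and "j"; a plain "ie" sorts as "i" then "e".
# Invented weights, lowercase ASCII only; not the real UCA.
CGJ = "\u034F"


def sort_key(word: str) -> list[int]:
    """Return primary weights; letters get even weights, the digraph an odd one."""
    weights = []
    i = 0
    while i < len(word):
        if word[i:i + 3] == "i" + CGJ + "e":
            # Tailored digraphic unit: one weight between "i" (16) and "j" (18).
            weights.append((ord("i") - ord("a")) * 2 + 1)
            i += 3
        elif word[i] == CGJ:
            # Anywhere else, this toy simply ignores CGJ.
            i += 1
        else:
            weights.append((ord(word[i]) - ord("a")) * 2)
            i += 1
    return weights


words = ["j", "i" + CGJ + "ea", "if", "iea", "ia"]
print([w.replace(CGJ, "<CGJ>") for w in sorted(words, key=sort_key)])
```

Plain "iea" sorts among the other "i" words, while "i<CGJ>ea" sorts after all of them, because the invisible CGJ is the only thing that triggers the tailored unit.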
> >
> > Now when you say:
> >
> > > If I recall correctly, the suggestion for using CGJ for yerushala(y)im
> > > was to encode it as: <...lamed, patah, cgj, hiriq, final mem>. Also, I
> > > seem to recall that this gave some people heartburn because CGJ was
> > > not intended to join two combining characters.
> >
> > If people are getting "heartburn" because CGJ is not intended
> > to join two combining characters, the problem they are having
> > is the result of a misunderstanding of the intent here.
> >
> > It is *true* that the CGJ is no longer intended to "join two
> > combining characters", although people tried for a while to
> > see if it would work to "glue together two combining
> > characters" for different rendering.
> >
> > But the point of the CGJ proposal with respect to Biblical
> > Hebrew is *not* to somehow sneak back around to interpreting
> > the CGJ as gluing two combining characters together. Instead,
> > it turns out that the CGJ, whose interpretation has been
> > whittled down to being almost nothing, has the appropriate
> > set of character *properties* to serve to block canonical
> > reordering of a combining character sequence. The important
> > things are that it is a) invisible, b) a combining mark, and
> > c) has combining class zero. To serve the purpose of blocking
> > the canonical ordering, it doesn't have to *do* anything but
> > just sit there with its properties as defined. It doesn't
> > "join" anything, and it doesn't have anything to do with the
> > "grapheme" status of the resulting sequence.
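The blocking effect can be checked with Python's standard `unicodedata` module. This sketch assumes the relevant tail of yerushala(y)im is lamed, patah, hiriq, final mem. HEBREW POINT HIRIQ has canonical combining class 14 and HEBREW POINT PATAH has class 17, so canonical reordering moves the hiriq ahead of the patah unless a class-zero character such as CGJ sits between them.

```python
import unicodedata

# Letters and points at the end of yerushala(y)im (assumed sequence).
LAMED, PATAH, HIRIQ, FINAL_MEM = "\u05DC", "\u05B7", "\u05B4", "\u05DD"
CGJ = "\u034F"  # COMBINING GRAPHEME JOINER: Mn, combining class 0

# Canonical combining classes drive canonical reordering.
print(unicodedata.combining(HIRIQ), unicodedata.combining(PATAH))  # 14 17

plain = LAMED + PATAH + HIRIQ + FINAL_MEM          # patah written first
blocked = LAMED + PATAH + CGJ + HIRIQ + FINAL_MEM  # CGJ between the vowels

for form in ("NFC", "NFD"):
    # Without CGJ, 17 > 14, so normalization swaps patah and hiriq.
    reordered = unicodedata.normalize(form, plain) == LAMED + HIRIQ + PATAH + FINAL_MEM
    # CGJ has class 0, so reordering cannot move a vowel across it.
    preserved = unicodedata.normalize(form, blocked) == blocked
    print(form, "plain reordered:", reordered, "| CGJ order preserved:", preserved)
```

For both forms the plain sequence comes back reordered and the CGJ sequence comes back exactly as written; CGJ "does" nothing beyond having class 0.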
> >
> > The only other Unicode characters with those properties are
> > the variation selectors, but those characters *do* have
> > cooccurrence constraints that prevent them from following a
> > combining mark (at least in a legally interpretable way).
> > That leaves the CGJ as the *only* Unicode character which has
> > the desired properties and which has no constraints against
> > occurrence in the middle of a combining character sequence.
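A quick property check on the characters mentioned in this thread, again with Python's `unicodedata`. It covers only two of the three properties listed earlier (General Category Mn and combining class zero); invisibility and the co-occurrence constraints on variation selectors are not exposed by this module, so they are not tested. The helper name `is_class_zero_mark` is hypothetical.

```python
import unicodedata

# Characters discussed in this thread (U+FE00 stands in for the variation selectors).
CANDIDATES = {
    "U+034F COMBINING GRAPHEME JOINER": "\u034F",
    "U+FE00 VARIATION SELECTOR-1": "\uFE00",
    "U+2060 WORD JOINER": "\u2060",
    "U+FEFF ZERO WIDTH NO-BREAK SPACE": "\uFEFF",
}


def is_class_zero_mark(ch: str) -> bool:
    """True if ch has General Category Mn and canonical combining class 0."""
    return unicodedata.category(ch) == "Mn" and unicodedata.combining(ch) == 0


for name, ch in CANDIDATES.items():
    print(f"{name}: gc={unicodedata.category(ch)}, "
          f"ccc={unicodedata.combining(ch)}, class-zero mark={is_class_zero_mark(ch)}")
```

Only CGJ and the variation selector pass; WJ and ZWNBS are format controls (Cf), which is also why they are not combining marks inside a combining character sequence.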
> >
> > Another way of thinking of this is that in addition to CGJ
> > being the "Collation kluGJe", it can be interpreted as the
> > "Canonical Gradient Jigger", if we simply acknowledge the
> > fact that, given its current properties, if it occurs in the
> > relevant sequences of combining marks, it already has the
> > effect of jiggering the canonical gradients to produce just
> > the distinctions desired. ;-)
> >
> > > Of course, zwnbs is not a base character. If using zwnbs is a problem
> > > (because it has no visible glyph and/or because it has category Cf),
> > > then perhaps what is needed is another character (perhaps a new one)
> > > that has no width or visible glyph but can be treated as a base
> > > character (category Lo). That may be needed anyway, since some of the
> > > boundary definitions have special rules for zwnbs.
> >
> > There is no need for an invisible base character here. That
> > *would* be going further than is necessary to solve the
> > problem, and would create arguments about the actual content
> > of the text -- are we encoding an inherent consonant here or
> > not? Why go there, when the problem is simply to represent
> > the text as shown and then let commentators and phonologists
> > argue about whether the yod is "really" there or not.
> >
> > > Ted
> > >
> > > P.S. It's two p's but only one d. :)
> >
> > Sorry. Anticipatory doubling, I guess...
> >
> > --Ken
This archive was generated by hypermail 2.1.5 : Sat Jul 26 2003 - 10:15:08 EDT