Re: Yerushala(y)im - or Biblical Hebrew

From: Ted Hopp (ted@newslate.com)
Date: Fri Jul 25 2003 - 18:10:07 EDT

    Thanks, Ken. That sensitizes me to some of the problems involved with
    breaking stability.

    If I recall correctly, the suggestion for using CGJ for yerushala(y)im was
    to encode it as: <...lamed, patah, cgj, hiriq, final mem>. Also, I seem to
    recall that this gave some people heartburn because CGJ was not intended to
    join two combining characters. What if this case were encoded as: <...lamed,
    patah, cgj, zwnbs, hiriq, final mem>? (Please forgive me if this is what had
    been proposed all along.)

    As I understand it from reading the description of CGJ (and ignoring for the
    moment that zwnbs has no visible glyph and is general category Cf), this is
    exactly what CGJ was designed for: treat the two base characters on either
    side of the CGJ as a single grapheme for the purpose of placing combining
    characters. This approach uses zero width no-break space to represent the
    "missing letter" interpretation of the two vowels pointed out by Jony
    Rosenne. Normalization wouldn't destroy the ordering of the vowels, and
    Hebrew-aware software could be written to do all this more-or-less
    transparently and automatically.
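
    To make the normalization point concrete, here is a minimal sketch using
    Python's standard unicodedata module. The variable names are illustrative
    only, and the sequences cover just the tail of the word (lamed onward). It
    checks that a bare <lamed, patah, hiriq, final mem> is reordered under NFC
    (hiriq has canonical combining class 14, patah 17), while a CGJ (class 0)
    between the two vowels, with or without zwnbs after it, blocks the
    reordering.

```python
import unicodedata

# Code points for the tail of the word; names are illustrative.
LAMED = "\u05DC"      # HEBREW LETTER LAMED
PATAH = "\u05B7"      # HEBREW POINT PATAH, combining class 17
HIRIQ = "\u05B4"      # HEBREW POINT HIRIQ, combining class 14
FINAL_MEM = "\u05DD"  # HEBREW LETTER FINAL MEM
CGJ = "\u034F"        # COMBINING GRAPHEME JOINER, combining class 0
ZWNBSP = "\uFEFF"     # ZERO WIDTH NO-BREAK SPACE ("zwnbs" above)

plain = LAMED + PATAH + HIRIQ + FINAL_MEM
with_cgj = LAMED + PATAH + CGJ + HIRIQ + FINAL_MEM
with_cgj_zwnbsp = LAMED + PATAH + CGJ + ZWNBSP + HIRIQ + FINAL_MEM


def code_points(text):
    """Render a string as a list of U+XXXX labels for easy comparison."""
    return ["U+%04X" % ord(ch) for ch in text]


for label, text in [
    ("plain", plain),
    ("with CGJ", with_cgj),
    ("with CGJ + zwnbs", with_cgj_zwnbsp),
]:
    nfc = unicodedata.normalize("NFC", text)
    status = "unchanged" if nfc == text else "reordered"
    print(label, status, code_points(nfc))
```

    The sketch says nothing about rendering or about whether zwnbs is an
    acceptable base; it only shows that the vowel order survives
    normalization once a class-0 character separates the two points.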

    Of course, zwnbs is not a base character. If using zwnbs is a problem
    (because it has no visible glyph and/or because it has category Cf), then
    perhaps what is needed is another character (perhaps a new one) that has no
    width or visible glyph but can be treated as a base character (category Lo).
    That may be needed anyway, since some of the boundary definitions have
    special rules for zwnbs.
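
    The category concern can be checked the same way. A minimal sketch,
    assuming Python's unicodedata module; describe() is a hypothetical helper
    introduced only for this illustration. It reports the general category
    and canonical combining class of zwnbs, CGJ, and an ordinary Hebrew
    letter for comparison.

```python
import unicodedata


def describe(ch):
    """Return (general category, canonical combining class) for one character."""
    return unicodedata.category(ch), unicodedata.combining(ch)


# zwnbs is a format character (Cf), CGJ is a nonspacing mark (Mn),
# and lamed is an ordinary letter (Lo). All three have combining class 0.
for label, ch in [
    ("zwnbs U+FEFF", "\uFEFF"),
    ("CGJ   U+034F", "\u034F"),
    ("lamed U+05DC", "\u05DC"),
]:
    category, ccc = describe(ch)
    print(label, "category:", category, "combining class:", ccc)
```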

    Ted

    P.S. It's two p's but only one d. :)

    ----- Original Message -----
    From: "Kenneth Whistler" <kenw@sybase.com>
    > Tedd Hopp asked:
    >
    > > Tell me if I'm wrong please, but isn't moving characters (however
    > > it's disguised) as much of a violation of the stability policy as is
    > > changing combining classes of the existing vowels?
    >
    > You're not wrong. It is a violation of the stability policy.
    >
    > > The Hebrew vowels interact typographically and the combining classes
    > > should have been assigned accordingly originally. That's what should
    > > be fixed now.
    >
    > But that is water under the bridge at this point. I am looking for
    > a *feasible* technical solution to the current *technical* problem.
    >
    > > I recognize the powerful political issues involved, and that these are
    > > barriers to this happening. But trying to find technical solutions to
    > > political problems is extremely short-sighted. I would urge that all
    > > technical efforts be directed away from solving the politicians'
    > > problems and focus on how to minimize whatever damage may be caused by
    > > changing the combining classes.
    >
    > What we have currently are:
    >
    > a. a minor technical problem (that certain sequences of vowel
    > points in Biblical Hebrew cannot be reliably distinguished
    > in normalized Unicode plain text)
    >
    > and
    >
    > b. a minor political problem (that certain communities of Biblical
    > scholars are badmouthing Unicode because it "can't fix its
    > obvious mistakes")
    >
    > Changing the combining classes of Hebrew points will create:
    >
    > a. a major technical problem (destabilization of normalization)
    >
    > and
    >
    > b. a major political problem (in both IETF and W3C, at least,
    > as well as between members in the UTC, with the non-zero
    > risk that the rift will result in the definition of competing
    > specifications of normalization, which will compound the
    > technical problem)
    >
    > >
    > > From my company's perspective, all other proposals I've seen would be
    > > more damaging to us than doing the right thing. It would be beneficial
    > > to hear from others on this list about what the specific technical (not
    > > political) impacts would be (both positive and negative) on their work
    > > and their products that would come from fixing the combining classes of
    > > the existing vowels.
    >
    > Speaking for Sybase products, "fixing" the combining classes of the
    > existing vowels would have *no* positive impacts. It would have
    > a large number of negative impacts, the ultimate ramifications
    > of which I cannot even follow to their eventual conclusions. It
    > would impact the implementation of normalization code, as well
    > as its testing. It would lead to nasty meetings where I would
    > try to explain to server developers why normalization on the
    > servers wasn't quite reliable, since it has this little Hebrew
    > "hole" over here. It might require figuring out how to specify
    > versions of normalization -- and I have no idea how the labels
    > for that would be reliably attached to data. It puts normalized
    > data into a kind of a Catch-22 situation where you could never
    > rely on its stability, since if it contained any of the offending
    > characters, the determination of whether it would change or not
    > if normalized again would depend on which *version* of the
    > normalization code it hit. The reaction, in the context of
    > some protocols or even database implementations, might be to
    > deny access to the offending characters -- essentially to rule
    > them off-limits because of their impact on normalization. And
    > how would *that* benefit Biblical Hebrew scholars?
    >
    > The whole situation just stinks of the never-ending cascade of
    > problems that result, for example, from the small set of
    > persistent interoperability mismatches between various
    > interpretations of Shift-JIS encoding. That's the kind of
    > problem that can persist for a decade or more and which just
    > gets passed as the hot potato from one generation of
    > developers to the next.
    >
    > I expect you could hear testimonials from other database
    > developers on the list about the evils of destabilizing the
    > definition of Unicode normalization.
    >
    > --Ken


