Re: Yerushala(y)im - or Biblical Hebrew

From: Kenneth Whistler (
Date: Mon Jul 28 2003 - 19:49:52 EDT



    > The goal of the Maginot Line was long-term stability.

    I'll resist the temptation to assault that metaphorical
    defensive line directly, and instead just sweep right by it...

    > Do I understand you correctly, Ken, that Sybase would rather have code
    > versions that behave consistently but incorrectly (from a user's point of
    > view) rather than inconsistent versions, the newer ones of which behave
    > correctly? I can accept that such could be a company's business priority,
    > but I just want to know if that's what you are saying.

    Yes, I am saying that. However, I disagree with the presupposition
    embedded in:

    > but incorrectly (from a user's point of view)

    which I think is as faulty as that of people who might claim that,
    for example, storing ä for Swedish as <a, combining diaeresis>
    would be incorrect from a user's point of view.
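    The Swedish point can be made concrete: precomposed ä (U+00E4) and
    <a, U+0308 COMBINING DIAERESIS> are canonically equivalent, so either
    spelling is the same text as far as normalization is concerned. A
    minimal illustrative sketch using Python's standard `unicodedata`
    module (the variable names are illustrative only):

```python
import unicodedata

# Precomposed U+00E4 LATIN SMALL LETTER A WITH DIAERESIS
precomposed = "\u00e4"
# The same letter spelled <a, U+0308 COMBINING DIAERESIS>
decomposed = "a\u0308"

# Canonical equivalence: NFD splits the precomposed form,
# NFC recombines the decomposed one.
nfd = unicodedata.normalize("NFD", precomposed)
nfc = unicodedata.normalize("NFC", decomposed)

print(nfd == decomposed)   # True
print(nfc == precomposed)  # True
```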

    It is very important to my company (and to many others implementing
    Unicode) that normalization not be changed in ways such that
    data normalized as specified in Unicode 4.0, for example, become
    *un*normalized by the specification of Unicode 5.0, for example,
    so that reapplication of a newer version of the algorithm would
    potentially change normalized data. That is the issue.

    Is it a priority for my company that Biblical Hebrew "behave
    incorrectly from a user's point of view"? Of course not.

    But if yerushala(y)im is "spelled correctly", in this case,
    with a CGJ, then implementation of correct behavior from
    a user's point of view -- even taking into account that
    data may be subject to normalization beyond the user's
    control (as for web publication) -- is possible without
    destabilizing normalization in any way.
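    Why CGJ works: canonical ordering only sorts runs of characters whose
    combining class is nonzero, and COMBINING GRAPHEME JOINER (U+034F) has
    class 0, so it splits the run. The sketch below assumes, for
    illustration, a lamed carrying patah (class 17) followed by hiriq
    (class 14), the vowel order at issue in the Biblical spelling; without
    CGJ, normalization moves hiriq first.

```python
import unicodedata

LAMED, PATAH, HIRIQ = "\u05dc", "\u05b7", "\u05b4"
CGJ = "\u034f"  # COMBINING GRAPHEME JOINER, combining class 0

# Written order: patah, then hiriq, under the same lamed.
plain = LAMED + PATAH + HIRIQ
with_cgj = LAMED + PATAH + CGJ + HIRIQ

print(unicodedata.combining(PATAH), unicodedata.combining(HIRIQ),
      unicodedata.combining(CGJ))  # 17 14 0

# Without CGJ, canonical ordering sorts by class: hiriq (14) moves first.
print(unicodedata.normalize("NFC", plain) == LAMED + HIRIQ + PATAH)  # True

# With CGJ (class 0) between them, each mark is its own run: no reordering.
print(unicodedata.normalize("NFC", with_cgj) == with_cgj)  # True
```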

    Making it possible for potential customers to be satisfied
    and happy with their software's behavior, while simultaneously
    preserving the stability of infrastructure algorithms
    important to our products, *is* a priority for my company.

    > Also, I don't understand in what sense the normalization *algorithm* gets
    > broken by changing combining classes. Could someone elaborate?

    From the Unicode Standard, Version 4.0:

    "D8a The logical description of a process used to achieve a
         specified result involving Unicode characters."

    Part of the specification of the Unicode normalization algorithm
    is idempotency *across* versions, so that addition of new
    characters to the standard, which require extensions of the
    tables for decomposition, recomposition, and composition
    exclusion in the algorithm, does *not* result in a situation
    where application of a later version of the normalization algorithm
    results in a change to *any* string normalized by an earlier version
    of the algorithm.
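    Idempotency across versions can be spot-checked in CPython, which ships
    the Unicode 3.2 database as `unicodedata.ucd_3_2_0` alongside its
    current one. The helper `stays_normalized` and the sample strings are
    hypothetical, chosen only for illustration; passing on a few samples
    demonstrates the property, it does not prove it.

```python
import unicodedata

OLD = unicodedata.ucd_3_2_0  # Unicode 3.2 data bundled with CPython

def stays_normalized(s: str, form: str = "NFC") -> bool:
    """Hypothetical helper: normalize s with the Unicode 3.2 tables, then
    check that the current tables (unicodedata.unidata_version) leave the
    result unchanged."""
    old = OLD.normalize(form, s)
    return unicodedata.normalize(form, old) == old

# e + acute, lamed + patah + hiriq, A + ring, d-dot-above + dot below
samples = ["e\u0301", "\u05dc\u05b7\u05b4", "A\u030a", "\u1e0b\u0323"]

print(unicodedata.unidata_version)
print(all(stays_normalized(s) for s in samples))  # True
```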

    The suggested changes in combining class values would break *that*.

    I'm not suggesting that the code in anyone's particular
    implementation would suddenly go haywire and start producing
    segmentation faults if we swapped two numbers in the table
    of combining class values that it uses.
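    A toy model shows how the breakage happens without any code crashing.
    `canonical_order` below is a simplified, hypothetical sketch of the
    canonical ordering step alone (no decomposition or composition), and
    the `swapped` table is an invented stand-in for the kind of change
    under discussion, not any published Unicode data. Text normalized
    under the current values stops being normalized under the swapped ones.

```python
from typing import Dict

def canonical_order(s: str, ccc: Dict[str, int]) -> str:
    """Toy canonical ordering: stable-sort each maximal run of characters
    with nonzero combining class; class-0 characters stay in place.
    Characters absent from the table are treated as class 0."""
    out, run = [], []
    for ch in s:
        if ccc.get(ch, 0) == 0:
            out.extend(sorted(run, key=lambda c: ccc[c]))
            run = []
            out.append(ch)
        else:
            run.append(ch)
    out.extend(sorted(run, key=lambda c: ccc[c]))
    return "".join(out)

LAMED, PATAH, HIRIQ = "\u05dc", "\u05b7", "\u05b4"
current = {PATAH: 17, HIRIQ: 14}  # actual Unicode combining classes
swapped = {PATAH: 14, HIRIQ: 17}  # hypothetical "new version" table

stored = canonical_order(LAMED + PATAH + HIRIQ, current)  # lamed, hiriq, patah
print(canonical_order(stored, current) == stored)  # True: idempotent
print(canonical_order(stored, swapped) == stored)  # False: now un-normalized
```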

