Re: [hebrew] Re: Hebrew combining classes (was ISO 10646 compliance and EU law)

From: Peter Kirk (
Date: Mon Jan 17 2005 - 05:08:09 CST

    On 16/01/2005 18:58, Doug Ewell wrote:

    >Philippe Verdy <verdy underscore p at wanadoo dot fr> wrote:
    >>>Elaine, the good news for you is that if you order your Unicode
    >>>Hebrew text according to these 'alternative combining classes' you
    >>>will not be deviating at all from the Unicode standard. Your text
    >>>will not be normalised in any of the standard normalisation forms,
    >>>but the standard nowhere specifies that texts must be normalised. Of
    >>>course you need to ensure that your text is not normalised by other
    >>>processes, or that if it is you then restore it to the order of the
    >>>'alternative combining classes' - a process which should be
    >>Note that you can't define "alternative combining classes" the way you
    >>want, if you need to preserve canonical equivalence.
    Philippe, I am well aware of all that you wrote in your posting. I am
    well aware that I cannot renormalise according to any arbitrary
    collection of alternative combining classes and preserve canonical
    equivalence. But I think that preserving canonical equivalence (which
    is of course defined relative to the standard combining classes)
    requires only two things: no combining class may be split, and no
    character may be reassigned from class zero to a non-zero class.
    Merging classes and reassigning characters to class zero are not a
    problem, because they only inhibit reordering.
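
    A minimal Python sketch of this rule, for illustration only. The
    ALT_CLASS table below is hypothetical (it is not the SBL Hebrew table),
    and the helper names are assumptions. It reorders each run of combining
    marks by the alternative classes, then checks that the result is still
    canonically equivalent to the input by comparing NFD forms.

```python
import unicodedata

# Hypothetical alternative classes, invented for this example.
# Standard classes: HIRIQ U+05B4 = 14, QAMATS U+05B8 = 18, DAGESH U+05BC = 21.
ALT_CLASS = {
    "\u05BC": 10,  # dagesh sorts first
    "\u05B4": 20,  # hiriq and qamats are merged into one class
    "\u05B8": 20,
}

def alt_class(ch):
    """Alternative class of ch; standard class-zero characters stay at zero."""
    std = unicodedata.combining(ch)
    if std == 0:
        return 0
    return ALT_CLASS.get(ch, std)  # unlisted marks keep their standard class

def reorder(text):
    """Stable-sort each run of non-zero-class marks by alternative class."""
    out, run = [], []
    for ch in text:
        if alt_class(ch) == 0:
            out.extend(sorted(run, key=alt_class))  # sorted() is stable
            run = []
            out.append(ch)
        else:
            run.append(ch)
    out.extend(sorted(run, key=alt_class))
    return "".join(out)

def canonically_equivalent(a, b):
    """Two strings are canonically equivalent if their NFD forms match."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

text = "\u05D1\u05B4\u05BC"   # bet, hiriq, dagesh (already in NFD order)
moved = reorder(text)         # bet, dagesh, hiriq
print(canonically_equivalent(text, moved))
```

    The reordered string is not in any standard normalisation form, yet
    normalising it gives back the same NFD text as the original. Because
    hiriq and qamats share one class in this table, their relative order is
    never changed by the stable sort.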

    However, the particular collection of alternative combining classes
    which Elaine and I were referring to (which is defined in Appendix B of
    the SBL Hebrew manual, available as part of the free download package)
    was carefully designed, by others who know these issues well, to avoid
    splitting of combining classes, and thereby to allow texts to be
    normalised according to them without destroying canonical equivalence.
    (I am not sure that the set as it stands in fact meets this goal 100%
    because there is some splitting of class 230, but the intention of it
    was to meet this goal.)
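
    Whether a given table meets this goal can be checked mechanically. The
    following Python sketch is one possible check under stated assumptions:
    the function name and the sample table are hypothetical, and only
    characters listed in the table are examined. It reports standard classes
    whose members received different non-zero alternative classes, and
    class-zero characters that were given a non-zero class.

```python
import unicodedata
from collections import defaultdict

def find_violations(alt_classes):
    """Return (split, promoted) for a mapping of character -> alternative class.

    split:    {standard class: sorted alternative classes} for each standard
              class whose listed members got more than one non-zero class
    promoted: characters of standard class zero given a non-zero class
    """
    by_standard = defaultdict(set)
    promoted = []
    for ch, alt in alt_classes.items():
        std = unicodedata.combining(ch)
        if std == 0:
            if alt != 0:
                promoted.append(ch)
            continue
        # Reassigning a mark to class zero only blocks reordering,
        # so it is not counted as a split.
        if alt != 0:
            by_standard[std].add(alt)
    split = {std: sorted(alts) for std, alts in by_standard.items()
             if len(alts) > 1}
    return split, promoted

# Hypothetical table: two accents of standard class 230 get different classes.
sample = {
    "\u0594": 231,  # HEBREW ACCENT ZAQEF QATAN, standard class 230
    "\u0595": 229,  # HEBREW ACCENT ZAQEF GADOL, standard class 230
    "\u05B4": 20,   # HIRIQ, standard class 14
    "\u05B8": 20,   # QAMATS, standard class 18 (merged with hiriq)
}
split, promoted = find_violations(sample)
print(split, promoted)
```

    In this sample the two zaqef accents share standard class 230, so sorting
    them by different alternative classes could swap them and produce text
    that is no longer canonically equivalent to the original. The merged
    hiriq and qamats entries raise no violation.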

    By the way, why has this discussion moved back from the Hebrew list to
    the main Unicode list?

    >Isn't that what Peter said? If you don't care about standard
    >normalization forms, you don't care about canonical equivalence.
    No, Doug, this isn't what I said. I do care about canonical equivalence.
    Your last sentence is a total non sequitur, which is in fact
    contradicted by the Unicode conformance clauses which specify
    preservation of canonical equivalence but do not mandate standard
    normalisation forms.

    My intention is that the processes in use for linguistic analysis of
    Hebrew should be fully Unicode conformant, and my point as quoted above
    was that this is possible. That implies that the processes, although
    they may ignore standard normalisation forms, must preserve canonical
    equivalence. On the other hand, because of the theoretically and
    practically highly unfortunate choice of combining classes for certain
    Hebrew combining marks, it is necessary to reorder the text (without
    destroying canonical equivalence) according to a carefully chosen set of
    "alternative combining classes" - not only for linguistic analysis but
    also for efficient rendering, which was the original motivation for the
    already defined "alternative combining classes".

    Peter Kirk