From: Peter Kirk (peterkirk@qaya.org)
Date: Mon Jan 17 2005 - 05:08:09 CST
On 16/01/2005 18:58, Doug Ewell wrote:
>Philippe Verdy <verdy underscore p at wanadoo dot fr> wrote:
>
>>>Elaine, the good news for you is that if you order your Unicode
>>>Hebrew text according to these 'alternative combining classes' you
>>>will not be deviating at all from the Unicode standard. Your text
>>>will not be normalised in any of the standard normalisation forms,
>>>but the standard nowhere specifies that texts must be normalised. Of
>>>course you need to ensure that your text is not normalised by other
>>>processes, or that if it is you then restore it to the order of the
>>>'alternative combining classes' - a process which should be
>>>reversible.
>>>
>>Note that you can't define "alternative combining classes" the way you
>>want, if you need to preserve canonical equivalence.
>
Philippe, I am well aware of all that you wrote in your posting. I am
well aware that I cannot renormalise according to any arbitrary
collection of alternative combining classes and preserve canonical
equivalence. But I think that the only requirements for preserving
canonical equivalence (which is of course defined relative to the
standard combining classes) are to avoid splitting combining classes and
to avoid reassigning characters from class zero to a non-zero class.
Merging classes and reassigning characters to class zero are not a
problem, because they only inhibit reordering.
However, the particular collection of alternative combining classes
which Elaine and I were referring to (which is defined in Appendix B of
the SBL Hebrew manual, available as part of the free download package)
was carefully designed, by others who know these issues well, to avoid
splitting of combining classes, and thereby to allow texts to be
normalised according to them without destroying canonical equivalence.
(I am not sure that the set as it stands in fact meets this goal 100%,
because there is some splitting of class 230, but that was the
intention.)
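
The two conditions above (no splitting of a standard class, no move from class zero to non-zero) can be checked mechanically. The following is only a minimal sketch in Python, under stated assumptions: the function name `find_unsafe_assignments` and the sample class numbers are hypothetical and are NOT the values from Appendix B of the SBL Hebrew manual. Only the characters listed in the map are examined, and the standard classes come from Python's `unicodedata.combining`.

```python
import unicodedata
from itertools import combinations

# A few Hebrew characters, used purely for illustration.
ALEF, PATAH, QAMATS = "\u05D0", "\u05B7", "\u05B8"
DAGESH, SHIN_DOT = "\u05BC", "\u05C1"
ACCENT_SEGOL, ACCENT_REVIA = "\u0592", "\u0597"  # both standard class 230


def find_unsafe_assignments(alt_classes):
    """List the assignments that could break canonical equivalence.

    alt_classes maps a character to a hypothetical alternative class.
    An empty result means none of the listed characters violates the
    two conditions; it proves nothing about unlisted characters.
    """
    problems = []
    chars = sorted(alt_classes)
    # Condition 1: a standard class-zero character must stay at zero.
    for ch in chars:
        if alt_classes[ch] != 0 and unicodedata.combining(ch) == 0:
            problems.append(("zero-to-nonzero", ch))
    # Condition 2: two marks sharing a standard non-zero class must not
    # land in different reorderable (non-zero) alternative classes.
    for a, b in combinations(chars, 2):
        std_a, std_b = unicodedata.combining(a), unicodedata.combining(b)
        same_standard = std_a == std_b and std_a != 0
        both_reorderable = alt_classes[a] != 0 and alt_classes[b] != 0
        if same_standard and both_reorderable and alt_classes[a] != alt_classes[b]:
            problems.append(("split", a, b))
    return problems
```

Merging two standard classes into one alternative class, or moving a mark to class zero, produces no entry in the result, which matches the argument that those changes only inhibit reordering.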
By the way, why has this discussion moved back from the Hebrew list to
the main Unicode list?
>Isn't that what Peter said? If you don't care about standard
>normalization forms, you don't care about canonical equivalence.
>
No, Doug, this isn't what I said. I do care about canonical equivalence.
Your last sentence is a total non sequitur, which is in fact
contradicted by the Unicode conformance clauses which specify
preservation of canonical equivalence but do not mandate standard
normalisation forms.
My intention is that the processes in use for linguistic analysis of
Hebrew should be fully Unicode conformant, and my point as quoted above
was that this is possible. That implies that the processes, although
they may ignore standard normalisation forms, must preserve canonical
equivalence. On the other hand, because the choice of combining classes
for certain Hebrew combining marks is highly unfortunate, both in theory
and in practice, it is necessary to reorder the text (without
destroying canonical equivalence) according to a carefully chosen set of
"alternative combining classes" - not only for linguistic analysis but
also for efficient rendering, which was the original motivation for the
already defined "alternative combining classes".
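
As an illustration of reordering without destroying canonical equivalence, here is a minimal Python sketch. The names `HYPOTHETICAL_ALT`, `reorder` and `canonically_equivalent` are assumptions made for this example, and the class numbers are invented rather than taken from the SBL Hebrew manual. Characters not listed in the map are treated as class zero, so they block reordering. Equivalence is tested by comparing the standard NFD forms of the two strings.

```python
import unicodedata

# Hypothetical alternative classes: shin dot first, then dagesh, then vowel.
HYPOTHETICAL_ALT = {
    "\u05C1": 10,  # HEBREW POINT SHIN DOT (standard class 24)
    "\u05BC": 21,  # HEBREW POINT DAGESH (standard class 21)
    "\u05B7": 31,  # HEBREW POINT PATAH (standard class 17)
}


def reorder(text, alt_classes):
    """Stable-sort each run of listed marks by alternative class."""
    out, run = [], []
    for ch in text:
        if alt_classes.get(ch, 0) == 0:
            # Class zero (or unlisted): flush the pending run, then block.
            out.extend(sorted(run, key=alt_classes.get))
            run = []
            out.append(ch)
        else:
            run.append(ch)
    out.extend(sorted(run, key=alt_classes.get))
    return "".join(out)


def canonically_equivalent(a, b):
    """Two strings are canonically equivalent if their NFD forms match."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# SHIN + patah + dagesh + shin dot, i.e. the marks in standard NFD order.
word = "\u05E9\u05B7\u05BC\u05C1"
reordered = reorder(word, HYPOTHETICAL_ALT)
```

Here `reordered` is no longer in any standard normalisation form, yet `canonically_equivalent(word, reordered)` holds, because each of the three marks has a distinct non-zero standard class and none of those classes is split by the map.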
--
Peter Kirk
peter@qaya.org (personal)
peterkirk@qaya.org (work)
http://www.qaya.org/