Re: [hebrew] Re: Hebrew combining classes (was ISO 10646 compliance and EU law)

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Sun Jan 16 2005 - 19:11:04 CST


    From: "Doug Ewell" <dewell@adelphia.net>
    > Philippe Verdy <verdy underscore p at wanadoo dot fr> wrote:
    >
    >>> Elaine, the good news for you is that if you order your Unicode
    >>> Hebrew text according to these 'alternative combining classes' you
    >>> will not be deviating at all from the Unicode standard. Your text
    >>> will not be normalised in any of the standard normalisation forms,
    >>> but the standard nowhere specifies that texts must be normalised. Of
    >>> course you need to ensure that your text is not normalised by other
    >>> processes, or that if it is you then restore it to the order of the
    >>> 'alternative combining classes' - a process which should be
    >>> reversible.
    >>
    >> Note that you can't define "alternative combining classes" the way you
    >> want, if you need to preserve canonical equivalence.
    >
    > Isn't that what Peter said? If you don't care about standard
    > normalization forms, you don't care about canonical equivalence.

    That was a different point. It spoke only about alternate normalization
    forms; in my opinion, a transformation based on "alternative combining
    classes" should not be called a "normalization" if it does not preserve
    canonical equivalence.
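
    For illustration, a minimal sketch in Python (assuming only the standard
    unicodedata module) of what canonical equivalence does and does not allow:
    marks with different combining classes may be reordered, but marks sharing
    a class may not.

        import unicodedata

        base = "a"
        acute = "\u0301"      # COMBINING ACUTE ACCENT, ccc=230 (above)
        cedilla = "\u0327"    # COMBINING CEDILLA, ccc=202 (attached below)

        # Different classes: both orders are canonically equivalent,
        # so they normalize to the same NFD string.
        s1 = base + acute + cedilla
        s2 = base + cedilla + acute
        assert unicodedata.normalize("NFD", s1) == unicodedata.normalize("NFD", s2)

        # Same class (two marks above, both ccc=230): the orders are NOT
        # canonically equivalent, and normalization preserves their order.
        grave = "\u0300"      # COMBINING GRAVE ACCENT, ccc=230
        t1 = base + acute + grave
        t2 = base + grave + acute
        assert unicodedata.normalize("NFD", t1) != unicodedata.normalize("NFD", t2)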

    My opinion is weakened somewhat by the fact that Unicode also speaks of
    "normalization" when referring to the NFKC and NFKD forms, even though
    they do not preserve canonical equivalence.
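
    A small demonstration of that point (again assuming Python's standard
    unicodedata module): a compatibility decomposition changes the character
    identity, so the NFKD result is not canonically equivalent to the input.

        import unicodedata

        s = "\uFB01"  # LATIN SMALL LIGATURE FI
        # NFD leaves the ligature alone: it has no canonical decomposition.
        assert unicodedata.normalize("NFD", s) == s
        # NFKD replaces it with "fi", a compatibility (not canonical) mapping.
        assert unicodedata.normalize("NFKD", s) == "fi"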

    But a transformation that uses alternate combining class values while
    preserving the partition of characters into their combining classes will
    preserve canonical equivalence, and could be called a "normalization", or
    perhaps more accurately a "denormalization". I am not saying that such a
    process would not be useful. In fact, such transforms already exist as
    parts of other standards.
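
    Here is a sketch of such a "denormalization" (Python; alt_reorder and the
    alternate class function are hypothetical names, not part of any
    standard): stable-sort each run of combining marks by the alternate class
    values. As long as the alternate table assigns equal values to exactly
    those characters whose standard combining classes are equal (the same
    partition), the result stays canonically equivalent to the input.

        import unicodedata

        def alt_reorder(text, alt_ccc):
            out, run = [], []
            def flush():
                # Stable sort: marks with equal alternate classes keep
                # their relative order, as canonical equivalence requires.
                run.sort(key=alt_ccc)
                out.extend(run)
                run.clear()
            for ch in text:
                if unicodedata.combining(ch) == 0:
                    flush()            # starters act as reordering barriers
                    out.append(ch)
                else:
                    run.append(ch)
            flush()
            return "".join(out)

        # Hypothetical alternate classes: negating the standard value keeps
        # the same partition but inverts the standard priority, so marks
        # above now sort before marks below.
        alt = lambda ch: -unicodedata.combining(ch)
        s = "a\u0327\u0301"       # NFD order: cedilla, then acute
        r = alt_reorder(s, alt)   # -> "a\u0301\u0327"
        assert unicodedata.normalize("NFD", r) == unicodedata.normalize("NFD", s)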

    The NFC and NFD forms are not especially useful in themselves, even for
    collation or rendering. They mainly serve compatibility with non-Unicode
    standards that cannot compose or decompose characters themselves. Modern
    text algorithms should be able to process any input text equally in any
    canonically equivalent form, whether it is normalized or not. So the NFC
    and NFD forms, like the existing combining classes, are there only to
    specify which texts are equivalent and should be treated equally. But
    processes will often need to do more to recognize texts (applying an NFC
    or NFD normalization on input before any other denormalization keeps
    those processes conformant to Unicode, and the two operations can be
    combined into one).
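
    A minimal sketch of such a process boundary (Python, standard
    unicodedata module only): normalize on input, then operate only on the
    normalized form, so every canonically equivalent spelling of the same
    text is treated identically.

        import unicodedata

        def canonical_equal(a, b):
            return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

        assert canonical_equal("\u00E9", "e\u0301")  # precomposed vs. decomposed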


