From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Tue Jul 08 2003 - 15:56:11 EDT
On Tuesday, July 08, 2003 8:21 PM, Peter Kirk <peter.r.kirk@ntlworld.com> wrote:
> On 08/07/2003 11:10, Philippe Verdy wrote:
>
> > Admit that your proposal of using a canonical decomposition would
> > still cause problems with all Unicode algorithms, and with XML
> > processing.
> >
> > Only a NFKD decomposition would make your proposed "ligature"
> > character workable for XML processing and Unicode algorithms,
> > including UCA, case mappings, UTF representations, etc...
>
> This proposal for a compatibility decomposition is a possible
> alternative, but it's not my proposal, it's yours. I was deliberately
> avoiding anything like this which is not compatible with existing
> texts. If canonical decomposition isn't going to work, which I'm
> still not 100% sure of if composition is blocked, then I will
> withdraw my proposal.

I don't see why a new code point allocation would be incompatible
if it used a compatibility decomposition instead of a canonical
decomposition. It was you who proposed this allocation, but I
replied that canonical composition is blocked for *any* such
canonically equivalent decomposition of a character, so giving
your proposed precomposed character a canonical decomposition
would not solve the problem, only complicate it:

Suppose your character PATAH-HIRIQ is accepted and defined as
canonically equivalent to the sequence <PATAH, HIRIQ>. Then, by
the definition of canonical equivalence, every Unicode algorithm
may decompose it in NFD into the pair PATAH and HIRIQ, which
canonical ordering immediately reorders into HIRIQ then PATAH
(HIRIQ has combining class 14, PATAH has 17). The composition
exclusion then merely forbids recombining them into PATAH-HIRIQ.
So the NFC form remains the sequence <consonant, HIRIQ, PATAH>.
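The reordering step can be checked with Python's standard `unicodedata` module. This is only an illustrative sketch: the helper name `canonical_order` is hypothetical, and it assumes its input is already decomposed. It stable-sorts each run of combining marks by canonical combining class, which moves HIRIQ (class 14) ahead of PATAH (class 17), the same result NFD and NFC give.

```python
import unicodedata

BET, PATAH, HIRIQ = "\u05D1", "\u05B7", "\u05B4"

def canonical_order(s: str) -> str:
    """Hypothetical helper: Canonical Ordering for an already-decomposed
    string. Each run of non-starters (ccc != 0) is stable-sorted by ccc."""
    out, run = [], []
    for ch in s:
        if unicodedata.combining(ch) == 0:
            # A starter ends the current run of combining marks.
            out.extend(sorted(run, key=unicodedata.combining))
            run = []
            out.append(ch)
        else:
            run.append(ch)
    out.extend(sorted(run, key=unicodedata.combining))
    return "".join(out)

typed = BET + PATAH + HIRIQ  # the order a user might type the points

print(unicodedata.combining(PATAH), unicodedata.combining(HIRIQ))  # 17 14
print([f"U+{ord(c):04X}" for c in canonical_order(typed)])
# ['U+05D1', 'U+05B4', 'U+05B7']
print(unicodedata.normalize("NFC", typed) == BET + HIRIQ + PATAH)  # True
```

Since no precomposed <consonant, vowel point> character participates in composition here, NFC ends up identical to NFD for this sequence.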
And your proposed character would be useless: it would become a
compatibility character, not recommended for use, exactly like
U+0344 COMBINING GREEK DIALYTIKA TONOS.
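The existing Greek precedent behaves this way, as a minimal check with Python's standard `unicodedata` module shows: U+0344 has the canonical decomposition <U+0308, U+0301>, and because that decomposition begins with a non-starter, the character is excluded from composition, so NFC never produces it again.

```python
import unicodedata

DIALYTIKA_TONOS = "\u0344"  # COMBINING GREEK DIALYTIKA TONOS

# Canonical decomposition: COMBINING DIAERESIS + COMBINING ACUTE ACCENT
print(unicodedata.decomposition(DIALYTIKA_TONOS))  # 0308 0301

# NFC decomposes it and, lacking a starter to compose with and being
# excluded from composition anyway, never restores U+0344.
nfc = unicodedata.normalize("NFC", DIALYTIKA_TONOS)
print([f"U+{ord(c):04X}" for c in nfc])  # ['U+0308', 'U+0301']
print(DIALYTIKA_TONOS in nfc)            # False
```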

The only way to solve your problem is to give it only a
compatibility decomposition, which NFC and NFD do not apply, so
no decomposition or reordering takes place (only NFKC and NFKD
would expand it). This would be, I think, the first accepted
combining character with a <compat> decomposition rather than a
canonical one.

In addition, the Unicode stability policy would require the
<compat> decomposition mapping to be given in canonical order.

Look, for example, at the many Arabic <compat> decompositions,
which could not be made canonical for the simple reason that the
Unicode stability policy guarantees that decompositions are
defined in canonical order, and that a canonical decomposition
maps to at most a pair of characters whose second character is
not itself canonically decomposable...
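One such Arabic mapping can be inspected with Python's standard `unicodedata` module. U+FC5E is just one illustrative pick, and UnicodeData.txt tags these mappings with formatting tags such as `<isolated>` rather than a literal `<compat>`. The ligature's name puts SHADDA first, yet its mapping lists DAMMATAN (class 28) before SHADDA (class 33), i.e. canonical order; NFC and NFD leave the ligature alone, while NFKD expands it.

```python
import unicodedata

LIGATURE = "\uFC5E"  # ARABIC LIGATURE SHADDA WITH DAMMATAN ISOLATED FORM

tag, *hex_codes = unicodedata.decomposition(LIGATURE).split()
parts = [chr(int(h, 16)) for h in hex_codes]
print(tag, hex_codes)  # <isolated> ['0020', '064C', '0651']

# The mapping is stored in canonical order: classes never decrease.
classes = [unicodedata.combining(c) for c in parts]
print(classes)  # [0, 28, 33]

# Compatibility mappings are ignored by NFC/NFD but applied by NFKD.
print(unicodedata.normalize("NFC", LIGATURE) == LIGATURE)          # True
print(unicodedata.normalize("NFKD", LIGATURE) == "".join(parts))   # True
```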
-- Philippe.
This archive was generated by hypermail 2.1.5 : Tue Jul 08 2003 - 16:42:49 EDT