Re: Yerushala(y)im - or Biblical Hebrew

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Tue Jul 08 2003 - 15:56:11 EDT


    On Tuesday, July 08, 2003 8:21 PM, Peter Kirk <peter.r.kirk@ntlworld.com> wrote:

    > On 08/07/2003 11:10, Philippe Verdy wrote:
    >
    > > Admit that your proposal of using a canonical decomposition would
    > > still cause problems with all Unicode algorithms, and with XML
    > > processing.
    > >
    > > Only a NFKD decomposition would make your proposed "ligature"
    > > character workable for XML processing and Unicode algorithms,
    > > including UCA, case mappings, UTF representations, etc...
    >
    > This proposal for a compatibility decomposition is a possible
    > alternative, but it's not my proposal, it's yours. I was deliberately
    > avoiding anything like this which is not compatible with existing
    > texts. If canonical decomposition isn't going to work, which I'm
    > still not 100% sure of if composition is blocked, then I will
    > withdraw my proposal.

    I don't see why allocating a new code point would be incompatible
    if it carries a compatibility decomposition instead of a canonical
    one. You are the one who proposed this allocation; my reply was
    that a composition exclusion blocks recomposition from *any*
    canonically equivalent decomposition of a character, so giving
    your proposed precomposed character a canonical decomposition
    would not solve the problem, it would only complicate it:

    Suppose your character PATAH-HIRIQ is accepted, and is
    defined as canonically equivalent to the sequence <PATAH, HIRIQ>.
    Canonical equivalence then allows any Unicode algorithm to
    decompose it to NFD as the pair PATAH, HIRIQ, which is
    immediately reordered by combining class into HIRIQ, PATAH.
    The composition exclusion merely forbids recombining them
    into PATAH-HIRIQ.
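
    The reordering step can be checked with the points that exist
    today. This is only a sketch using Python's standard unicodedata
    module; the choice of LAMED as the base consonant is an
    assumption made for illustration, and the proposed PATAH-HIRIQ
    character itself does not exist, so it cannot be tested.

```python
import unicodedata

LAMED = "\u05DC"  # HEBREW LETTER LAMED, an arbitrary base consonant
PATAH = "\u05B7"  # HEBREW POINT PATAH, combining class 17
HIRIQ = "\u05B4"  # HEBREW POINT HIRIQ, combining class 14

# The order wanted for the Yerushala(y)im spelling: patah, then hiriq.
typed = LAMED + PATAH + HIRIQ

nfd = unicodedata.normalize("NFD", typed)
nfc = unicodedata.normalize("NFC", typed)


def names(text):
    """Return the Unicode character names of a string, for display."""
    return [unicodedata.name(ch) for ch in text]


print(unicodedata.combining(PATAH), unicodedata.combining(HIRIQ))
print(names(typed))
print(names(nfd))  # hiriq now precedes patah
print(nfc == nfd)  # no precomposed form exists, so NFC equals NFD here
```

    Both normalization forms sort the two points by combining class,
    so the typed order is lost either way.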

    So the NFC form remains the sequence <consonant, hiriq, patah>,
    and your proposed character is useless (it becomes a
    compatibility character, not recommended, exactly like
    the "Greek Dialytika Tonos").
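
    The Greek precedent can be observed directly. A minimal sketch,
    again with Python's unicodedata module, taking U+0344 COMBINING
    GREEK DIALYTIKA TONOS as the character referred to above:

```python
import unicodedata

DIALYTIKA_TONOS = "\u0344"  # COMBINING GREEK DIALYTIKA TONOS

# Its canonical decomposition, as recorded in the character database.
print(unicodedata.decomposition(DIALYTIKA_TONOS))

decomposed = unicodedata.normalize("NFD", DIALYTIKA_TONOS)
recomposed = unicodedata.normalize("NFC", decomposed)

# The character is excluded from composition, so NFC never restores it.
print([unicodedata.name(ch) for ch in recomposed])
print(DIALYTIKA_TONOS in recomposed)
```

    Once normalized, the single code point is gone for good, which
    is the fate a canonically decomposable PATAH-HIRIQ would share.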

    The only way to solve your problem is to give it only a
    compatibility decomposition, which NFC and NFD neither
    expand nor reorder... This would be, I think, the first
    accepted combining character with a <compat> decomposition
    and not a canonical decomposition.
    In addition, the Unicode stability policy would require that
    the defined <compat> decomposition be given in canonical
    order.
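
    How a <compat> mapping behaves under each normalization form can
    be seen with an existing Hebrew character. The sketch below uses
    U+FB4F HEBREW LIGATURE ALEF LAMED purely as a stand-in; it is a
    letter ligature, not a combining mark, so it illustrates the
    mechanism rather than the proposal itself.

```python
import unicodedata

ALEF_LAMED = "\uFB4F"  # HEBREW LIGATURE ALEF LAMED, a <compat> character

# The mapping carries a <compat> tag, so it is not canonical.
print(unicodedata.decomposition(ALEF_LAMED))

# Apply all four normalization forms to the same one-character string.
forms = {
    form: unicodedata.normalize(form, ALEF_LAMED)
    for form in ("NFC", "NFD", "NFKC", "NFKD")
}

for form, result in forms.items():
    status = "unchanged" if result == ALEF_LAMED else "expanded"
    print(form, status, [unicodedata.name(ch) for ch in result])
```

    Only the compatibility forms (NFKC, NFKD) expand the character;
    NFC and NFD leave it alone.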

    Look, for example, at the many Arabic <compat> decompositions,
    which could not be made canonical for the simple reason that
    the Unicode stability policy guarantees that decompositions
    are defined in canonical order, and that a canonical
    decomposition contains at most a pair of characters, the
    second of which is not canonically decomposable...
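
    Whether a given mapping is already in canonical order can be
    tested mechanically. The helpers below are hypothetical names
    written for this sketch, not part of any library, and the two
    Arabic presentation forms are picked only as examples.

```python
import unicodedata


def is_canonically_ordered(text):
    """True if no combining mark follows one of a higher combining class."""
    previous = 0
    for ch in text:
        ccc = unicodedata.combining(ch)
        if ccc != 0 and ccc < previous:
            return False
        previous = ccc  # a starter (class 0) resets the run
    return True


def compat_mapping(ch):
    """Return (tag, mapped string) for a tagged decomposition, else None."""
    fields = unicodedata.decomposition(ch).split()
    if not fields or not fields[0].startswith("<"):
        return None
    return fields[0], "".join(chr(int(f, 16)) for f in fields[1:])


# Two Arabic presentation forms, chosen arbitrarily for the demonstration.
for ch in ("\uFEFB", "\uFC5E"):
    mapping = compat_mapping(ch)
    if mapping is not None:
        tag, target = mapping
        print(unicodedata.name(ch), tag, is_canonically_ordered(target))
```

    The same check applied to <PATAH, HIRIQ> fails, which is why a
    decomposition in that order could not be defined as written.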

    -- Philippe.


