Re: Yerushala(y)im - or Biblical Hebrew

From: Peter Kirk
Date: Tue Jul 08 2003 - 18:08:41 EDT

    On 08/07/2003 12:56, Philippe Verdy wrote:

    >Suppose your character PATAH-HIRIQ is accepted, and is
    >defined as being canonically equivalent to <PATAH, HIRIQ>.
    >Then the definition of canonical equivalence in all Unicode
    >algorithms would allow any of these algorithms to decompose
    >it to NFD as a pair of characters PATAH and HIRIQ, which
    >are then immediately reordered into HIRIQ then PATAH.
    >The canonical exclusion just forbids recombining them
    >into PATAH-HIRIQ.
    I am aware of this, and that is why I specified in my second posting a
    canonical decomposition of hiriq - patah.
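
    The reordering discussed above can be checked with a minimal sketch,
    assuming Python's standard unicodedata module and an arbitrarily chosen
    base letter (LAMED). The helper name point_names is made up for this
    illustration; the result depends on the Unicode data shipped with the
    interpreter.

```python
import unicodedata

LAMED = "\u05DC"  # HEBREW LETTER LAMED, base letter chosen only for illustration
PATAH = "\u05B7"  # HEBREW POINT PATAH
HIRIQ = "\u05B4"  # HEBREW POINT HIRIQ


def point_names(text, form):
    """Normalize text to the given form and list the resulting character names."""
    return [unicodedata.name(ch) for ch in unicodedata.normalize(form, text)]


# Canonical combining classes decide the relative order of adjacent marks.
classes = {unicodedata.name(ch): unicodedata.combining(ch) for ch in (PATAH, HIRIQ)}

# Sequence entered as patah first, then hiriq.
typed = LAMED + PATAH + HIRIQ
nfd_names = point_names(typed, "NFD")
nfc_names = point_names(typed, "NFC")

print(classes)
print(nfd_names)
print(nfc_names)
```

    Because hiriq has the lower combining class, both normalization forms
    place it before patah, whatever order was typed.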

    >So it remains the NFC sequence: <consonant, hiriq, patah>
    >And your proposed character is useless (it becomes a
    >compatibility character, not recommended, exactly similar
    >to the "Greek Dialytika with Tonos").
    I take the point. Well, at least it would be specifying a distinct
    graphical form, unlike dialytika with tonos. But I accept that there is
    actually little to be gained by specifying such characters.
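
    As a rough illustration of the Greek comparison, the sketch below
    assumes the character meant is U+0344 COMBINING GREEK DIALYTIKA TONOS
    (the thread does not give a code point) and uses Python's standard
    unicodedata module. The helper name code_points is made up for this
    example.

```python
import unicodedata

# Assumption: the "dialytika with tonos" of the thread is U+0344.
DIALYTIKA_TONOS = "\u0344"


def code_points(text):
    """Format each character of text as a U+XXXX string."""
    return ["U+%04X" % ord(ch) for ch in text]


# Raw decomposition field from the Unicode Character Database (no <tag> means canonical).
decomposition = unicodedata.decomposition(DIALYTIKA_TONOS)

# Both forms decompose the character; NFC does not put it back together.
nfd = code_points(unicodedata.normalize("NFD", DIALYTIKA_TONOS))
nfc = code_points(unicodedata.normalize("NFC", DIALYTIKA_TONOS))

print(decomposition)
print(nfd)
print(nfc)
```

    The precomposed character does not survive NFC, which is the sense in
    which such a character adds little.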

    >The only way to solve your problem is to make it only a
    >compatibility decomposition, which is excluded from NFC
    >and NFD decomposition and reordering... This would be,
    >I think, the first accepted combining character with a
    ><compat> decomposition and not a canonical decomposition.
    >In addition, the Unicode stability policy would require that
    >the defined <compat> decomposition be given in canonical
    >order. Look, for example, at the many Arabic <compat> decompositions, ...
    Which are you referring to? In the Arabic block I can find only four
    such decompositions, 0675-0678, and I don't see how the issue here can
    be relevant as neither of the components is itself decomposable. Or
    are you talking about the presentation forms? I thought these had to be
    compatibility decompositions as there is formatting involved.
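
    The four decompositions mentioned here can be listed with a short
    sketch, assuming Python's standard unicodedata module. The names
    high_hamza and components_decomposable are made up for this example.

```python
import unicodedata

# Raw decomposition fields for U+0675..U+0678, keyed by code point.
high_hamza = {
    "U+%04X" % cp: unicodedata.decomposition(chr(cp))
    for cp in range(0x0675, 0x0679)
}

# For each character, check whether any component has a decomposition of its own.
components_decomposable = {}
for key, field in high_hamza.items():
    parts = field.split()[1:]  # drop the leading <compat> tag
    components_decomposable[key] = any(
        unicodedata.decomposition(chr(int(p, 16))) for p in parts
    )

# Compatibility decompositions are ignored by NFC/NFD and applied only by NFKC/NFKD.
unchanged_by_nfc = unicodedata.normalize("NFC", "\u0675") == "\u0675"
expanded_by_nfkd = unicodedata.normalize("NFKD", "\u0675")

for key, field in high_hamza.items():
    print(key, field, components_decomposable[key])
print(unchanged_by_nfc, [hex(ord(ch)) for ch in expanded_by_nfkd])
```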

    >...which could not be made canonical for the simple reason that
    >the Unicode policy pact guarantees that the decompositions
    >will be defined in canonical order, and only include a character
    >pair for canonical decompositions whose second character is
    >not canonically decomposable...
    >-- Philippe.
    As you got me looking in the Arabic presentation forms, I found an
    interesting rough Arabic equivalent of what we might need for Hebrew:
    0640, which is not really a letter but just a spacer, yet can carry
    combining marks; see FCF2-FCF4.

    Peter Kirk
