Re: Yerushala(y)im - or Biblical Hebrew

From: John Cowan
Date: Tue Jul 08 2003 - 11:14:45 EDT

    Peter Kirk scripsit:

    > The solution for this sequence is as follows: Define a new combining
    > character something like HEBREW LIGATURE PATAH HIRIQ with a canonical
    > decomposition of hiriq - patah (yes, that way round) and a glyph with a
    > hiriq to the left of a patah. How does this help? Well, it will not
    > affect users who type patah then hiriq, in non-canonical order, into an
    > application which does not immediately normalise the text, as the
    > renderer will still render hiriq to left of patah as typed. But when
    > this text is normalised into NFC, the sequence will first be reordered
    > as hiriq - patah, and then this combination will be composed into the
    > new ligature. That is correct, isn't it?

    Such a character could only be encoded if it were put into the list
    of composition exclusions, because otherwise it would upset the
    stability of normalization. The guarantee is that as long as a text
    contains only characters that occur in version V of Unicode, all
    normalizers written to versions greater than or equal to V will
    produce the same results on it.
    You are creating a situation where patah followed by hiriq will normalize
    one way in Unicode 4.0 (since those are 4.0 characters) and another way
    in some later version. So what you want is as big a no-no as changing
    a canonical decomposition, and for exactly the same reason.
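
    For anyone who wants to see the behaviour at stake, here is a minimal
    sketch using Python's standard unicodedata module. It assumes the
    Unicode data bundled with your interpreter contains no precomposed
    hiriq-patah character, which is the situation the stability guarantee
    is meant to preserve. The helper normalize_points and the choice of
    lamed as the base letter are only illustrative.

```python
import unicodedata

LAMED = "\u05DC"  # HEBREW LETTER LAMED, used here only as a sample base letter
HIRIQ = "\u05B4"  # HEBREW POINT HIRIQ, canonical combining class 14
PATAH = "\u05B7"  # HEBREW POINT PATAH, canonical combining class 17


def normalize_points(text: str, form: str = "NFC") -> str:
    """Hypothetical helper: normalize text with the interpreter's Unicode data."""
    return unicodedata.normalize(form, text)


# Patah typed before hiriq, i.e. in non-canonical order.
typed = LAMED + PATAH + HIRIQ
normalized = normalize_points(typed)

# Canonical reordering sorts the marks by combining class (14 before 17),
# so the result is lamed, hiriq, patah. Nothing is composed, because no
# character has hiriq + patah as its canonical decomposition.
for ch in normalized:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

    If a character decomposing to hiriq - patah were later encoded without
    being excluded from composition, the same three-character input would
    come out of NFC as two characters under the newer data, which is
    exactly the instability described above.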

    John Cowan
    Be yourself.  Especially do not feign a working knowledge of RDF where
    no such knowledge exists.  Neither be cynical about RELAX NG; for in
    the face of all aridity and disenchantment in the world of markup,
    James Clark is as perennial as the grass.  --DeXiderata, Sean McGrath
