Re: Yerushala(y)im - or Biblical Hebrew

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Tue Jul 08 2003 - 14:10:59 EDT

    On Tuesday, July 08, 2003 6:48 PM, Peter Kirk <peter.r.kirk@ntlworld.com> wrote:

    > On 08/07/2003 09:16, Philippe Verdy wrote:
    >
    > > Even if listed in the Canonical Composition Exclusion list, this
    > > would not work: this list only refers to characters that are
    > > canonically decomposable into a character pair, and that MUST be
    > > decomposed and MUST NOT be recomposed when creating *either* an
    > > NFC or NFD form.
    >
    > I am not trying to block reordering here. I accept that if the input
    > data is patah - hiriq, this will (barring unacceptable changes to
    > combining classes etc) always be normalised to hiriq - patah in both
    > NFC and NFD. But normalisation forms don't specify rendering, and
    > there are already well known exceptions to the general rule that the
    > order of rendering follows the order of encoding. So all I am trying
    > to suggest here is a way of specifying that the sequence hiriq -
    > patah should be rendered as if it were patah - hiriq. Is there a way
    > of doing that, without spilling too much sacred cow blood?
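
    The reordering described in the quoted paragraph can be checked
    with Python's standard unicodedata module. This is only a minimal
    sketch: the LAMED base letter is an assumed example, and the result
    depends on the Unicode data shipped with the Python build. HIRIQ
    has the lower canonical combining class, so it sorts first.

```python
import unicodedata

LAMED = "\u05DC"  # HEBREW LETTER LAMED (assumed base letter for illustration)
PATAH = "\u05B7"  # HEBREW POINT PATAH
HIRIQ = "\u05B4"  # HEBREW POINT HIRIQ

# Input in the order the writer intends: patah first, then hiriq.
typed = LAMED + PATAH + HIRIQ


def names(text):
    """Return the Unicode character names of a string, in order."""
    return [unicodedata.name(ch) for ch in text]


# Canonical combining classes drive the canonical reordering step.
ccc_patah = unicodedata.combining(PATAH)
ccc_hiriq = unicodedata.combining(HIRIQ)

nfc = unicodedata.normalize("NFC", typed)
nfd = unicodedata.normalize("NFD", typed)

print("ccc:", {"PATAH": ccc_patah, "HIRIQ": ccc_hiriq})
print("typed:", names(typed))
print("NFC:  ", names(nfc))
print("NFD:  ", names(nfd))
```

    Both normalized strings come out as lamed, hiriq, patah, so the
    original patah - hiriq order cannot be recovered from normalized
    data alone.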

    You must admit that your proposal of using a canonical decomposition
    would still cause problems with all Unicode algorithms, and with XML
    processing.

    Only an NFKD (compatibility) decomposition would make your proposed
    "ligature" character workable for XML processing and Unicode
    algorithms, including UCA, case mappings, UTF representations, etc.

    But using an NFKD decomposition means that you create a new
    character with its own identity, name, properties, set of glyphs,
    UCA rules, mappings, etc.
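
    To illustrate the difference between the two kinds of decomposition,
    here is a small sketch using two existing Hebrew presentation forms.
    These two characters are chosen only as examples and are not part of
    the proposal: U+FB2E has a canonical decomposition and is excluded
    from composition, while U+FB4F has only a <compat> decomposition.

```python
import unicodedata

ALEF_WITH_PATAH = "\uFB2E"  # HEBREW LETTER ALEF WITH PATAH (canonical decomposition)
ALEF_LAMED = "\uFB4F"       # HEBREW LIGATURE ALEF LAMED (<compat> decomposition)

FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def normalized(ch):
    """Return a dict mapping each normalization form to the normalized string."""
    return {form: unicodedata.normalize(form, ch) for form in FORMS}


def code_points(text):
    """Format a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in text)


for ch in (ALEF_WITH_PATAH, ALEF_LAMED):
    print(unicodedata.name(ch))
    # The raw decomposition field from the character database.
    print("  decomposition:", unicodedata.decomposition(ch))
    for form, result in normalized(ch).items():
        print(f"  {form}: {code_points(result)}")
```

    The canonically decomposable character disappears in every
    normalization form, whereas the compatibility character survives NFC
    and NFD and is only decomposed by NFKC and NFKD.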

    Of course it would require a special keystroke sequence to
    input it, but that is not impossible. At least this proposal avoids
    the use of CGJ, and still allows efficient rendering in fonts,
    where the glyph would be defined by combining the two vowel glyphs
    in the correct order.
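
    For comparison, the CGJ alternative mentioned above can be sketched
    as follows. This is a hedged illustration only: the LAMED base
    letter is an assumed example, and the behaviour shown relies on
    U+034F COMBINING GRAPHEME JOINER having combining class 0 in the
    Unicode data used by the Python build.

```python
import unicodedata

LAMED = "\u05DC"  # HEBREW LETTER LAMED (assumed base letter for illustration)
PATAH = "\u05B7"  # HEBREW POINT PATAH
HIRIQ = "\u05B4"  # HEBREW POINT HIRIQ
CGJ = "\u034F"    # COMBINING GRAPHEME JOINER

# A character with combining class 0 between the two points splits them
# into separate reordering runs, so neither can move past the other.
with_cgj = LAMED + PATAH + CGJ + HIRIQ
without_cgj = LAMED + PATAH + HIRIQ

cgj_class = unicodedata.combining(CGJ)
nfc_with = unicodedata.normalize("NFC", with_cgj)
nfd_with = unicodedata.normalize("NFD", with_cgj)
nfc_without = unicodedata.normalize("NFC", without_cgj)

print("CGJ combining class:", cgj_class)
print("order kept with CGJ:   ", nfc_with == with_cgj and nfd_with == with_cgj)
print("order kept without CGJ:", nfc_without == without_cgj)
```

    With CGJ the encoded order survives normalization, at the cost of an
    extra invisible character in the text.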

    Would the NFKD decomposition then be safe to define, given that it
    would necessarily have to reverse the order of the vowels that are
    inherently part of the new character? It would also create possible
    confusion, as the character would probably be named
    "HEBREW LETTERS PATAH HIRIQ" (and defined with which combining class
    value: the higher one, for PATAH, or the lower one, for HIRIQ?),
    while its compatibility decomposition would be <compat> HIRIQ PATAH,
    to meet the NFC/NFD stability requirements...

    -- Philippe.


