Re: Folding algorithm and canonical equivalence

From: Peter Kirk
Date: Sun Jul 18 2004 - 10:25:26 CDT

    On 18/07/2004 12:51, Michael Everson wrote:

    > At 13:00 +0300 2004-07-18, Jony Rosenne wrote:
    >>> Jony is arguing to extend AccentFolding to Hebrew (fold to unpointed).
    >>> His suggestion is to fold *all* combining marks used with Hebrew in
    >>> that case. I want to double check that he really means all combining
    >>> marks in the Hebrew block, or just some of them.
    >> I did mean all. All points and cantillation marks in Hebrew are
    >> optional.
    > In the Hebrew language, perhaps. But in other languages, like Yiddish,
    > which use the Hebrew script, at least some points are NOT optional,
    > and "dropping" them causes textual corruption and loss of data.

    The same is, of course, true of accent removal in the Latin script for
    many European languages. The general accent folding, like DUCET, has to
    strike the best compromise among the preferred usages of the most widely
    used languages; alternatively, it can be tailored to the needs of
    specific languages. Indeed, in some sense every folding involves loss of
    data; that is the nature of a folding. That doesn't stop generic accent
    removal from being a useful folding for the Latin and Hebrew scripts.

    In one sense, the question is whether accent and diacritic folding is a
    graphical process or a logical one. If it is a logical process, it has
    to take into account all sorts of potentially language-specific
    variables, such as the phonetic function of each combining mark. But it
    makes more sense, within the scope of Unicode folding, for it to be
    specified as a graphical process: the removal of auxiliary glyphs and
    glyph modifiers from base characters without regard for their phonetic
    effect or their status within the orthography of particular languages.
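
    The graphical view of folding can be sketched in Python with the
    standard unicodedata module. This is a minimal sketch under stated
    assumptions, not the AccentFolding specification itself: it treats
    "combining mark" as Unicode general category Mn, applies NFD first so
    precomposed letters expose their marks, and adds a hypothetical
    keep_pairs parameter (with an illustrative, hypothetical YIDDISH_KEEP
    table) to show how a language-specific tailoring could preserve marks
    that the generic folding would drop.

```python
import unicodedata

# Hypothetical tailoring data: (base letter, mark) pairs that a
# Yiddish-oriented folding might preserve. Only pasekh alef and komets alef
# are listed; this is an illustrative subset, not a full orthography.
YIDDISH_KEEP: frozenset[tuple[str, str]] = frozenset({
    ("\u05d0", "\u05b7"),  # ALEF + PATAH  (pasekh alef)
    ("\u05d0", "\u05b8"),  # ALEF + QAMATS (komets alef)
})


def fold_marks(text: str,
               keep_pairs: frozenset[tuple[str, str]] = frozenset()) -> str:
    """Graphically fold text by dropping nonspacing marks (category Mn).

    A mark survives only if its (base, mark) pair is in keep_pairs, which
    acts as a simple per-language tailoring hook (hypothetical).
    """
    out = []
    base = None  # most recent non-Mn character seen
    # NFD splits precomposed letters (e.g. U+00E9, U+FB2E) into base + marks.
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.category(ch) == "Mn":
            if (base, ch) in keep_pairs:
                out.append(ch)
            # Every other nonspacing mark is discarded regardless of its
            # phonetic function: the "graphical" view of folding.
        else:
            base = ch
            out.append(ch)
    # Recompose whatever survives (e.g. Latin letters that kept an accent).
    return unicodedata.normalize("NFC", "".join(out))
```

    With the generic folding, fold_marks("café") gives "cafe", and
    fold_marks("\u05d0\u05b7") and fold_marks("\u05d0\u05b8") both collapse
    to bare alef "\u05d0": exactly the loss of data a folding implies. Passing
    YIDDISH_KEEP keeps those two pairs intact while still stripping, say,
    patah under bet.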

    Anyway, is Yiddish in fact never written completely unpointed? That
    would surprise me.

    Peter Kirk (personal) (work)

