Re: Hebrew script in IDN

From: Mark E. Shoulson (mark@kli.org)
Date: Sun Nov 20 2005 - 18:47:07 CST


    Cary Karp wrote:

    > There's more to the use of Hebrew script in IDN than GERESH or
    > GERSHAYIM :-)
    >
    > With specific regard to Yiddish--
    >
    > The Yiddish digraphs 'tsvey vovn', 'vov yud', and 'tsvey yudn', can be
    > entered in two different ways from a Hebrew keyboard. If there are
    > single keys for each of them, it is likely that they will produce the
    > ligatures HEBREW LIGATURE YIDDISH DOUBLE VAV (U+05F0), HEBREW LIGATURE
    > YIDDISH VAV YOD (U+05F1), and HEBREW LIGATURE YIDDISH DOUBLE YOD
    > (U+05F2). Even when this option is available, some users may enter
    > them as two separate keystrokes, giving HEBREW LETTER VAV - HEBREW LETTER
    > VAV (U+05D5 U+05D5), HEBREW LETTER VAV - HEBREW LETTER YOD (U+05D5
    > U+05D9), and HEBREW LETTER YOD - HEBREW LETTER YOD (U+05D9 U+05D9). It
    > is not apparent that the one form is used preferentially to the other,
    > and no attempt at normalizing them has yet been made.
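To make the two input routes concrete, this short Python sketch pairs each ligature with the letters that a two-keystroke entry would produce. It shows that a plain string comparison treats the two spellings as different labels. The variable names are illustrative only.

```python
import unicodedata

# Each Yiddish digraph: (precomposed ligature, two-letter spelling)
FORMS = [
    ("\u05F0", "\u05D5\u05D5"),  # tsvey vovn: VAV + VAV
    ("\u05F1", "\u05D5\u05D9"),  # vov yud:    VAV + YOD
    ("\u05F2", "\u05D9\u05D9"),  # tsvey yudn: YOD + YOD
]

for ligature, pair in FORMS:
    codepoints = " ".join(f"U+{ord(c):04X}" for c in pair)
    # A plain code-point comparison sees two unrelated strings.
    print(f"{unicodedata.name(ligature)} vs {codepoints}: equal={ligature == pair}")
```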
    >
    > However, in an application such as IDN where a string entered from a
    > keyboard needs to be matched exactly with a stored string, and the
    > keyboarded string may be represented in different ways, the
    > application will obviously need to accommodate all alternative input
    > forms. If the registry also contains the corresponding multiple
    > representations, the intended result at the user end will be ensured.
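One reading of "the registry also contains the corresponding multiple representations" is that every spelling of a label is generated mechanically. The sketch below is illustrative only: the table and the names `fold` and `variants` are hypothetical and not part of any IDN specification. It first spells out any ligatures as letter pairs. It then walks the string and, at each adjacent VAV/YOD pair, branches on whether to keep the two letters or substitute the ligature.

```python
# Hypothetical table: Yiddish ligature -> the two letters it stands for.
LIGATURES = {
    "\u05F0": "\u05D5\u05D5",  # DOUBLE VAV
    "\u05F1": "\u05D5\u05D9",  # VAV YOD
    "\u05F2": "\u05D9\u05D9",  # DOUBLE YOD
}
PAIRS = {pair: lig for lig, pair in LIGATURES.items()}

def fold(label: str) -> str:
    """Spell every ligature out as its two letters."""
    return "".join(LIGATURES.get(ch, ch) for ch in label)

def variants(label: str) -> set[str]:
    """Every spelling that differs only in ligature vs. letter-pair choice."""
    s = fold(label)
    found: set[str] = set()

    def walk(i: int, acc: str) -> None:
        if i >= len(s):
            found.add(acc)
            return
        walk(i + 1, acc + s[i])              # keep this letter as-is
        if s[i:i + 2] in PAIRS:              # or fuse this pair into a ligature
            walk(i + 2, acc + PAIRS[s[i:i + 2]])

    walk(0, "")
    return found
```

Three consecutive VAVs already yield three spellings (VAV VAV VAV, ligature + VAV, VAV + ligature). The count grows exponentially with the number of adjacent pairs, which is one argument for a unique stored form.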

    I started to write an answer, and now I'm pretty sure what I was going
    to say was wrong. I may be missing something, but it looks like these
    distinctions aren't being erased (as they should be) by
    normalization: U+05F0..U+05F2 have no decomposition mappings at all,
    so even NFKC leaves them distinct from the letter pairs! I would have
    thought that would be a no-brainer.
    I'd venture to say that double-vav, vav-yod, and yod-yod ligatures
    should have *canonical* decomposition to their constituent letters! I'm
    sure that would cause problems of some sort, but at least compatibility
    decomposition is necessary.
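The observation is easy to check against a Unicode library. This sketch uses Python's standard `unicodedata` module, whose results reflect the Unicode Character Database version bundled with the interpreter. It reports each ligature's decomposition mapping and whether all four normalization forms leave it untouched. The Latin ligature U+FB01 is included for contrast because it does carry a compatibility mapping. The helper name `survives_normalization` is illustrative.

```python
import unicodedata

LIGATURES = "\u05F0\u05F1\u05F2"  # DOUBLE VAV, VAV YOD, DOUBLE YOD

def survives_normalization(ch: str) -> bool:
    """True if NFC, NFD, NFKC and NFKD all leave ch unchanged."""
    return all(unicodedata.normalize(form, ch) == ch
               for form in ("NFC", "NFD", "NFKC", "NFKD"))

for ch in LIGATURES + "\uFB01":
    # decomposition() returns "" when there is no mapping at all,
    # neither canonical nor compatibility.
    print(f"U+{ord(ch):04X}", repr(unicodedata.decomposition(ch)),
          survives_normalization(ch))
```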

    > There are also good reasons for preferring the stored form to be
    > unique. At least on first consideration, it would seem to make sense
    > for the canonical form to be the one most frequently encountered in
    > keyboarding practice. Does anyone on this list know if these three
    > digraphs are more frequently entered as single characters or as
    > two-character combinations? What would the likely behavior be if it were
    > not clear to the user whether the string to be entered was in Yiddish
    > or in Hebrew?

    It doesn't really matter which form is entered more often; we normalize
    strings all the time in Unicode.
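Because the standard normalization forms leave these ligatures alone, "normalizing" them in practice means adding an application-level folding step. This minimal sketch assumes that the two-letter spelling is chosen as the single stored form, a choice the thread leaves open. The names `canonical_label` and `same_label` are hypothetical. NFC is applied first so that a precomposed character with a canonical decomposition, such as U+FB1F (HEBREW LIGATURE YIDDISH YOD YOD PATAH, canonically U+05F2 U+05B7), exposes its ligature before the fold.

```python
import unicodedata

# Assumption: the two-letter spelling is the single stored form.
LIGATURE_TO_PAIR = str.maketrans({
    "\u05F0": "\u05D5\u05D5",  # DOUBLE VAV -> VAV VAV
    "\u05F1": "\u05D5\u05D9",  # VAV YOD    -> VAV YOD
    "\u05F2": "\u05D9\u05D9",  # DOUBLE YOD -> YOD YOD
})

def canonical_label(label: str) -> str:
    """Hypothetical fold: NFC first, then expand the Yiddish ligatures."""
    return unicodedata.normalize("NFC", label).translate(LIGATURE_TO_PAIR)

def same_label(a: str, b: str) -> bool:
    """Compare two labels after folding both to the stored form."""
    return canonical_label(a) == canonical_label(b)
```

With a fold like this in place, the question of which spelling users type more often only decides which side of the table becomes the stored form. It no longer affects whether matching works.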

    ~mark



    This archive was generated by hypermail 2.1.5 : Sun Nov 20 2005 - 18:48:05 CST