Re: Hebrew script in IDN (was Exemplar Characters)

From: Mark E. Shoulson (
Date: Thu Nov 17 2005 - 19:14:33 CST

    Mark Davis wrote:

    > It is not that clear-cut. Identifiers by their nature cannot include
    > all words and phrases valid in all languages. For IDN, for example,
    > one can't express the perfectly reasonable English word "can't", or a
    > word like "I.B.M.".

    True. But I would contend that the Hebrew GERESH is a different
    matter. "Can't" is a contraction, "I.B.M." an acronym; both are
    linguistic features indicated by punctuation, as you say. But a word
    like צ׳יפס is not any sort of construction out of other words, and
    there's no other way to write it. You can always write "cannot" and
    "International Business Machines," but short of changing the word and
    using synonyms, the only way to write "chips" (in the British sense of
    fried potato sticks) is with the GERESH. I don't think this is part of
    a minimal pair (i.e. I don't think there's another word ציפס, differing
    only in the lack of GERESH, which means something else),
    but such pairs exist, I'm sure. These "foreign" sounds may not be part
    of historical Hebrew, but they certainly are part of how it is spoken
    today, and in the sense of being "foreign" letters, they're no worse
    than special letters used in various Indic languages to write Sanskrit
    sounds that don't otherwise occur in the language. Fortunately, GERESH
    is productive, and we only need the one symbol for a variety of foreign
    sounds.

    There *might* be a stronger argument for excluding GERSHAYIM, as it
    doesn't have the same phonetic usage but is more along the lines of the
    periods used in I.B.M. above, but I'd rather be inclusive in this case.
    Besides, GERSHAYIM isn't used only in abbreviations; letter-names in
    Hebrew are commonly written using it, and some words that started out as
    acronyms have come to be pronounced as words in their own right (like
    radar or NASA in English) without always losing the GERSHAYIM, the way
    English acronym words generally lose their periods. In some cases they
    keep it even when they get inflected.

    > The UTC decided against adding them to the identifier definition.
    > If we were to change that for the Hebrew punctuation, we would have to
    > see a documented case for it.

    GERESH and GERSHAYIM both have functions as punctuation, it is true, and
    it is sensible to exclude punctuation from IDN identifiers. But GERESH,
    at least, also has a phonetic function that to me seems more part of
    ordinary spelling.
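
    To make the distinction concrete, here is a minimal Python sketch that
    looks up how GERESH and GERSHAYIM are classified and checks the word for
    "chips" against default identifier rules. The names describe,
    non_letters and CHIPS are illustrative only. Note the assumption:
    Python's str.isidentifier() follows the default Unicode identifier
    syntax, which is used here only as a stand-in; it is not an IDN
    validity check.

```python
import unicodedata

GERESH = "\u05F3"     # HEBREW PUNCTUATION GERESH
GERSHAYIM = "\u05F4"  # HEBREW PUNCTUATION GERSHAYIM

# The word for "chips": TSADI, GERESH, YOD, PE, SAMEKH (escaped to avoid bidi issues).
CHIPS = "\u05E6\u05F3\u05D9\u05E4\u05E1"
# The same letters with the GERESH dropped.
CHIPS_NO_GERESH = CHIPS.replace(GERESH, "")


def describe(ch):
    """Return (code point, character name, general category) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))


def non_letters(word):
    """Return the characters of word whose general category is neither letter (L*) nor mark (M*)."""
    return [ch for ch in word if unicodedata.category(ch)[0] not in ("L", "M")]


# Both characters are classified as punctuation (general category Po).
for ch in (GERESH, GERSHAYIM):
    print(describe(ch))

# Assumption: default identifier syntax as implemented by Python, not IDN rules.
print("with GERESH:   ", CHIPS.isidentifier())
print("without GERESH:", CHIPS_NO_GERESH.isidentifier())

# The only character that stands in the way is the GERESH itself.
print([describe(ch) for ch in non_letters(CHIPS)])
```

    The sketch only shows that the GERESH is what disqualifies the word
    under a letters-only rule; whether such a character should be admitted
    is the policy question under discussion here.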

    > Mark

    ~mark, but another one.
