From: Mark E. Shoulson (firstname.lastname@example.org)
Date: Thu Nov 17 2005 - 19:14:33 CST
Mark Davis wrote:
> It is not that clear-cut. Identifiers by their nature cannot include
> all words and phrases valid in all languages. For IDN, for example,
> one can't express the perfectly reasonable English word "can't", or a
> word like "I.B.M.".
True. But I would contend that the Hebrew GERESH is a different
matter. "Can't" is a contraction, "I.B.M." an acronym; both are
linguistic features indicated by punctuation, as you say. But a word
like צ׳יפס is not any sort of construction out of other words, and
there's no other way to write it. You can always write "cannot" and
"International Business Machines," but short of changing the word and
using synonyms, the only way to write "chips" (in the British sense of
fried potato sticks) is with the GERESH. I don't think this
is part of a minimal pair (i.e. I don't think there's another word ציפס
which differs only in the lack of GERESH which means something else),
but such pairs exist, I'm sure. These "foreign" sounds may not be part
of historical Hebrew, but they certainly are part of how it is spoken
today, and in the sense of being "foreign" letters, they're no worse
than special letters used in various Indic languages to write Sanskrit
sounds that don't otherwise occur in the language. Fortunately, GERESH
is productive, and we only need the one symbol for a variety of foreign
sounds.
There *might* be a stronger argument for excluding GERSHAYIM, as it
doesn't have the same phonetic usage but is more along the lines of the
periods used in I.B.M. above, but I'd rather be inclusive in this case.
Besides, GERSHAYIM isn't used only in abbreviations; letter-names in
Hebrew are commonly written using it, and some words that started out as
acronyms have become pronounced as words in their own right (like radar
or NASA in English), but haven't always lost the GERSHAYIM the way
English acronym words generally lose their periods. In some cases they
keep it even when they get inflected.
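To make the classification question concrete, here is a minimal sketch in Python, offered only as an illustration. It assumes the standard unicodedata module and uses Python's own identifier rules (str.isidentifier, based on XID_Start/XID_Continue) as a stand-in; those are not the IDN rules under discussion, and the helper names are hypothetical.

```python
import unicodedata

GERESH = "\u05F3"     # HEBREW PUNCTUATION GERESH
GERSHAYIM = "\u05F4"  # HEBREW PUNCTUATION GERSHAYIM


def describe(ch):
    """Return (character name, general category) for one character."""
    return unicodedata.name(ch), unicodedata.category(ch)


def identifier_ok(word):
    """Stand-in check: is this a valid identifier by Python's rules?

    Assumption: Python's XID-based rule is only a proxy for the
    identifier definition being debated here.
    """
    return word.isidentifier()


# tsadi, GERESH, yod, pe, samekh: the word "chips" from the text above
chips = "\u05E6" + GERESH + "\u05D9\u05E4\u05E1"
# the same letters with the GERESH dropped
bare = "\u05E6\u05D9\u05E4\u05E1"

if __name__ == "__main__":
    for mark in (GERESH, GERSHAYIM):
        print(describe(mark))
    print("with GERESH:", identifier_ok(chips))
    print("without GERESH:", identifier_ok(bare))
```

Both marks carry a punctuation general category, so a rule keyed on categories rejects the word with the GERESH while accepting the bare letters. That is exactly the situation being objected to.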
> The UTC decided against adding them to the identifier definition.
> If we were to change that for the Hebrew punctuation, we would have to
> see a documented case for it.
GERESH and GERSHAYIM both have functions as punctuation, it is true, and
it is sensible to exclude punctuation from IDN identifiers. But GERESH,
at least, also has a phonetic function that to me seems more part of
~mark, but another one.