Re: IDN Security

From: Simon Montagu (
Date: Tue Feb 15 2005 - 17:17:35 CST


    GERESH seems an even more likely candidate than GERSHAYIM to me, because
    it's not uncommon in proper names. There is a tendency for the GERSHAYIM
    to fall out of use as abbreviations become more commonly accepted as
    words in their own right, e.g. תפו״ז, the Hebrew for orange, which was
    originally an abbreviation for תפוח זהב but is today almost always
    written without GERSHAYIM.

    GERESH on the other hand, when used as a modifier for letters like ג׳ ז׳
    or צ׳, would cause a change in pronunciation if it were omitted, so I
    would expect www.חבד to be a more acceptable substitute for
    www.חב״ד than www.זבוטינסקי.org for www.ז׳בוטינסקי.org

    Mark E. Shoulson wrote:
    > This is a good point. If there is a Hebrew punctuation character that
    > deserves to exist in IDNs (and I'm not saying there is), it is GERSHAYIM
    > (and possibly GERESH). A domain תנ״ך.com makes far more sense than
    > תנך.com. Better examples are available (תנ״ך is almost comfortable
    > un-gereshed these days). There are a lot of abbreviations used in
    > Hebrew in everyday usage that include this mark. It is worth noting
    > that it does *not* mean that the word is not to be read as a word. On
    > the contrary, many of these acronyms have standard pronunciations, and
    > are even pluralized (and occasionally even conjugated) as if they were
    > normal words. e.g. just the other day I saw in a newspaper headline the
    > "word" ח״כים for the plural of ח״כ = חבר כנסת = Member of Knesset. Note
    > that in this case the word is *not* meant to be pronounced, apparently,
    > since the KAF doesn't go into final form as it usually does for
    > pronounced acronyms, like תנ״ך (Hebrew Bible, an acronym for Torah[law],
    > Neviim[prophets], Ketuvim[scriptures])... and the adjective תנ״כי =
    > "Biblical" is quite normal. Sometimes these words have even passed into
    > verbs, as in דו״ח for דין וחשבון (lit. judgement and accounting; used to
    > mean "report" or "traffic ticket") passing into verbdom also, meaning
    > "to report". I'm not sure if it retains its GERSHAYIM when so
    > conjugated though (לְדַוֵּ״חַ?).
    > (It's true that the Hebrew MAQAF could theoretically be useful in a
    > domain name as well, but an ordinary HYPHEN-MINUS is a perfectly
    > acceptable substitute in this setting, and is what most people would use
    > anyway).
    > The problem, as correctly pointed out, is that I've pretty much *never*
    > seen the Unicode GERSHAYIM codes used properly for this. This is
    > probably because most stuff still dates back to ISO-8859 days, which had
    > no such codepoint. Probably most people don't even know it's there; I
    > am not surprised it isn't on a standard keyboard (I use my own weird
    > personalized keyboard for Hebrew). *Everyone* just uses DOUBLE-QUOTE
    > (also in cases of Hebrew abbreviations transcribed into Latin letters,
    > as in Z"L, B"H, HY"D, BS"D, etc). Run the "date" command on my Linux
    > system with the locale set to he_IL and you get:
    > ג' פבר 15 09:35:43 EST 2005
    > (after converting from ISO-8859-8 to UTF-8). Note the apostrophe
    > instead of GERESH after the weekday number. Well, this is from
    > ISO-8859-8, which had no GERESH... which I suppose is the point: the
    > locale for he_IL is not even Unicode!
    > If I were registering a domain name, from a linguistic perspective and
    > from the point of view of naming it the right thing, I'd definitely want
    > GERSHAYIM and probably GERESH available. But it would be a rare person
    > who could enter it correctly. Still, if you gave me the choice, I'd
    > prefer that they be included; maybe it will encourage correct usage in
    > the future.
    > N.B. throughout this, GERESH and GERSHAYIM refer to the *punctuations*
    > of those names, not to be confused in any way with the *accents* of the
    > same names.
    > ~mark
    > Cary Karp wrote:
    >> Quoting Mark E. Shoulson:
    >>> I recognize this is opening a can of worms... but then, it was you
    >>> that opened it. I'm looking at the idn-chars.html page, and I have a
    >>> few questions about (naturally) the Hebrew script (since that's one
    >>> I'm familiar with).
    >> I have another question about the IDN implementation of the Hebrew
    >> script. Given that IDN security concerns stand in direct proportion to
    >> the size of the character repertoire in actual use, I trust that it is
    >> relevant (at least initially) to the present topic heading.
    >> The HEBREW PUNCTUATION GERSHAYIM U+05F4 <״> appears in the penultimate
    >> position in a sequence of Hebrew characters that is not to be read as
    >> a word. Since such things as acronyms are regularly used as domain
    >> labels, it thus appears necessary for any registry supporting Hebrew
    >> to include this code point in the corresponding character table. If
    >> so, this is a good example of a situation where "an exception is
    >> appropriate" to the general stricture on "punctuation characters",
    >> stated in the ICANN Guidelines for the Implementation of
    >> Internationalized Domain Names.
    >> The problem is that a standard Hebrew keyboard doesn't include this
    >> character, which is normally replaced by a QUOTATION MARK U+0022.
    >> Anyone entering an IDN including U+05F4 via a keyboard will therefore
    >> be likely to mistype it as U+0022, causing it to fail. It is possible
    >> to get an IDN string containing a quotation mark through ToASCII by
    >> leaving the UseSTD3ASCIIRules flag unset (which is counter to a
    >> "should" point in the ICANN Guidelines). The resulting string contains
    >> a literal quotation mark. Since it is this string that is actually
    >> included in the zone file, the name server will need to load what it
    >> is likely to reject as a malformed name regardless of any IDN
    >> considerations.
    >> Can someone who has detailed understanding of Hebrew orthography
    >> please comment on the necessity of the gershayim in the context
    >> described above. If it cannot comfortably be done without, how can one
    >> offset the confusion that seems inevitable given the alternate
    >> orthography on which the local keyboard is based? Are there other
    >> code points listed as punctuation in the Unicode charts that are
    >> similarly necessary for the IDN support of established orthographic
    >> convention in the languages for which they are used?
    >> /Cary
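
The two mechanical points raised in the quoted messages (a keyboard that
yields U+0022 or U+0027 where U+05F4 or U+05F3 is meant, and a legacy
ISO-8859-8 encoding that cannot carry GERESH at all) can be illustrated with
a minimal Python sketch. The helper names normalize_label and
fits_iso_8859_8 are hypothetical, and the rule "replace a quote or
apostrophe that directly follows a Hebrew letter" is an assumption made for
illustration, not a policy proposed by anyone in the thread or by any
registry.

```python
import unicodedata

GERESH = "\u05F3"      # HEBREW PUNCTUATION GERESH
GERSHAYIM = "\u05F4"   # HEBREW PUNCTUATION GERSHAYIM

# ASCII stand-ins typed on a standard keyboard, mapped to the Hebrew marks.
# (Assumed mapping for illustration only.)
ASCII_STAND_INS = {'"': GERSHAYIM, "'": GERESH}


def is_hebrew_letter(ch: str) -> bool:
    """True for the Hebrew letters ALEF..TAV (U+05D0..U+05EA)."""
    return "\u05D0" <= ch <= "\u05EA"


def normalize_label(label: str) -> str:
    """Hypothetical helper: swap an ASCII quote or apostrophe for the
    Hebrew mark when it directly follows a Hebrew letter."""
    out = []
    for i, ch in enumerate(label):
        if ch in ASCII_STAND_INS and i > 0 and is_hebrew_letter(label[i - 1]):
            out.append(ASCII_STAND_INS[ch])
        else:
            out.append(ch)
    return "".join(out)


def fits_iso_8859_8(text: str) -> bool:
    """Hypothetical helper: can the text be encoded in ISO-8859-8?"""
    try:
        text.encode("iso8859_8")
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    typed = "\u05EA\u05E0\"\u05DA"          # tav, nun, QUOTATION MARK, final kaf
    fixed = normalize_label(typed)
    for ch in fixed:
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
    print(fits_iso_8859_8(typed), fits_iso_8859_8(fixed))
```

The sketch only shows that the typed and intended strings are different
code point sequences, and that the intended one has no ISO-8859-8
representation. Whether a registry or a client should perform such a
substitution is exactly the open question of the thread.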
