Re: IDN Security

From: Cary Karp (ck@nic.museum)
Date: Tue Feb 15 2005 - 04:26:07 CST

    Quoting Mark E. Shoulson:

    > I recognize this is opening a can of worms... but then, it was you that
    > opened it. I'm looking at the idn-chars.html page, and I have a few
    > questions about (naturally) the Hebrew script (since that's one I'm
    > familiar with).

    I have another question about the IDN implementation of the Hebrew
    script. Given that IDN security concerns stand in direct proportion to
    the size of the character repertoire in actual use, I trust that it is
    relevant (at least initially) to the present topic heading.

    The HEBREW PUNCTUATION GERSHAYIM U+05F4 <״> appears in the penultimate
    position in a sequence of Hebrew characters that is not to be read as a
    word. Since such things as acronyms are regularly used as domain labels,
    it thus appears necessary for any registry supporting Hebrew to include
    this code point in the corresponding character table. If so, this is a
    good example of a situation where "an exception is appropriate" to the
    general stricture on "punctuation characters", stated in the ICANN
    Guidelines for the Implementation of Internationalized Domain Names.

    The problem is that a standard Hebrew keyboard doesn't include this
    character, which is normally replaced by a QUOTATION MARK U+0022. Anyone
    entering an IDN including U+05F4 via a keyboard will therefore be likely
    to mistype it as U+0022, causing it to fail. It is possible to get an
    IDN string containing a quotation mark through ToASCII by leaving the
    UseSTD3ASCIIRules flag unset (which is counter to a "should" point in
    the ICANN Guidelines). The resulting string contains a literal quotation
    mark. Since it is this string that is actually included in the zone
    file, the name server will need to load what it is likely to reject as a
    malformed name regardless of any IDN considerations.
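
    The effect can be sketched in a few lines of Python. This is only an
    illustration under stated assumptions: it uses the standard library's
    "punycode" codec directly, skips the nameprep step that a real
    ToASCII performs first, and the helper names (to_ace_unchecked,
    passes_std3) and the sample acronym are hypothetical, chosen for
    this example.

```python
import re

ACE_PREFIX = "xn--"
# Letters, digits and hyphen only: the host name syntax that the
# UseSTD3ASCIIRules flag is meant to enforce.
LDH = re.compile(r"^[A-Za-z0-9-]+$")


def to_ace_unchecked(label: str) -> str:
    """Punycode-encode a label as if UseSTD3ASCIIRules were unset.

    Assumption: nameprep is omitted here for brevity; a real ToASCII
    runs it before encoding.
    """
    if label.isascii():
        return label
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


def passes_std3(ace_label: str) -> bool:
    """Return True if the encoded label is plain letter-digit-hyphen."""
    return (
        bool(LDH.match(ace_label))
        and not ace_label.startswith("-")
        and not ace_label.endswith("-")
    )


# A Hebrew acronym with GERSHAYIM U+05F4 in the penultimate position ...
intended = "\u05e6\u05d4\u05f4\u05dc"
# ... and the same label as typed with QUOTATION MARK U+0022 instead.
mistyped = "\u05e6\u05d4" + '"' + "\u05dc"

for label in (intended, mistyped):
    ace = to_ace_unchecked(label)
    print(ace, "passes STD3" if passes_std3(ace) else "fails STD3")
```

    Punycode copies ASCII characters through unchanged, so the mistyped
    label keeps its literal quotation mark in the encoded form, which is
    the string that would have to be loaded into the zone file.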

    Can someone who has a detailed understanding of Hebrew orthography please
    comment on the necessity of the gershayim in the context described
    above? If it cannot comfortably be done without, how can one offset the
    confusion that seems inevitable given the alternate orthography on which
    the local keyboard is based? Are there other code points listed as
    punctuation in the Unicode charts that are similarly necessary for the
    IDN support of established orthographic convention in the languages for
    which they are used?
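
    As a starting point for the last question, the candidates in a given
    block can at least be enumerated mechanically. The following is a
    rough sketch using Python's unicodedata module; the function name is
    hypothetical, the result depends on the Unicode version bundled with
    the interpreter, and it says nothing about which of the listed code
    points are orthographically necessary.

```python
import unicodedata


def punctuation_in_range(first: int, last: int):
    """List (code point, name) pairs whose general category is P*."""
    found = []
    for cp in range(first, last + 1):
        ch = chr(cp)
        # Categories Pc, Pd, Ps, Pe, Pi, Pf and Po are all punctuation.
        if unicodedata.category(ch).startswith("P"):
            found.append((f"U+{cp:04X}", unicodedata.name(ch, "<unnamed>")))
    return found


# The Hebrew block spans U+0590..U+05FF.
for code, name in punctuation_in_range(0x0590, 0x05FF):
    print(code, name)
```

    The same call with another block's range gives the corresponding
    list for other scripts.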

    /Cary


