Re: IDN Security

From: Mark Davis (mark.davis@jtcsv.com)
Date: Mon Feb 14 2005 - 19:16:04 CST


    Mark

    ----- Original Message -----
    From: "Mark E. Shoulson" <mark@kli.org>
    To: "Mark Davis" <mark.davis@jtcsv.com>
    Cc: "Unicode Mailing List" <unicode@unicode.org>; "UnicoRe Mailing List"
    <unicore@unicode.org>
    Sent: Monday, February 14, 2005 16:55
    Subject: Re: IDN Security

    > Mark Davis wrote:
    >
    > >3. The UTR had for some time recommended the development of data on visually
    > >confusables, and we will be starting to collect data to test the feasibility
    > >of different approaches. In regards to that, I'll call people's attention to
    > >the chart on http://www.unicode.org/reports/tr36/idn-chars.html, that shows
    > >the permissible IDN characters, ordered by script, then whether decomposable
    > >or not, then according to UCA collation order. (These are characters after
    > >StringPrep has been performed, so case-folding and normalization have
    > >already been applied.)
    > >
    > >
    > I recognize this is opening a can of worms... but then, it was you that
    > opened it. I'm looking at the idn-chars.html page, and I have a few
    > questions about (naturally) the Hebrew script (since that's one I'm
    > familiar with).

    Yes, I opened it on purpose! A lot of the email on this topic doesn't take
    into account the set of characters that are allowed in IDN, or what effect
    different recommendations (e.g. not mixing scripts) would actually achieve, so
    I thought it would be good to get the characters out where people could see
    them.
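The chart described in the quoted note lists characters as they stand after StringPrep, with case folding and normalization already applied. Below is a minimal sketch of that mapping step, assuming the Nameprep profile of StringPrep (RFC 3491). It uses the tables in Python's standard `stringprep` module and `unicodedata.ucd_3_2_0`, the Unicode 3.2 database that StringPrep is pinned to. The function name `nameprep_map` is hypothetical, and the sketch deliberately omits Nameprep's prohibited-character and bidi checks. It only illustrates why uppercase, fullwidth, or other compatibility forms never show up in such a chart.

```python
import stringprep
from unicodedata import ucd_3_2_0


def nameprep_map(label: str) -> str:
    """Hypothetical helper: only the mapping and normalization steps of Nameprep.

    The prohibition (tables C.*) and bidi (tables D.*) checks are omitted.
    """
    mapped = []
    for ch in label:
        # Table B.1: characters commonly mapped to nothing (e.g. SOFT HYPHEN).
        if stringprep.in_table_b1(ch):
            continue
        # Table B.2: case folding for use with NFKC (e.g. U+00DF -> "ss").
        mapped.append(stringprep.map_table_b2(ch))
    # Nameprep normalizes with NFKC, using Unicode 3.2 data.
    return ucd_3_2_0.normalize("NFKC", "".join(mapped))


if __name__ == "__main__":
    # German sharp s, uppercase accented Latin, soft hyphen, fullwidth Latin.
    for s in ["Stra\u00dfe", "\u00c9COLE", "ab\u00adc", "\uff21BC"]:
        print(repr(s), "->", repr(nameprep_map(s)))
```

Because NFKC runs last, any character with a compatibility decomposition is replaced by its decomposition before the label is ever compared or registered.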

    >
    > Why are the YOD-YOD and VAV-YOD and DOUBLE-VAV digraphs considered
    > atomic? Typographically they're often realized as two separate
    > letters, even in Yiddish. On the other hand, the ALEF-LAMED ligature is
    > more likely to deserve consideration as an atomic character (but not
    > enough that I'd actually argue for it), and yet it's missing. What gives?

    Because they are not decomposed in Unicode:
    05F0;HEBREW LIGATURE YIDDISH DOUBLE VAV;Lo;0;R;;;;;N;HEBREW LETTER DOUBLE VAV;;;;
    05F1;HEBREW LIGATURE YIDDISH VAV YOD;Lo;0;R;;;;;N;HEBREW LETTER VAV YOD;;;;
    05F2;HEBREW LIGATURE YIDDISH DOUBLE YOD;Lo;0;R;;;;;N;HEBREW LETTER DOUBLE YOD;;;;
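The reply's point can be checked directly. Field 5 of each UnicodeData.txt record holds the decomposition mapping, and it is empty for U+05F0..U+05F2, so NFKC leaves these ligatures as single characters. By contrast, U+FB4F HEBREW LIGATURE ALEF LAMED has a `<compat>` decomposition to U+05D0 U+05DC. NFKC (and therefore StringPrep) replaces it with its two component letters, so it cannot appear in a list of post-StringPrep characters. The helper name `decomposition_field` is hypothetical. The sketch uses only Python's standard `unicodedata` module, whose data version depends on the interpreter.

```python
import unicodedata

# The three records quoted above, from UnicodeData.txt.
RECORDS = """\
05F0;HEBREW LIGATURE YIDDISH DOUBLE VAV;Lo;0;R;;;;;N;HEBREW LETTER DOUBLE VAV;;;;
05F1;HEBREW LIGATURE YIDDISH VAV YOD;Lo;0;R;;;;;N;HEBREW LETTER VAV YOD;;;;
05F2;HEBREW LIGATURE YIDDISH DOUBLE YOD;Lo;0;R;;;;;N;HEBREW LETTER DOUBLE YOD;;;;"""


def decomposition_field(record: str) -> str:
    """Hypothetical helper: return field 5 (Decomposition_Mapping) of a record."""
    return record.split(";")[5]


if __name__ == "__main__":
    for line in RECORDS.splitlines():
        ch = chr(int(line.split(";")[0], 16))
        # An empty decomposition field means NFKC leaves the ligature intact.
        print(f"U+{ord(ch):04X}", repr(decomposition_field(line)),
              unicodedata.normalize("NFKC", ch) == ch)

    # U+FB4F HEBREW LIGATURE ALEF LAMED carries a compatibility decomposition,
    # so NFKC splits it into ALEF + LAMED.
    alef_lamed = "\uFB4F"
    print(unicodedata.decomposition(alef_lamed))
    print(unicodedata.normalize("NFKC", alef_lamed) == "\u05D0\u05DC")
```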

    > Having all the vowels and accents(!) available, in Hebrew and in Arabic
    > as well, is almost certainly overkill (I can't imagine anyone would want
    > to complicate a URL so much), but I suppose it's okay for completeness'
    > sake.

    That's something that it would be good to get a recommendation on from the
    bidi committee. If they would literally never be used in modern Hebrew, then
    it would be good to at least alert the user -- especially since at small
    sizes they may be hard to distinguish.
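An application could act on the suggestion to alert the user by flagging labels that contain nonspacing marks. This covers Hebrew points and cantillation accents as well as Arabic harakat. The following is a minimal sketch. It assumes that General_Category Mn is an acceptable proxy for "vowels and accents", although it will also catch marks from other scripts. `nonspacing_marks` is a hypothetical name and is not part of any IDN specification.

```python
import unicodedata


def nonspacing_marks(label: str) -> list[str]:
    """Hypothetical check: names of all nonspacing marks (gc=Mn) in a label."""
    return [unicodedata.name(ch, f"U+{ord(ch):04X}")
            for ch in label
            if unicodedata.category(ch) == "Mn"]


if __name__ == "__main__":
    pointed = "\u05E9\u05C1\u05B8\u05DC\u05D5\u05B9\u05DD"  # "shalom" with points
    plain = "\u05E9\u05DC\u05D5\u05DD"                       # "shalom" unpointed
    for label in (pointed, plain):
        marks = nonspacing_marks(label)
        if marks:
            print("warn: label contains marks:", ", ".join(marks))
        else:
            print("ok: no nonspacing marks")
```

Whether to warn, reject, or silently accept such labels is exactly the policy question the reply hands to the bidi committee. The sketch only shows that detection itself is cheap.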

    >
    > (Braille is an interesting case, since by rights people using Braille
    > readers would be registering names in the appropriate scripts, and
    > merely representing them with Braille patterns, but again, I suppose
    > it's harmless—I can't see anyone actually wanting to use it)
    >
    > The dingbats, obviously, are going to be an interesting battleground of
    > domain buyers...

    The only security issues presented, however, would be where they are
    confusable with other characters.
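To make "confusable with other characters" concrete, one approach (along the lines of the skeleton comparison later specified in UTS #39) maps every character to a prototype it is visually confusable with. Under this approach, two different labels are treated as confusable when their mapped forms match. The sketch below uses a tiny, hypothetical sample table, not real confusables data, which at the time of this message was still to be collected. `skeleton`, `confusable`, and `SAMPLE_CONFUSABLES` are illustrative names only.

```python
import unicodedata

# Hypothetical sample of confusable mappings; NOT real confusables data.
# Each key maps to the prototype character it is visually confusable with.
SAMPLE_CONFUSABLES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u043E": "o",  # CYRILLIC SMALL LETTER O
    "\u03BF": "o",  # GREEK SMALL LETTER OMICRON
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
}


def skeleton(label: str) -> str:
    """Decompose (NFD), map each character through the table, then NFD again."""
    decomposed = unicodedata.normalize("NFD", label)
    mapped = "".join(SAMPLE_CONFUSABLES.get(ch, ch) for ch in decomposed)
    return unicodedata.normalize("NFD", mapped)


def confusable(a: str, b: str) -> bool:
    """Two labels are confusable if they differ but share a skeleton."""
    return a != b and skeleton(a) == skeleton(b)


if __name__ == "__main__":
    # Cyrillic er + a in place of Latin p + a: same skeleton as "paypal".
    print(confusable("paypal", "\u0440\u0430ypal"))
    # Digit one is not in the sample table, so this pair is not flagged.
    print(confusable("paypal", "paypa1"))
```

The quality of such a check depends entirely on the mapping data, which is why collecting that data comes first.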

    >
    > ~mark
    >
    >



    This archive was generated by hypermail 2.1.5 : Mon Feb 14 2005 - 19:17:22 CST