From: Mark Davis (email@example.com)
Date: Mon Feb 14 2005 - 19:16:04 CST
----- Original Message -----
From: "Mark E. Shoulson" <firstname.lastname@example.org>
To: "Mark Davis" <email@example.com>
Cc: "Unicode Mailing List" <firstname.lastname@example.org>; "UnicoRe Mailing List"
Sent: Monday, February 14, 2005 16:55
Subject: Re: IDN Security
> Mark Davis wrote:
> >3. The UTR had for some time recommended the development of data on
> >confusables, and we will be starting to collect data to test the
> >of different approaches. In regards to that, I'll call people's attention to
> >the chart on http://www.unicode.org/reports/tr36/idn-chars.html, that shows
> >the permissible IDN characters, ordered by script, then whether
> >or not, then according to UCA collation order. (These are characters after
> >StringPrep has been performed, so case-folding and normalization have
> >already been applied.)
> I recognize this is opening a can of worms... but then, it was you that
> opened it. I'm looking at the idn-chars.html page, and I have a few
> questions about (naturally) the Hebrew script (since that's one I'm
> familiar with).
Yes, I opened it on purpose! A lot of the email on this topic doesn't take
into account the set of characters that are allowed in IDN, and what effect
different recommendations (e.g., not mixing scripts) would actually achieve, so
I thought it would be good to get the characters out where people could see
them.
> Why are the YOD-YOD and VAV-YOD and DOUBLE-VAV digraphs considered
> atomic? Typographically they're often realized as two separate
> letters, even in Yiddish. On the other hand, the ALEF-LAMED ligature is
> more likely to deserve consideration as an atomic character (but not
> enough that I'd actually argue for it), and yet it's missing. What gives?
Because they are not decomposed in Unicode:
05F0;HEBREW LIGATURE YIDDISH DOUBLE VAV;Lo;0;R;;;;;N;HEBREW LETTER DOUBLE VAV;;;;
05F1;HEBREW LIGATURE YIDDISH VAV YOD;Lo;0;R;;;;;N;HEBREW LETTER VAV YOD;;;;
05F2;HEBREW LIGATURE YIDDISH DOUBLE YOD;Lo;0;R;;;;;N;HEBREW LETTER DOUBLE YOD;;;;
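
The difference between the Yiddish digraphs and the ALEF-LAMED ligature can be checked roughly as follows. This is only a sketch: it assumes the normalization step in StringPrep behaves like NFKC, and it uses Python's built-in unicodedata module, which reflects whatever Unicode version ships with the interpreter rather than the version IDN is pinned to. The helper name survives_nfkc is made up for illustration.

```python
import unicodedata

# The three Yiddish digraphs, plus the ALEF LAMED presentation form for contrast.
CODE_POINTS = [0x05F0, 0x05F1, 0x05F2, 0xFB4F]

def survives_nfkc(cp: int) -> bool:
    """Hypothetical helper: True if NFKC leaves the character unchanged."""
    ch = chr(cp)
    return unicodedata.normalize("NFKC", ch) == ch

for cp in CODE_POINTS:
    ch = chr(cp)
    # decomposition() returns an empty string when the decomposition field is empty.
    decomp = unicodedata.decomposition(ch) or "(none)"
    print(f"U+{cp:04X} {unicodedata.name(ch)}: "
          f"decomposition={decomp}, survives NFKC={survives_nfkc(cp)}")
```

Under those assumptions, the digraphs have an empty decomposition field and so pass through normalization as single characters, while U+FB4F has a compatibility decomposition and is replaced by ALEF followed by LAMED.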
> Having all the vowels and accents(!) available, in Hebrew and in Arabic
> as well, is almost certainly overkill (I can't imagine anyone would want
> to complicate a URL so much), but I suppose it's okay for completeness' sake.
That's something that it would be good to get a recommendation on from the
bidi committee. If they would literally never be used in modern Hebrew, then
it would be good to at least alert the user -- especially since at small
sizes they may be hard to distinguish.
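
One rough way to implement such an alert is to flag any label that contains nonspacing marks. The sketch below assumes that the Hebrew points and accents of interest all carry the general category Mn, and the function name combining_marks is hypothetical. Whether a flagged label should be rejected or merely shown with a warning is a policy question this does not answer.

```python
import unicodedata
from typing import List

def combining_marks(label: str) -> List[str]:
    """Return the nonspacing marks (general category Mn) found in a label."""
    return [ch for ch in label if unicodedata.category(ch) == "Mn"]

# SHIN LAMED VAV FINAL-MEM, without and with points (QAMATS, HOLAM).
plain = "\u05E9\u05DC\u05D5\u05DD"
pointed = "\u05E9\u05B8\u05DC\u05D5\u05B9\u05DD"

for label in (plain, pointed):
    marks = combining_marks(label)
    names = [unicodedata.name(m) for m in marks]
    print(f"{len(label)} code points, marks found: {names}")
```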
> (Braille is an interesting case, since by rights people using Braille
> readers would be registering names in the appropriate scripts, and
> merely representing them with Braille patterns, but again, I suppose
> it's harmless—I can't see anyone actually wanting to use it)
> The dingbats, obviously, are going to be an interesting battleground of
> domain buyers...
The only security issues presented, however, would be where they are
confusable with other characters.
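
To make "confusable" concrete, here is a toy sketch of one possible approach: map each character to a representative look-alike and compare the results. The table below is a tiny hand-picked illustration, not the confusables data discussed above, and the names CONFUSABLE_MAP, skeleton, and confusable are invented for this example.

```python
# Illustrative only: a few well-known Cyrillic and Greek look-alikes of
# Latin letters. Real confusables data would be far larger.
CONFUSABLE_MAP = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043E": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u03BF": "o",  # GREEK SMALL LETTER OMICRON
}

def skeleton(label: str) -> str:
    """Replace each character with its representative look-alike, if any."""
    return "".join(CONFUSABLE_MAP.get(ch, ch) for ch in label)

def confusable(a: str, b: str) -> bool:
    """True if two distinct labels collapse to the same skeleton."""
    return a != b and skeleton(a) == skeleton(b)

latin = "paypal"
mixed = "p\u0430yp\u0430l"  # both "a" letters replaced by Cyrillic U+0430
print(confusable(latin, mixed))
```

A character that has no entry in the table, such as a dingbat, never matches anything under this scheme, which is the point made above: only characters that resemble others raise a security issue.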