From: Mark E. Shoulson
Date: Mon Feb 14 2005 - 18:55:31 CST

    Mark Davis wrote:

    >3. The UTR had for some time recommended the development of data on visually
    >confusables, and we will be starting to collect data to test the feasibility
    >of different approaches. In regards to that, I'll call people's attention to
    >the chart on, that shows
    >the permissible IDN characters, ordered by script, then whether decomposable
    >or not, then according to UCA collation order. (These are characters after
    >StringPrep has been performed, so case-folding and normalization have
    >already been applied.)
    I recognize this is opening a can of worms... but then, it was you that
    opened it. I'm looking at the idn-chars.html page, and I have a few
    questions about (naturally) the Hebrew script (since that's one I'm
    familiar with).

    Why are the YOD-YOD, VAV-YOD, and DOUBLE-VAV digraphs considered
    atomic? Typographically they're often realized as two separate
    letters, even in Yiddish. On the other hand, the ALEF-LAMED ligature is
    more likely to deserve consideration as an atomic character (but not
    enough that I'd actually argue for it), and yet it's missing. What gives?
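    A plausible explanation, assuming the chart lists only characters that
    survive the NFKC normalization step of StringPrep/Nameprep: U+FB4F
    HEBREW LIGATURE ALEF LAMED carries a compatibility decomposition to
    ALEF + LAMED and would therefore be normalized away, while the three
    Yiddish ligatures (U+05F0 to U+05F2) have no decomposition and pass
    through unchanged. The sketch below checks this with Python's standard
    `unicodedata` module. The helper names `survives_nfkc` and `report` are
    hypothetical, and the check models only normalization, not StringPrep's
    mapping or prohibition tables.

```python
import unicodedata

# Hebrew code points discussed above (labels are informal).
CANDIDATES = {
    "YIDDISH DOUBLE VAV": "\u05F0",
    "YIDDISH VAV YOD": "\u05F1",
    "YIDDISH DOUBLE YOD": "\u05F2",
    "ALEF LAMED": "\uFB4F",
}


def survives_nfkc(ch: str) -> bool:
    """Return True if NFKC leaves the character unchanged.

    Models only the normalization step of StringPrep/Nameprep;
    the mapping and prohibition tables are deliberately ignored.
    """
    return unicodedata.normalize("NFKC", ch) == ch


def report() -> dict:
    """Collect the decomposition and NFKC result for each candidate."""
    result = {}
    for label, ch in CANDIDATES.items():
        result[label] = {
            # Empty string means the character has no decomposition mapping.
            "decomposition": unicodedata.decomposition(ch),
            "survives_nfkc": survives_nfkc(ch),
            "nfkc": unicodedata.normalize("NFKC", ch),
        }
    return result


if __name__ == "__main__":
    for label, info in report().items():
        print(f"{label:20} decomposition={info['decomposition']!r:22} "
              f"survives NFKC: {info['survives_nfkc']}")
```

    Under this assumption only ALEF LAMED fails the check, which would be
    consistent with its absence from the chart.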

    Having all the vowels and accents(!) available, in Hebrew and in Arabic
    as well, is almost certainly overkill (I can't imagine anyone would want
    to complicate a URL so much), but I suppose it's okay for completeness'
    sake.

    (Braille is an interesting case: by rights, people using Braille
    readers would register names in the appropriate scripts and merely
    have them represented as Braille patterns. But again, I suppose it's
    harmless; I can't see anyone actually wanting to use it.)

    The dingbats, obviously, are going to be an interesting battleground for
    domain buyers...


