Re: Phishing and enforcing Confusables.txt

From: Mark Davis ☕ (mark@macchiato.com)
Date: Thu Nov 25 2010 - 14:45:16 CST


    Whether or not to allow confusables is currently a choice for the registry
    (such as .com). But if the registry does restrict them, the restriction is
    effective everywhere in the world, though only for that domain.
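
    As a rough illustration of what a registry-side restriction could look
    like: index each registered label by its UTS #39 skeleton (NFD, map every
    character to its prototype, NFD again) and refuse any new label whose
    skeleton is already taken. This is a sketch under assumptions, not any
    registry's actual policy; the `Registry` class and the three
    Kannada-to-Telugu prototype entries are hypothetical, standing in for data
    that would be loaded from confusables.txt.

```python
import unicodedata

# Hypothetical prototype mapping (source char -> prototype string), standing in
# for data loaded from confusables.txt. These entries are illustrative only.
PROTOTYPES = {
    "\u0C85": "\u0C05",  # KANNADA LETTER A  -> TELUGU LETTER A  (hypothetical)
    "\u0CB0": "\u0C30",  # KANNADA LETTER RA -> TELUGU LETTER RA (hypothetical)
    "\u0C97": "\u0C17",  # KANNADA LETTER GA -> TELUGU LETTER GA (hypothetical)
}


def skeleton(label: str) -> str:
    """UTS #39-style skeleton: NFD, map each character to its prototype, NFD again."""
    nfd = unicodedata.normalize("NFD", label)
    mapped = "".join(PROTOTYPES.get(ch, ch) for ch in nfd)
    return unicodedata.normalize("NFD", mapped)


class Registry:
    """Hypothetical registry that refuses labels confusable with existing ones."""

    def __init__(self) -> None:
        # Maps skeleton -> the label that first claimed it.
        self._by_skeleton: dict[str, str] = {}

    def register(self, label: str) -> bool:
        key = skeleton(label)
        if key in self._by_skeleton:
            return False  # a confusable label already holds this skeleton
        self._by_skeleton[key] = label
        return True


reg = Registry()
print(reg.register("\u0C05\u0C30\u0C17"))  # Telugu label: True
print(reg.register("\u0C85\u0CB0\u0C97"))  # Kannada look-alike: False
```

    Because the index lives in one registry, this only blocks look-alikes
    within that registry's own namespace.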

    Client software can also detect that a particular label has a "whole-script"
    confusable, based on the confusable data, and at least alert the user.
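
    A client-side check for whole-script confusables could be sketched like
    this: for a label written in a single script, intersect, over all of its
    characters, the set of other scripts in which that character has a
    look-alike. A non-empty result means the whole label could be imitated in
    that script. The script lookup by code point block and the
    `LOOKALIKE_SCRIPTS` table are simplifying assumptions; the table's Kannada
    entries are hypothetical, not taken from the Unicode data files.

```python
# Simplified script lookup by code point block. A real implementation should
# use the Unicode Script property (Scripts.txt) rather than block ranges.
SCRIPT_RANGES = [("Telugu", 0x0C00, 0x0C7F), ("Kannada", 0x0C80, 0x0CFF)]

# Hypothetical whole-script data: character -> scripts containing a look-alike.
LOOKALIKE_SCRIPTS = {
    "\u0C85": {"Telugu"},  # KANNADA LETTER A  (hypothetical entry)
    "\u0CB0": {"Telugu"},  # KANNADA LETTER RA (hypothetical entry)
    "\u0C97": {"Telugu"},  # KANNADA LETTER GA (hypothetical entry)
}


def script_of(ch: str) -> str:
    cp = ord(ch)
    for name, lo, hi in SCRIPT_RANGES:
        if lo <= cp <= hi:
            return name
    return "Unknown"


def whole_script_confusables(label: str) -> set[str]:
    """Scripts in which every character of a single-script label has a look-alike."""
    if len({script_of(ch) for ch in label}) != 1:
        return set()  # empty or mixed-script labels are outside this check
    result = None
    for ch in label:
        scripts = LOOKALIKE_SCRIPTS.get(ch, set())
        result = set(scripts) if result is None else result & scripts
    return result or set()


print(whole_script_confusables("\u0C85\u0CB0\u0C97"))  # {'Telugu'}: alert the user
print(whole_script_confusables("\u0C85\u0CB3"))        # set(): U+0CB3 has no entry here
```

    The second call shows why gaps in the data matter: a single character
    without an entry is enough for the check to report nothing.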

    As you point out below, the confusables data is known to be lacking for
    Indic scripts. If you follow some links in UTS #39, you get to the following
    page, where you can suggest additional confusables, either within the same
    script or across scripts.

    http://unicode.org/draft/reports/tr39/confusables.html

    All such volunteer efforts are appreciated, and can help efforts to improve
    security.

    The key issue is to deal with the characters in the scripts listed in Table
    5a in http://unicode.org/reports/tr31/#Table_Recommended_Scripts. Any
    scripts outside of that list are recommended for exclusion anyway. That is,
    it doesn't matter as much if a character in Telugu looks like a character
    in Phoenician, because the latter script is recommended for exclusion. What
    matters are look-alikes among characters in the other scripts in Table 5a,
    plus symbols and punctuation.
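
    Restricting labels to recommended scripts can be checked before any
    confusable lookup. In this sketch the script ranges and the `RECOMMENDED`
    set are a hypothetical excerpt chosen for the Telugu/Phoenician example;
    the real list is Table 5a, and a real implementation would use the Unicode
    Script property rather than block ranges.

```python
# Simplified script lookup by code point block (an assumption; see Scripts.txt
# for the real Script property).
SCRIPT_RANGES = [
    ("Telugu", 0x0C00, 0x0C7F),
    ("Kannada", 0x0C80, 0x0CFF),
    ("Phoenician", 0x10900, 0x1091F),
]

# Hypothetical excerpt of a recommended-script allowlist, not the full Table 5a.
RECOMMENDED = {"Telugu", "Kannada"}


def script_of(ch: str) -> str:
    cp = ord(ch)
    for name, lo, hi in SCRIPT_RANGES:
        if lo <= cp <= hi:
            return name
    return "Unknown"  # characters outside the listed ranges are rejected here


def disallowed_chars(label: str) -> list[str]:
    """Characters of the label whose script is not on the allowlist."""
    return [ch for ch in label if script_of(ch) not in RECOMMENDED]


print(disallowed_chars("\u0C05\u0C30\u0C17"))   # []: all Telugu
print(disallowed_chars("\u0C05\U00010900"))     # Phoenician ALF is rejected
```

    Labels rejected at this stage never reach the confusables lookup, so the
    data only needs to cover the scripts that remain.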

    To check out the current data, you can use
    http://unicode.org/cldr/utility/confusables.jsp?a=ಅರಗ

    There, we see that the first character does have a confusable in the data,
    but the others don't.
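
    To check a local copy of the data instead of the web utility, a sketch
    that collects every source character listed in confusables.txt and
    reports, per character of a label, whether it has an entry. It assumes the
    file's `source ; target ; type # comment` line layout with hexadecimal code
    points; the helper names and the file path in the example comment are
    hypothetical.

```python
def load_confusable_sources(lines):
    """Return the set of source strings listed in confusables-style lines."""
    sources = set()
    for line in lines:
        # Drop a leading byte order mark and any trailing comment.
        data = line.lstrip("\ufeff").split("#", 1)[0].strip()
        if not data:
            continue
        source_field = data.split(";", 1)[0].strip()
        # The source is given as space-separated hexadecimal code points.
        sources.add("".join(chr(int(cp, 16)) for cp in source_field.split()))
    return sources


def report(label, sources):
    """List (code point, has_entry) for each character of the label."""
    return [(f"U+{ord(ch):04X}", ch in sources) for ch in label]


# Example (assumes a local copy of confusables.txt):
#   with open("confusables.txt", encoding="utf-8") as f:
#       print(report("\u0C85\u0CB0\u0C97", load_confusable_sources(f)))
```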

    Mark

    *— Il meglio è l’inimico del bene (The best is the enemy of the good) —*

    On Wed, Nov 24, 2010 at 20:57, Shriramana Sharma <samjnaa@gmail.com> wrote:

    > Hello and thanks for all that info. However, the question stands, see
    > below:
    >
    > On Thu, Nov 25, 2010 at 10:03 AM, CE Whitehead <cewcathar@hotmail.com>
    > wrote:
    > > "5. In implementing the IDN standards, top-level domain registries
    > > should, at least initially, limit any given domain label (such as a
    > > second-level domain name) to the characters associated with one
    > > language or set of languages only."
    >
    > Apart from that "at least initially" stuff, which indicates that it
    > may change in the future, this really does not solve the problem or
    > answer the question. I'll forgo the examples I gave previously, as they
    > involved mixed-script text.
    >
    > Now even *without* mixing scripts, examples can be provided as
    > అరగ.com <http://xn--joc0b6d.com> (all in Telugu) and ಅರಗ.com
    > <http://xn--6rc0b6d.com> (all in Kannada). What is desired is that
    > if the Telugu version has been first registered *anywhere in the
    > world*, the Kannada version should be prohibited from being registered
    > *everywhere in the world*, or vice versa with the scripts.

    > Which leads me to note that *somehow*, Confusables.txt is missing a
    > full-fledged confusables mapping between Kannada and Telugu. In the
    > constructed example given above, it is obvious that RA and GA are almost
    > identical between the scripts but Confusables.txt does not list them
    > at all!
    >
    > This is a serious lacuna, IMHO, which should be rectified.
    >
    > Shriramana Sharma.



    This archive was generated by hypermail 2.1.5 : Thu Nov 25 2010 - 14:47:43 CST