RE: Phishing and enforcing Confusables.txt

From: Shawn Steele (Shawn.Steele@microsoft.com)
Date: Mon Nov 29 2010 - 11:42:37 CST


    Minor clarification: It's a choice of the registry/zone. Even if a registry like .com did not allow confusables to be registered, zones under it could still register confusables, e.g. confusable.notconfusable.com


    -Shawn

     
    http://blogs.msdn.com/shawnste

    ________________________________
    From: unicode-bounce@unicode.org [unicode-bounce@unicode.org] on behalf of Mark Davis ☕ [mark@macchiato.com]
    Sent: Thursday, November 25, 2010 12:45 PM
    To: Shriramana Sharma
    Cc: UnicoDe List
    Subject: Re: Phishing and enforcing Confusables.txt

    Whether or not to allow confusables is currently a choice for the registry (such as .com). But if the registry does restrict them, it would be effective anywhere in the world -- for that domain.

    Client software can also detect that a particular label has a "whole-script" confusable, based on the confusable data, and at least alert the user.
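    As a rough illustration of the client-side check described above, here is a minimal Python sketch of comparing two labels by their "skeleton". It is not the actual UTS #39 implementation: SAMPLE_CONFUSABLES is a tiny hand-picked table (a few Cyrillic letters that resemble Latin ones) standing in for the real confusables data, and the function names skeleton and looks_alike are made up for this example.

```python
import unicodedata

# Illustrative subset only (an assumption for this sketch): a real client
# would load the full confusables data instead of this hand-picked table.
SAMPLE_CONFUSABLES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
}


def skeleton(label, table=SAMPLE_CONFUSABLES):
    """Normalize, map each character to its prototype, normalize again."""
    decomposed = unicodedata.normalize("NFD", label)
    mapped = "".join(table.get(ch, ch) for ch in decomposed)
    return unicodedata.normalize("NFD", mapped)


def looks_alike(a, b, table=SAMPLE_CONFUSABLES):
    """True if two different labels collapse to the same skeleton."""
    return a != b and skeleton(a, table) == skeleton(b, table)


cyrillic_label = "\u0440\u0430\u0441\u0435"  # four Cyrillic letters resembling "pace"
if looks_alike(cyrillic_label, "pace"):
    print("warning: label is confusable with 'pace'")
```

    A client could run a comparison like this against labels the user already trusts and show an alert on a match. Deciding whether a match is specifically a "whole-script" confusable would additionally need per-character script data, which this sketch leaves out.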

    As you point out below, the confusables are known to be lacking for Indic scripts. If you follow some links in #39, you get to the following page where you can suggest additional confusables, either within the same script or across scripts.

    http://unicode.org/draft/reports/tr39/confusables.html

    All such volunteer efforts are appreciated, and can help efforts to improve security.

    The key issue is to deal with the characters in the scripts listed in Table 5a in http://unicode.org/reports/tr31/#Table_Recommended_Scripts. Any scripts outside of that list are recommended for exclusion anyway. That is, it doesn't matter as much if a character in Telugu looks like a character in Phoenician, because the latter script is recommended for exclusion. What is important are characters in other scripts in Table 5a, plus symbols and punctuation.

    To check out the current data, you can use http://unicode.org/cldr/utility/confusables.jsp?a=ಅರಗ

    There, we see that the first character does have a confusable in the data, but the others don't.

    Mark

    — Il meglio è l’inimico del bene —


    On Wed, Nov 24, 2010 at 20:57, Shriramana Sharma <samjnaa@gmail.com> wrote:
    Hello and thanks for all that info. However, the question stands, see below:

    On Thu, Nov 25, 2010 at 10:03 AM, CE Whitehead <cewcathar@hotmail.com> wrote:
    > "5. In implementing the IDN standards, top-level domain registries should, at least
    > initially, limit any given domain label (such as a second-level domain name) to the
    > characters associated with one language or set of languages only."

    Apart from that "at least initially" stuff, which indicates that it
    may change in the future, this really does not solve the problem or
    answer the question. I'll forgo the examples I gave previously as they
    involved mixed-script text.

    Now even *without* mixing scripts, examples can be provided, such as అరగ.com <http://xn--joc0b6d.com>
    (all in Telugu) and ಅರಗ.com <http://xn--6rc0b6d.com> (all in Kannada). What is desired is that
    if the Telugu version has been first registered *anywhere in the
    world*, the Kannada version should be prohibited from being registered
    *everywhere in the world*, or vice versa with the scripts.
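    To make the two example labels concrete, the following Python sketch lists their code points and derives the ASCII form of each using the standard library's punycode codec. The helper names describe and ace_label are made up for this illustration, and the sketch only encodes a single label; it does not perform full IDNA processing.

```python
import unicodedata

# The two labels from the example above, written as escapes so the
# difference between the scripts is visible in the source.
telugu = "\u0c05\u0c30\u0c17"   # Telugu letters A, RA, GA
kannada = "\u0c85\u0cb0\u0c97"  # Kannada letters A, RA, GA


def describe(label):
    """List each character as 'U+XXXX NAME'."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in label]


def ace_label(label):
    """Punycode-encode one label and add the 'xn--' prefix (no IDNA mapping)."""
    return "xn--" + label.encode("punycode").decode("ascii")


for label in (telugu, kannada):
    print(ace_label(label), describe(label))
```

    The labels share no code points and encode to different ASCII names, so a registry comparing registered strings byte for byte treats them as unrelated. Only a comparison based on confusable data could link them.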


    Which leads me to note that *somehow*, Confusables.txt is missing a
    full-fledged confusables mapping between Kannada and Telugu. Of the
    contrived example given above, it is obvious that RA and GA are almost
    identical between the scripts but Confusables.txt does not list them
    at all!

    This is a serious lacuna, IMHO, which should be rectified.




    Shriramana Sharma.




