Re: Phishing and enforcing Confusables.txt

From: Mark Davis ☕ (mark@macchiato.com)
Date: Mon Nov 29 2010 - 11:54:03 CST

    True; a point well worth emphasizing.

    By "registry" I mean at any level. So just as .com regulates everything of
    the form xxx.com, the entity responsible for .blogspot.com controls
    everything of the form xxx.blogspot.com. Thus there are literally millions
    of registries.
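
To make the "registry at any level" point concrete, here is a minimal Python sketch that pairs each label in a hostname with the zone whose operator would decide whether that label may be registered. The function name `governing_zones` is made up for illustration, and the sketch assumes every dot is a delegation boundary, which is a simplification of how real zones are cut.

```python
def governing_zones(hostname: str) -> list[tuple[str, str]]:
    """Pair each label with the parent zone that acts as its registry.

    Assumption (illustrative only): every dot marks a zone boundary.
    """
    labels = hostname.rstrip(".").lower().split(".")
    pairs = []
    # Walk from the label just under the top-level domain
    # down to the leftmost label.
    for i in range(len(labels) - 2, -1, -1):
        label = labels[i]
        zone = ".".join(labels[i + 1:])
        pairs.append((label, zone))
    return pairs


if __name__ == "__main__":
    for label, zone in governing_zones("confusable.notconfusable.com"):
        print(f"{label!r} is accepted or refused by the registry for .{zone}")
```

Under that assumption, a confusables policy at .com covers only the label directly beneath it; the label one level further down is checked, if at all, by whoever runs the zone beneath .com.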

    Mark

    *— Il meglio è l’inimico del bene —*

    On Mon, Nov 29, 2010 at 09:42, Shawn Steele <Shawn.Steele@microsoft.com> wrote:

    > Minor clarification: It's a choice of the registry/zone. If someone like
    > .com did not allow confusables to be registered, zones under that could
    > still register confusables. E.g.: confusable.notconfusable.com
    >
    > -Shawn
    >
    > http://blogs.msdn.com/shawnste
    >
    > ------------------------------
    > *From:* unicode-bounce@unicode.org [unicode-bounce@unicode.org] on behalf
    > of Mark Davis ☕ [mark@macchiato.com]
    > *Sent:* Thursday, November 25, 2010 12:45 PM
    > *To:* Shriramana Sharma
    > *Cc:* UnicoDe List
    >
    > *Subject:* Re: Phishing and enforcing Confusables.txt
    >
    > Whether or not to allow confusables is currently a choice for the
    > registry (such as .com). But if the registry does restrict them, it would be
    > effective anywhere in the world -- for that domain.
    >
    > Client software can also detect that a particular label has a
    > "whole-script" confusable, based on the confusable data, and at least alert
    > the user.
    >
    > As you point out below, the confusables are known to be lacking for Indic
    > scripts. If you follow some links in #39, you get to the following page
    > where you can suggest additional confusables, either within the same script
    > or across scripts.
    >
    > http://unicode.org/draft/reports/tr39/confusables.html
    >
    > All such volunteer efforts are appreciated, and can help efforts to
    > improve security.
    >
    > The key issue is to deal with the characters in the scripts listed in
    > Table 5a in http://unicode.org/reports/tr31/#Table_Recommended_Scripts.
    > Any scripts outside of that list are recommended for exclusion anyway. That
    > is, it doesn't matter as much if a character in Telugu looks like a
    > character in Phoenician, because the latter script is recommended for
    > exclusion. What is important are characters in other scripts in Table 5a,
    > plus symbols and punctuation.
    >
    > To check out the current data, you can use
    > http://unicode.org/cldr/utility/confusables.jsp?a=ಅರಗ
    >
    > There, we see that the first character does have a confusable in the
    > data, but the others don't.
    >
    > Mark
    >
    > *— Il meglio è l’inimico del bene —*
    >
    >
    > On Wed, Nov 24, 2010 at 20:57, Shriramana Sharma <samjnaa@gmail.com> wrote:
    >
    >> Hello and thanks for all that info. However, the question stands, see
    >> below:
    >>
    >> On Thu, Nov 25, 2010 at 10:03 AM, CE Whitehead <cewcathar@hotmail.com>
    >> wrote:
    >> > "5. In implementing the IDN standards, top-level domain registries should,
    >> > at least initially, limit any given domain label (such as a second-level
    >> > domain name) to the characters associated with one language or set of
    >> > languages only."
    >>
    >> Apart from that "at least initially" stuff, which indicates that it
    >> may change in the future, this really does not solve the problem or
    >> answer the question. I'll forgo the examples I gave previously as they
    >> involved mixed-script text.
    >>
    >> Now even *without* mixing scripts, examples can be provided, such as
    >> అరగ.com <http://xn--joc0b6d.com> (all in Telugu) and
    >> ಅರಗ.com <http://xn--6rc0b6d.com> (all in Kannada). What is desired is that
    >> if the Telugu version has been first registered *anywhere in the
    >> world*, the Kannada version should be prohibited from being registered
    >> *everywhere in the world*, or vice versa with the scripts.
    >>
    >> Which leads me to note that *somehow*, Confusables.txt is missing a
    >> full-fledged confusables mapping between Kannada and Telugu. In the
    >> constructed example given above, it is obvious that RA and GA are almost
    >> identical between the scripts but Confusables.txt does not list them
    >> at all!
    >>
    >> This is a serious lacuna, IMHO, which should be rectified.
    >>
    >> Shriramana Sharma.
    >>
    >>
    >>
    >
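
The check discussed in the thread (a registry or client noticing that a single-script label has a look-alike in another script) can be sketched in Python roughly as follows. This is a simplification loosely modelled on the skeleton idea in UTS #39: the `LOOKALIKE` table is hypothetical, covers only the three letters of the example above, and is not taken from the confusables data; the helper names `skeleton`, `collides` and `to_ace` are likewise made up for illustration. The real algorithm also normalizes the text and uses the full data file.

```python
# HYPOTHETICAL mapping for illustration only; NOT taken from confusables.txt.
# It maps the three Kannada letters of the example to Telugu look-alikes.
LOOKALIKE = {
    "\u0C85": "\u0C05",  # KANNADA LETTER A  -> TELUGU LETTER A
    "\u0CB0": "\u0C30",  # KANNADA LETTER RA -> TELUGU LETTER RA
    "\u0C97": "\u0C17",  # KANNADA LETTER GA -> TELUGU LETTER GA
}


def skeleton(label: str) -> str:
    """Replace each character by its look-alike prototype, if it has one."""
    return "".join(LOOKALIKE.get(ch, ch) for ch in label)


def collides(candidate: str, registered: set[str]) -> bool:
    """True if a different, already registered label has the same skeleton."""
    s = skeleton(candidate)
    return any(r != candidate and skeleton(r) == s for r in registered)


def to_ace(label: str) -> str:
    """ASCII form of an all-non-ASCII label, via Python's punycode codec."""
    return "xn--" + label.encode("punycode").decode("ascii")


if __name__ == "__main__":
    telugu = "\u0C05\u0C30\u0C17"   # the Telugu label from the example
    kannada = "\u0C85\u0CB0\u0C97"  # the Kannada label from the example
    # The two labels are distinct names as far as the DNS is concerned...
    print(to_ace(telugu), to_ace(kannada))
    # ...but a registry holding the Telugu one could refuse the Kannada one.
    print(collides(kannada, {telugu}))
```

A check like this only protects the zone that runs it, and it is only as good as the mapping table, which is why gaps in the data for Indic scripts matter.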



    This archive was generated by hypermail 2.1.5 : Mon Nov 29 2010 - 11:57:13 CST