From: Shawn Steele (Shawn.Steele@microsoft.com)
Date: Mon Nov 29 2010 - 11:42:37 CST
Minor clarification: It's a choice of the registry/zone. Even if a registry like .com did not allow confusables to be registered, zones under it could still register confusables, e.g. confusable.notconfusable.com
-Shawn
http://blogs.msdn.com/shawnste
________________________________
From: unicode-bounce@unicode.org [unicode-bounce@unicode.org] on behalf of Mark Davis ☕ [mark@macchiato.com]
Sent: Thursday, November 25, 2010 12:45 PM
To: Shriramana Sharma
Cc: UnicoDe List
Subject: Re: Phishing and enforcing Confusables.txt
Whether or not to allow confusables is currently a choice for the registry (such as .com). But if the registry does restrict them, it would be effective anywhere in the world -- for that domain.
Client software can also detect that a particular label has a "whole-script" confusable, based on the confusable data, and at least alert the user.
As you point out below, the confusables data is known to be lacking for Indic scripts. If you follow some links in UTS #39, you get to the following page, where you can suggest additional confusables, either within the same script or across scripts.
http://unicode.org/draft/reports/tr39/confusables.html
All such volunteer efforts are appreciated, and can help efforts to improve security.
The key issue is to deal with the characters in the scripts listed in Table 5a in http://unicode.org/reports/tr31/#Table_Recommended_Scripts. Any scripts outside of that list are recommended for exclusion anyway. That is, it doesn't matter as much if a character in Telugu looks like a character in Phoenician, because the latter script is recommended for exclusion. What matters are characters in the other scripts in Table 5a, plus symbols and punctuation.
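As a rough illustration of that filtering step, the Python sketch below rejects any label containing a character whose script is not on an allow-list. It is only a sketch under stated assumptions: Python's standard library does not expose the Unicode Script property, so the first word of the character name stands in for the script, and ALLOWED_SCRIPTS is a hand-picked, hypothetical subset, not the actual contents of Table 5a.

```python
import unicodedata

# Hypothetical, partial allow-list for illustration; the real list is
# Table 5a in UAX #31.
ALLOWED_SCRIPTS = {"LATIN", "TELUGU", "KANNADA"}


def rough_script(ch):
    """Approximate a character's script by the first word of its Unicode name."""
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def uses_only_allowed_scripts(label):
    """Return True if every character's approximated script is allow-listed."""
    return all(rough_script(ch) in ALLOWED_SCRIPTS for ch in label)


telugu_label = "\u0c05\u0c30\u0c17"  # Telugu letters A, RA, GA
phoenician_alf = "\U00010900"        # PHOENICIAN LETTER ALF

print(uses_only_allowed_scripts(telugu_label))
print(uses_only_allowed_scripts(phoenician_alf))
```

Once excluded scripts are filtered out this way, confusable data only has to cover pairs of characters that can both still appear in a label.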
To check out the current data, you can use http://unicode.org/cldr/utility/confusables.jsp?a=ಅರಗ
There, we see that the first character does have a confusable in the data, but the others don't.
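To show why partial coverage matters, here is a minimal Python sketch of a skeleton-style comparison. The mapping table is a toy: its entries are illustrative assumptions that mirror the observation above, not lines copied from the confusables data, and the real UTS #39 skeleton procedure also involves normalization, which is omitted here.

```python
# Illustrative mapping only: one Kannada-to-Telugu entry, mirroring the
# observation that the first letter is covered and the other two are not.
# These are NOT actual entries copied from the confusables data.
TOY_CONFUSABLES = {
    "\u0c85": "\u0c05",  # KANNADA LETTER A -> TELUGU LETTER A
}


def toy_skeleton(label, table):
    """Replace each character by its mapped prototype, if the table has one."""
    return "".join(table.get(ch, ch) for ch in label)


def looks_confusable(a, b, table):
    """Flag two labels when their toy skeletons are identical."""
    return toy_skeleton(a, table) == toy_skeleton(b, table)


telugu = "\u0c05\u0c30\u0c17"   # Telugu A, RA, GA
kannada = "\u0c85\u0cb0\u0c97"  # Kannada A, RA, GA

# With only the first letter mapped, the pair goes undetected.
print(looks_confusable(telugu, kannada, TOY_CONFUSABLES))

# Adding hypothetical RA and GA entries makes the skeletons match.
extended = dict(TOY_CONFUSABLES)
extended["\u0cb0"] = "\u0c30"  # KANNADA LETTER RA -> TELUGU LETTER RA
extended["\u0c97"] = "\u0c17"  # KANNADA LETTER GA -> TELUGU LETTER GA
print(looks_confusable(telugu, kannada, extended))
```

A whole-label check is only as good as its weakest character mapping: a single missing entry is enough for two look-alike labels to pass as distinct.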
Mark
— Il meglio è l’inimico del bene —
On Wed, Nov 24, 2010 at 20:57, Shriramana Sharma <samjnaa@gmail.com> wrote:
Hello and thanks for all that info. However, the question stands, see below:
On Thu, Nov 25, 2010 at 10:03 AM, CE Whitehead <cewcathar@hotmail.com> wrote:
> "5. In implementing the IDN standards, top-level domain registries should, at least
> initially, limit any given domain label (such as a second-level domain name) to the
> characters associated with one language or set of languages only."
Apart from that "at least initially" stuff, which indicates that it
may change in the future, this really does not solve the problem or
answer the question. I'll forgo the examples I gave previously as they
involved mixed-script text.
Now even *without* mixing scripts, examples can be given, such as అరగ.com <http://xn--joc0b6d.com>
(all in Telugu) and ಅರಗ.com <http://xn--6rc0b6d.com> (all in Kannada). What is desired is that
if the Telugu version has been first registered *anywhere in the
world*, the Kannada version should be prohibited from being registered
*everywhere in the world*, or vice versa with the scripts.
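To make the example concrete, the following Python sketch lists the code points of the two labels and converts each to an ASCII form with Python's built-in punycode codec. The "xn--" prefix is added by hand here, which is an assumption of this sketch; a real registration would go through full IDNA processing, which is skipped.

```python
import unicodedata

telugu = "\u0c05\u0c30\u0c17"   # Telugu label from the example above
kannada = "\u0c85\u0cb0\u0c97"  # Kannada label from the example above


def describe(label):
    """Return (code point, Unicode name) pairs for each character of a label."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in label]


def to_a_label(label):
    """Encode a label with the punycode codec and prepend the ACE prefix."""
    return "xn--" + label.encode("punycode").decode("ascii")


for label in (telugu, kannada):
    print(describe(label))
    print(to_a_label(label))

# No code point is shared, so the two labels are distinct to the DNS
# even though they look alike on screen.
print(set(telugu).isdisjoint(kannada))
```

Because the encoded forms differ, nothing in the DNS itself ties the two names together; any blocking has to come from registry policy or from confusable data.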
Which leads me to note that *somehow*, Confusables.txt is missing a
full-fledged confusables mapping between Kannada and Telugu. In the
contrived example given above, it is obvious that RA and GA are almost
identical between the scripts but Confusables.txt does not list them
at all!
This is a serious lacuna, IMHO, which should be rectified.
Shriramana Sharma.