From: Mark Davis ☕ (mark@macchiato.com)
Date: Thu Nov 25 2010 - 14:45:16 CST
Whether or not to allow confusables is currently a choice for each registry
(such as the one for .com). But if the registry does restrict them, the
restriction is effective anywhere in the world -- for that domain.
Client software can also detect that a particular label has a "whole-script"
confusable, based on the confusable data, and at least alert the user.
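As a rough illustration of how such a check could work, here is a minimal sketch along the lines of the "skeleton" idea in UTS #39: normalize to NFD, map each character to its prototype, normalize again, and treat two labels as confusable when their skeletons match. The mapping table is a hypothetical three-entry stand-in for confusables.txt (Telugu A, RA and GA mapped onto their Kannada counterparts). It is not claimed to reflect the actual data, and the function names are made up for this example. It only compares two given labels. A full whole-script check would also need the Script property to confirm that each label is single-script, and that part is omitted.

```python
import unicodedata

# HYPOTHETICAL excerpt standing in for confusables.txt: source char -> prototype.
# These Telugu -> Kannada entries are illustrative assumptions only.
HYPOTHETICAL_CONFUSABLES = {
    "\u0C05": "\u0C85",  # TELUGU LETTER A  -> KANNADA LETTER A
    "\u0C30": "\u0CB0",  # TELUGU LETTER RA -> KANNADA LETTER RA
    "\u0C17": "\u0C97",  # TELUGU LETTER GA -> KANNADA LETTER GA
}


def skeleton(label: str, table: dict[str, str]) -> str:
    """UTS #39-style skeleton: NFD, map each character via the table, NFD again."""
    decomposed = unicodedata.normalize("NFD", label)
    mapped = "".join(table.get(ch, ch) for ch in decomposed)
    return unicodedata.normalize("NFD", mapped)


def confusable(a: str, b: str, table: dict[str, str]) -> bool:
    """Two distinct labels are treated as confusable if their skeletons are equal."""
    return a != b and skeleton(a, table) == skeleton(b, table)


telugu = "\u0C05\u0C30\u0C17"   # the Telugu label from the thread below
kannada = "\u0C85\u0CB0\u0C97"  # the Kannada label from the thread below
print(confusable(telugu, kannada, HYPOTHETICAL_CONFUSABLES))
```

If the two hypothetical RA and GA entries were removed from the table, the labels would no longer match. That mirrors the gap reported in the quoted message below.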
As you point out below, the confusable data is known to be lacking for Indic
scripts. If you follow some links in UTS #39, you get to the following page,
where you can suggest additional confusables, either within the same script
or across scripts.
http://unicode.org/draft/reports/tr39/confusables.html
All such volunteer efforts are appreciated, and can help efforts to improve
security.
The key issue is to deal with the characters in the scripts listed in Table
5a in http://unicode.org/reports/tr31/#Table_Recommended_Scripts. Any
scripts outside of that list are recommended for exclusion anyway. That is,
it doesn't matter as much if a character in Telugu looks like a character
in Phoenician, because the latter script is recommended for exclusion. What
matters are confusables involving characters in the other scripts in Table
5a, plus symbols and punctuation.
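The scoping rule above could be expressed roughly as follows. This is only a sketch under stated assumptions. The script lookup uses three hard-coded block ranges instead of the Unicode Script property. RECOMMENDED is an assumed two-entry subset of Table 5a, and the helper names are hypothetical. Symbols and punctuation are recognized through the general category, which the standard `unicodedata` module does provide.

```python
import unicodedata

# HYPOTHETICAL, deliberately tiny script lookup by block range.
# A real implementation would use the Unicode Script property (Scripts.txt).
SCRIPT_RANGES = [
    (0x0C00, 0x0C7F, "Telugu"),
    (0x0C80, 0x0CFF, "Kannada"),
    (0x10900, 0x1091F, "Phoenician"),
]

# Assumed subset of the recommended scripts (Table 5a of UAX #31).
RECOMMENDED = {"Telugu", "Kannada"}


def script_of(ch: str) -> str:
    """Return the script name for ch from the toy range table, else 'Unknown'."""
    cp = ord(ch)
    for lo, hi, name in SCRIPT_RANGES:
        if lo <= cp <= hi:
            return name
    return "Unknown"


def in_scope(ch: str) -> bool:
    """True if ch is in a recommended script, or is a symbol (S*) or punctuation (P*)."""
    return script_of(ch) in RECOMMENDED or unicodedata.category(ch)[0] in "SP"


def pair_matters(a: str, b: str) -> bool:
    """A look-alike pair needs confusable data only if both sides stay in scope."""
    return in_scope(a) and in_scope(b)
```

Under these assumptions, a Telugu/Phoenician look-alike falls out of scope. Telugu/Kannada pairs, and pairs involving punctuation, would still need entries in the data.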
To check out the current data, you can use
http://unicode.org/cldr/utility/confusables.jsp?a=ಅರಗ
There, we see that the first character does have a confusable in the data,
but the others don't.
Mark
*— Il meglio è l’inimico del bene (The best is the enemy of the good) —*
On Wed, Nov 24, 2010 at 20:57, Shriramana Sharma <samjnaa@gmail.com> wrote:
> Hello and thanks for all that info. However, the question stands, see
> below:
>
> On Thu, Nov 25, 2010 at 10:03 AM, CE Whitehead <cewcathar@hotmail.com>
> wrote:
> > "5. In implementing the IDN standards, top-level domain registries
> > should, at least initially, limit any given domain label (such as a
> > second-level domain name) to the characters associated with one
> > language or set of languages only."
>
> Apart from that "at least initially" stuff, which indicates that it
> may change in the future, this really does not solve the problem or
> answer the question. I'll forgo the examples I gave previously as they
> involved mixed-script text.
>
> Now even *without* mixing scripts, examples can be provided as
> అరగ.com <http://xn--joc0b6d.com> (all in Telugu) and ಅರಗ.com
> <http://xn--6rc0b6d.com> (all in Kannada). What is desired is that
> if the Telugu version has been first registered *anywhere in the
> world*, the Kannada version should be prohibited from being registered
> *everywhere in the world*, or vice versa with the scripts.
>
> Which leads me to note that *somehow*, Confusables.txt is missing a
> full-fledged confusables mapping between Kannada and Telugu. Of the
> construed example given above, it is obvious that RA and GA are almost
> identical between the scripts but Confusables.txt does not list them
> at all!
>
> This is a serious lacuna, IMHO, which should be rectified.
>
> Shriramana Sharma.
This archive was generated by hypermail 2.1.5 : Thu Nov 25 2010 - 14:47:43 CST