Document Number: L2/08-427
To: gtld-guide@icann.org
From: Unicode Technical Committee (UTC)
Date: 2008-11-14
Subject: Comments on Draft Applicant Guidebook
 
The following is approved feedback from the Unicode Technical Committee on the ICANN document "New gTLD Program: Draft Applicant Guidebook (Draft RFP)" (http://icann.org/en/topics/new-gtld-draft-rfp-24oct08-en.pdf). There is a copy of this email at http://docs.google.com/Doc?id=dfqr8rd5_358ffvqqhf9.

The structure of the feedback below includes a citation of text from the document, suggested replacement text or other changes to remedy the problem, and a rationale for the change.


2.1.1.3.2 String Requirements


The label must be a valid internationalized domain
name, as specified in the technical standard
Internationalizing Domain Names in Applications
(RFC 3490). This includes the following
nonexhaustive list of limitations:
=>
The label must be a valid internationalized domain name, as specified in the latest version of the IDNA specifications (see XXX). This includes, but is not limited to, the following constraints. Note that these are in no way a complete statement of the requirements of the IDNA specifications.

Rationale. Clearer wording, and you *really* don't want the reader to think that what is listed here is in any way complete.



- Must consist entirely of characters with the same
directional property.
[DELETE]

Rationale. This is completely false. It would disallow many IDNs that are needed, and allowed by idna-bis-bidi. Note: it is questionable how much of IDNA2008 this text should repeat, especially in the case of complex provisions like BIDI. Moreover, "directional property" is undefined.
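To illustrate how restrictive the cited requirement would be, the following Java sketch checks whether every code point in a label has the same directional value. Since "directional property" is undefined in the draft, the sketch assumes it means the Unicode Bidi_Class, which Java reports through Character.getDirectionality. The class and method names are hypothetical, and this is not a statement of what the IDNA specifications require.

```java
class DirectionalityCheck {
    /**
     * Returns true if every code point in the label has the same
     * directionality value (assumed here to stand for "directional property").
     */
    static boolean hasUniformDirectionality(String label) {
        return label.codePoints()
                .map(cp -> Character.getDirectionality(cp))
                .distinct()
                .count() <= 1;
    }

    public static void main(String[] args) {
        // ASCII letters only
        System.out.println(hasUniformDirectionality("abc"));
        // ASCII letters followed by a digit
        System.out.println(hasUniformDirectionality("abc2"));
        // Arabic letters U+0645 U+0635 U+0631 followed by a digit
        System.out.println(hasUniformDirectionality("\u0645\u0635\u0631" + "2"));
    }
}
```

Under this reading, even the ASCII label "abc2" fails, because letters and European digits have different directional values. The same holds for an Arabic label that contains a digit.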



All code points in a single label must be taken
from the same script as determined by the
Unicode Standard Annex #24: Unicode Script
Property.

=>

Labels are subject to a constraint based on the script value of their characters. All characters in the label that do not have the Common script value or the Inherited script value must share a single script value. Script values are determined as specified in the Unicode Standard: see Unicode Standard Annex #24: Unicode Script Property.

Rationale. The constraint to single scripts is far too narrow. The script values Common and Inherited are given to characters that are used with multiple scripts, such as "-" or "2", or Arabic vowels. Forcing such obvious characters to go through the exception process is needless overhead, and obscures the exceptional cases.
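As an illustration only, the proposed constraint can be sketched in Java using the script values that the runtime exposes through Character.UnicodeScript. The class and method names are hypothetical, and the result depends on the version of the Unicode data shipped with the Java runtime. This is a sketch of the check under those assumptions, not a definitive implementation.

```java
import java.lang.Character.UnicodeScript;

class SingleScriptCheck {
    /**
     * Returns true if all code points whose script is neither Common nor
     * Inherited share a single script value.
     */
    static boolean sharesSingleScript(String label) {
        UnicodeScript seen = null;
        for (int cp : label.codePoints().toArray()) {
            UnicodeScript script = UnicodeScript.of(cp);
            if (script == UnicodeScript.COMMON || script == UnicodeScript.INHERITED) {
                continue; // characters used with multiple scripts are ignored
            }
            if (seen == null) {
                seen = script;          // first "real" script in the label
            } else if (seen != script) {
                return false;           // a second script was found
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Latin letters only (U+00F6 followed by "bb")
        System.out.println(sharesSingleScript("\u00F6bb"));
        // Latin letters with a hyphen and a digit, both Common
        System.out.println(sharesSingleScript("abc-2"));
        // Latin letters with Cyrillic U+0430 in second position
        System.out.println(sharesSingleScript("p\u0430ypal"));
    }
}
```

With this check, the first two labels pass, because "-" and "2" have the Common script value, while the label that mixes Latin letters with Cyrillic U+0430 is rejected.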


2.1.1.4.1 Requirements for Strings Intended to Represent Geographical Entities


This includes a representation of the
country or territory name in any of the six official
United Nations languages (French, Spanish,
Chinese, Arabic, Russian and English) and the
country or territory’s local language.

=>

This includes a representation of the country or territory name in any of the six official United Nations languages (French, Spanish, Chinese, Arabic, Russian and English) and any of the country or territory’s local languages.

Rationale. It is quite common for a country or territory to have more than one language, so that needs to be accounted for.



Applications for any string that represents a subnational
place name, such as a county, province,
or state, listed in the ISO 3166-2 standard.

=>

Applications for any string that represents a subnational place name, such as a county, province, or state. These could be, for example, as listed in the ISO 3166-2 standard.

Rationale. The ISO 3166-2 standard is not complete, and is not freely available. Including the comma may imply to the reader that it is required, that the sentence is to be read as: "Applications for any string that represents a subnational place name (such as a county, province, or state) listed in the ISO 3166-2 standard."



Applications for a city name, where the applicant
clearly intends to use the gTLD to leverage from the
city name.

Issue. City names are *very* ambiguous - look at the number of "Paris" cities that exist. If Paris, Texas gets there first, what happens? Should there be some qualification necessary to disambiguate city names instead?



1.3 Information for Internationalized Domain Name Applicants

If an applicant applies for such a string, it must provide
accompanying information indicating compliance with
the IDNA protocol and other requirements. The IDNA
protocol is currently under revision and its documentation
can be found at
http://www.icann.org/en/topics/idn/rfcs.htm.

[ADD AFTERWARDS]

This document presumes that the IDNA protocol has been revised in accordance with the description at http://www.icann.org/en/topics/idn/rfcs.htm, and makes use of terminology defined in the draft revisions. That revision may change before approval, and such changes could require corresponding modifications of the following text.

Rationale. It must be made clear to the reader that while we expect the revision to succeed, the text following this in the document is subject to change.



2. Language of label (ISO 639-1). The applicant will
specify the language of the applied-for TLD string, both
according to the ISO’s codes for the representation of
names of languages, and in English.

=>

Language tag of label (according to IETF BCP 47 Tags for Identifying Languages). The applicant will specify the language tag of the applied-for TLD string, both according to the IETF BCP 47 Tags for Identifying Languages, and in English.

Rationale. ISO 639-1 covers only a small fraction of the world's languages. The correct reference, used in HTML, XML, and all modern software, is BCP 47.
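As an example of the difference, Hawaiian has no two-letter ISO 639-1 code, but it has the BCP 47 language tag "haw". The following Java sketch shows one way an application could check that a supplied tag is well-formed, using the standard java.util.Locale.Builder class. The class and method names are hypothetical. Note that this checks only the syntax of the tag, not whether its subtags are registered.

```java
import java.util.IllformedLocaleException;
import java.util.Locale;

class LanguageTagCheck {
    /** Returns true if the tag is syntactically well-formed per BCP 47. */
    static boolean isWellFormed(String tag) {
        try {
            // setLanguageTag throws if the tag is ill-formed
            new Locale.Builder().setLanguageTag(tag);
            return true;
        } catch (IllformedLocaleException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("haw"));       // Hawaiian, no ISO 639-1 code
        System.out.println(isWellFormed("zh-Hant"));   // Chinese, Traditional script
        System.out.println(isWellFormed("not a tag")); // spaces are not allowed
    }
}
```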



3. Script of label (ISO 15924). The applicant will specify the
script of the applied-for gTLD string, both according to
the ISO code for the presentation of names of scripts,
and in English.

=>

Main script of label (see 2.1.1.3.2 String Requirements). The applicant will specify the scripts of the applied-for gTLD string, both according to the Unicode Script property, and in English.

Rationale. This brings the text in line with the use of script in 2.1.1.3.2 String Requirements. It also prevents bogus information such as script variants (Latin Fraktur), which are not properties of characters. The term "scripts" takes account of the fact that some cases of multiple scripts are allowed. (Note that this information is completely derivable from the U-label.)
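To show that the scripts can be derived from the U-label itself, the following Java sketch collects the script values of a label's code points and leaves out Common and Inherited. It relies on Character.UnicodeScript from the Java runtime, so the result depends on the Unicode version the runtime ships with. The class and method names are hypothetical, and this is an illustration rather than a prescribed procedure.

```java
import java.lang.Character.UnicodeScript;
import java.util.EnumSet;
import java.util.Set;

class LabelScripts {
    /** Returns the script values in the label, excluding Common and Inherited. */
    static Set<UnicodeScript> scriptsOf(String uLabel) {
        Set<UnicodeScript> scripts = EnumSet.noneOf(UnicodeScript.class);
        uLabel.codePoints()
                .mapToObj(UnicodeScript::of)
                .filter(s -> s != UnicodeScript.COMMON && s != UnicodeScript.INHERITED)
                .forEach(scripts::add);
        return scripts;
    }

    public static void main(String[] args) {
        // U+00F6 followed by "bb": Latin only
        System.out.println(scriptsOf("\u00F6bb"));
        // Two Han characters followed by a Hiragana character
        System.out.println(scriptsOf("\u65E5\u672C\u306E"));
    }
}
```

The second label yields two script values, Han and Hiragana, which is one of the cases where the plural "scripts" applies.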



4. Unicode code points. The applicant will list all the code
points contained in the U-label according to its
Unicode form.

=>

4. Unicode code points. The applicant will list all the code points contained in the U-label using the U+ notation. For example, for the label "öbb", the list would be: "U+00F6 U+0062 U+0062".

Rationale. This makes the intent clear. 
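The list can be produced mechanically from the U-label. The following Java sketch is one way to do so; the class and method names are hypothetical, and it assumes the label is already in the form in which it will be registered (a decomposed "ö", written as "o" followed by U+0308, would produce a different list).

```java
import java.util.stream.Collectors;

class CodePointList {
    /** Formats each code point of the label in U+ notation, separated by spaces. */
    static String toUPlusNotation(String uLabel) {
        return uLabel.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // The label from the example: U+00F6 followed by "bb"
        System.out.println(toUPlusNotation("\u00F6bb")); // prints U+00F6 U+0062 U+0062
    }
}
```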


5. Representation of label in phonetic alphabet. The
applicant will provide its applied-for gTLD string notated
according to the International Phonetic Alphabet
(http://www.arts.gla.ac.uk/IPA/ipachart.html ).

[DELETE]

Rationale. First, the purpose of this requirement is unclear: how is it to be used, and how would the IPA representation make a difference in the registration? Second, the same word can have many different IPA transcriptions, narrow vs. broad, or varying greatly by speaker (the same word spoken by a Scot vs. a Chicagoan). Third, very few registrants will be able to supply correct IPA representations.



6. Its IDN table. This table provides the list of characters
eligible for registration in domain names according to
registry policy. It will contain any multiple characters
that can be considered “the same” for the purposes of
registrations at the second level. For examples, see
http://iana.org/domains/idn-tables/.

Question. We think this means a reference to a table rather than a complete copy. If so, what format should such a reference take? Is a link sufficient? It should be clear exactly what a registrant needs to supply.



7. Applicants must further demonstrate that they have
made reasonable efforts to ensure that the encoded
IDN string does not cause any rendering or operational
problems. For example, problems have been identified
in strings with characters of mixed right-to-left and left-to-right
directionality when numerals are adjacent to
the path separator. If an applicant were applying for a
string with known issues, it should document steps that
will be taken to mitigate these issues in applications.

Question. It sounds like this is asking the applicant to change all the program applications that use the domain name, which is clearly impossible. What would be an example of "reasonable efforts"?


2.1.1.1 String Confusion Review

...
The similarity review will be conducted by a panel of String
Similarity Examiners. This examination will be informed by an
algorithmic score for the visual similarity between each
applied-for string and each of other existing and applied-for
TLDs. The score will provide one objective measure for
consideration by the panel.
...
The algorithm uses proprietary software to perform a series of mathematical calculations to assess the visual similarity between strings based upon the following parameters:
...

Issue. It is inappropriate for ICANN to use an algorithm which is not public, and not based on public data.



If the evaluators determine that a string poses stability
issues that require further investigation, the applicant must
either confirm that it intends to move forward with the
application process or withdraw its application.

Issue. What is an example of "stability issues" in a string? Should this be "technical issues"? How is an applicant supposed to know what "stability issue" means? All terms need definitions, either before their first use or in a glossary. Currently there is a definition of the stability of a "registry service" later in the document, at the end of 2.1.3, but no definition or indication of what "stability issues" are for a string.