L2/00-267

Internet Draft                                            Deuk-kul Jang
draft-dkjang-idn-01.txt                                    So-myung Ind
August 8, 2000
Expires in six months


        Internationalized domain names divided by characters key


Status of this memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Abstract

   This document describes a method for using internationalized
   (multilingual) domain names under the current DNS. It also describes
   a method for converting an internationalized domain name expressed
   in a native language into a traditional US-ASCII domain name
   compatible with the current DNS. The method uses a 'sequence of 36
   characters' to convert all characters in an IDN into US-ASCII.
   Finally, a way to use this method with lc2LDs is presented.

Contents

   1. Introduction
      1.1. Definitions and Conventions
      1.2. Summary
   2. Multilingual key
   3. Language key
   4. Sequence of 36 characters
   5. Composition of Character substitute
      5.1. In case the number of characters is below 36;
         5.1.1. When the number of characters is below 9;
         5.1.2. When the number of characters is above 9 and below 26;

Expires 9th of Feb 2001                                        [Page 1]

Internet Draft      IDN divided by characters key        August 8, 2000

         5.1.3. When the number of characters is above 26 and below 35;
      5.2. In case the number of characters is above 36 characters;
         5.2.1.
            When the number of characters is above 35 characters and
            below 1260 characters;
         5.2.2. When the number of characters is above 1260 characters
            and below 1296 characters;
         5.2.3. When the number of characters is above 1296 characters
            and below 45,360 characters;
         5.2.4. When the number of characters is above 45,360
            characters and below 47,952 characters;
      5.3. Characters in the plane besides BMP
   6. TLD (Top level domain)
   7. Conversion and display
      7.1. Converting IDN into the traditional name
      7.2. Display of IDN
   8. Foreign language
   9. Creating 'lc2LD's for IDN
   10. References
   11. Patent information
   12. Author's address

1. Introduction

   Under the current DNS (Domain Name System), both IP addresses
   (combinations of numbers) and domain names are used. The purpose of
   a domain name is to provide a more familiar and memorable name than
   an IP address.

   However, because domain names are restricted to US-ASCII characters,
   people who do not speak English must still use unfamiliar English
   domain names. For them, such names may not be much different from IP
   addresses. As a result, it is difficult to find the home pages of
   even famous companies without knowing their English domain names in
   advance.

   The top level domains are designated in English for international
   recognition. For the second level domains under ccTLDs, English
   letters have also been used, in the form of abbreviated English
   words that almost seem to be secret codes (for example, ac, co, go,
   or, etc.). As the 2LD for Seoul, Korea, we have to write '.seoul.kr'
   instead of using Korean.

   Furthermore, in order to control computers and use the Internet by
   voice commands in the future, internationalized domain names are
   indispensable.

   However, to be truly international, an IDN must also be expressible
   in English (US-ASCII) for foreigners who do not know the language
   used.

1.1.
Definitions and Conventions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

   An 'internationalized domain name' (IDN) means a domain name
   expressed in the native language (non-ASCII) in the user's
   interface.

   A 'traditional domain name' means a domain name compatible with the
   current DNS.

   A 'converted domain name' is the same as a traditional domain name,
   but it has been converted from an IDN to be compatible with the
   current DNS.

   A 'character substitute' is a 'string of ASCII characters' that
   replaces a native character in an IDN when the IDN is converted into
   a traditional name.

   A 'multilingual key' is a 'character assembly' located at a specific
   position of the converted domain name; it indicates that the name
   was converted from an IDN.

   A 'language key' represents the original language from which the
   domain name has been converted into the converted domain name.

   "DNS"   ; Domain Name System
   "IDN"   ; Internationalized Domain Name (written in native language)
   "gTLD"  ; generic Top Level Domain
   "lc2LD" ; language code 2nd Level Domain
   "BMP"   ; Basic Multilingual Plane

1.2. Summary

   According to [RFC1034], domain names in the current DNS must start
   with a letter (a through z), end with a letter or digit (0 through
   9), and have only letters, digits, and hyphen as interior
   characters. Therefore native characters in an IDN must be converted
   into US-ASCII characters in order to be compatible with the current
   DNS.

   Hence, 'character substitutes' are made to express native characters
   (in an IDN) with US-ASCII characters. To distinguish converted
   domain names from traditional ones, a 'multilingual key' is defined
   and added to every converted domain name.
   To represent the original language from which the name has been
   converted, a 'language key' is assigned to every language.

   In this system, when an IDN is converted to a traditional domain
   name;

   1) The 'multilingual key' will be included automatically in all
      converted domain names.
   2) The 'language key' will be included automatically in the
      converted domain name.
   3) All characters will be replaced by 'character substitutes'.

   The conversion from an IDN to a traditional name, and the reverse
   conversion from a converted name to the original IDN, are performed
   by a conversion program installed on the user's computer.

   The conversion program runs when the user inputs a domain name that
   includes his/her own language and begins to use Internet services.
   For display on the monitor, the program runs when it finds the
   multilingual key in a converted domain name. Therefore, users can
   use all Internet services in their own language at their
   convenience.

2. Multilingual key

   A 'multilingual key' is made by placing specific ASCII characters at
   a specific location in the domain name. When a domain name includes
   these specific characters at the specific location, the
   'multilingual key' indicates that the domain name was converted from
   an IDN.

   When a user inputs an IDN that includes his/her native language in
   order to log in, the system first adds this multilingual key
   together with a language key and converts the IDN into a traditional
   domain name.

   For display, the system checks whether the multilingual key is
   present. If so, the system converts the domain name to an IDN
   according to the language key.

   Note. To avoid confusion, the multilingual key should consist of
   characters that are not commonly used.
   Further, it is recommended that a domain name containing the same
   character(s) in the same location not be registered as a traditional
   domain name.

   In section 9 below, this multilingual key is replaced by a new gTLD.

3. Language key

   One or two US-ASCII characters are assigned as a 'language key' for
   every language currently defined (and being added) by [ISO10646].
   This language key follows the 'multilingual key' in converted domain
   names and represents the language from which the name has been
   converted.

   By separating domains according to language, multilingual characters
   can be expressed in a minimum number of ASCII characters.

   If the language key is not for the language that the user has
   selected as his/her main or subsidiary language, but for some other
   language, the system shall not convert the name and shall display it
   on the monitor as US-ASCII characters.

   If two ASCII characters are given as the language key for every
   language (like 'ko' for Korean, 'ja' for Japanese), 1,296 languages
   (=36x36) can be managed theoretically.

   In section 9 below, this language key is replaced by an lc2LD.

4. Sequence of 36 characters

   In order to substitute all characters (used in IDNs) with the least
   number of US-ASCII characters when IDNs are converted into
   traditional names, the 'sequence of 36 characters' is made from
   letters (a-z) and digits (0-9). These 36 characters may be used in
   the middle or at the end of a domain name.

5. Composition of Character substitute

   Character substitutes are composed as follows:

   a. Separate the characters currently defined (and being added) by
      [ISO10646] according to language.

   b. For each language, arrange all native characters as well as
      alphabets and digits in the 'sequence of 36 characters'.

   c. Set the arranged 'sequence of 36 characters' as the 'character
      substitute' for each character.
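   The 'sequence of 36 characters' from section 4 and the count of
   two-character language keys from section 3 can be illustrated with a
   short sketch. The ordering of the sequence (digits 0-9 first, then
   letters a-z) is an assumption inferred from ranges such as 00-0z and
   10-zz used in this document; the names SEQUENCE and
   two_character_keys are hypothetical.

```python
import string

# Assumed ordering: digits first, then lowercase letters (0-9, a-z).
SEQUENCE = string.digits + string.ascii_lowercase


def two_character_keys():
    """Return every two-character string over the 36-character sequence."""
    return [first + second for first in SEQUENCE for second in SEQUENCE]


keys = two_character_keys()
print(len(SEQUENCE))       # 36
print(len(keys))           # 1296 (= 36 x 36)
print(keys[0], keys[-1])   # 00 zz
```

   Under this assumption, 'ko' and 'ja' are two of the 1,296 possible
   two-character keys.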
   According to the total number of characters in a language, assign
   all characters to the 'sequence of 36 characters' as follows:

5.1. In case the number of characters is below 36;

5.1.1. When the number of characters is below 9;

5.1.1.1. Assign multilingual characters to 1-9, and digits (0-9) to
   00-09 of the 'sequence of 36 characters'. The alphabets (US-ASCII,
   a-z) and hyphen are used as they are.

5.1.1.2. As an alternative to 5.1.1.1, this arrangement may be used:
   Assign multilingual characters to 01-09 of the 'sequence of 36
   characters' and the digit '0 (zero)' to '00'. Digits (1-9), the
   alphabets (US-ASCII) and hyphen are used as they are.

   (Example)

   In German there are 4 characters that are distinct from the English
   alphabets (a-z), ignoring case. Because those 4 can be managed as
   multilingual characters, German corresponds to case 5.1.1, and those
   4 characters would be assigned to 01-04 of the 'sequence of 36
   characters'. In this case, however, it seems better to assign them
   to '0a', '0o', '0u', '0b' (as in section 5.1.2.2) instead of 01-04,
   and to assign the digit '0 (zero)' to '00'. The rest of the digits
   (1-9), the alphabets (a-z) and hyphen are used as they are.

   A German domain name under a gTLD (.com), given in hexadecimal as
   "0x0067/0x0072/0x00fc/0x006e/.com", is converted as follows.

   The multilingual key is 'z-', placed in the name part, and the
   language key for German is 'de'.

   German character (u-umlaut)
      0x00fc ===> 0u

   3 alphabets are used as they are.
      0x0067 ===> g
      0x0072 ===> r
      0x006e ===> n

   The IDN in German in the user interface,
   "0x0067/0x0072/0x00fc/0x006e/.com", is converted to the traditional
   name as follows.

   (1) The multilingual key 'z-' is added to the name part first.
   (2) The language key 'de' follows the multilingual key.
   (3) Character substitutes and '.com' follow them.
   converted domain name : z-degr0un.com

5.1.2. When the number of characters is above 9 and below 26;

5.1.2.1. Assign multilingual characters to a-z, the alphabets
   (US-ASCII) to 0a-0z, and the digit '0' to 00. The rest of the digits
   (1-9) and hyphen are used as they are.

5.1.2.2. As an alternative to 5.1.2.1, this arrangement may be used;
   Assign multilingual characters to 0a-0z and the digit '0' to 00.
   Digits 1-9, the alphabets (US-ASCII) and hyphen are used as they
   are.

5.1.3. When the number of characters is above 26 and below 35;

5.1.3.1. Assign multilingual characters to 1-z in order, and digits
   (0-9) and alphabets (a-z) to 00-0z. Hyphen is used as it is.

5.1.3.2. As an alternative to 5.1.3.1, this arrangement may be used;
   Assign multilingual characters to 01-0z and the digit '0' to 00.
   Digits 1-9, alphabets (a-z) and hyphen are used as they are.

5.2. In case the number of characters is above 36 characters;

5.2.1. When the number of characters is above 35 characters and below
   1260 characters;

   Assign multilingual characters to 10-zz (35x36=1260) of the
   'sequence of 36 characters' in order. Assign digits (0-9) and
   alphabets (a-z) to 00-0z of the 'sequence of 36 characters'. Hyphen
   is used as it is.

5.2.2. When the number of characters is above 1260 characters and
   below 1296 characters;

   By appending the hyphen-led pairs '-0' through '-z' after 'zz' at
   the end of the sequence, the representation range of two ASCII
   characters is extended to 1296 multilingual characters. In this case
   the hyphen itself is assigned to '0-'.

5.2.3. When the number of characters is above 1296 characters and
   below 45,360 characters;

   Assign digits (0-9) and alphabets (a-z) to 00-0z of the 'sequence of
   36 characters'. Hyphen is used as it is. Assign multilingual
   characters to the three-character strings of the 'sequence of 36
   characters', 100-zzz (35x36x36=45,360), in order.
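   The assignments in 5.2.1 and 5.2.3 amount to counting in base 36
   from a fixed starting point ('10' for two characters, '100' for
   three). The following is a minimal sketch, assuming that the
   sequence order is 0-9 followed by a-z and that the characters of a
   language are numbered from 1; the function names are hypothetical,
   and the hyphen extensions of 5.2.2 are not covered.

```python
import string

SEQUENCE = string.digits + string.ascii_lowercase  # assumed order: 0-9, a-z


def to_sequence(value, width):
    """Write a non-negative integer as `width` characters of the sequence."""
    chars = []
    for _ in range(width):
        value, remainder = divmod(value, 36)
        chars.append(SEQUENCE[remainder])
    if value:
        raise ValueError("value does not fit in the requested width")
    return "".join(reversed(chars))


def character_substitute(position, width):
    """Substitute for the `position`-th (1-based) native character.

    width=2 covers 10-zz (35 x 36 = 1,260 characters) and
    width=3 covers 100-zzz (35 x 36 x 36 = 45,360 characters);
    00-0z stay reserved for digits and alphabets.
    """
    capacity = 35 * 36 ** (width - 1)
    if not 1 <= position <= capacity:
        raise ValueError("position outside the range for this width")
    start = 36 ** (width - 1)  # numeric value of '10' or '100'
    return to_sequence(start + position - 1, width)


print(character_substitute(1, 2))      # 10
print(character_substitute(1260, 2))   # zz
print(character_substitute(1, 3))      # 100
print(character_substitute(45360, 3))  # zzz
```

   With three characters, the 11,172nd character maps to '9mb', which
   is the end of the range this section gives for the Korean
   characters.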
   (Example)

   In Korea, Korean characters are used together with Chinese
   characters in writing. In other words, for Korean, the multilingual
   characters are composed of Korean characters (11,172) and Chinese
   characters (about 21,000). A language of around 32,000 characters
   corresponds to this section 5.2.3.

   So digits (0-9) and alphabets (a-z) are assigned to 00-09 and 0a-0z
   of the 36-characters sequence. The 11,172 Korean characters at
   positions 0xac00-0xd7a3 (hexadecimal) in [ISO10646] are arranged in
   100-9mb of the 36-characters sequence in order, and the Chinese
   characters follow them.

   A multilingual domain name composed of 4 Korean characters, 2
   alphabets (ks), a hyphen (-) and 2 digits (23) under a gTLD (.com),
   "0xb300/0xd55c/0xbbfc/0xad6d/ks-23.0xd68c/0xc0ac", is converted as
   follows.

   "0xb300/0xd55c/0xbbfc/0xad6d" means Korea, and "0xd68c/0xc0ac" means
   company.

   The multilingual key is 'z-', placed in the name part, and the
   language key for Korean is 'ko'.

   digits
      2 ===> 02
      3 ===> 03

   alphabets and hyphen
      k ===> 0k
      s ===> 0s
      - ===> -

   Korean
      0xb300 (the 1,793rd from 0xac00) ===> 2ds
             (the 1,793rd from '100' of the 36-characters sequence)
      0xd55c (the 10,589th) ===> 964
      0xbbfc (the 4,093rd)  ===> 45o
      0xad6d (the 366th)    ===> 1a5

   direct translation (see section 6)
      0xd68c/0xc0ac ===> .com

   The IDN in the user interface,
   "0xb300/0xd55c/0xbbfc/0xad6d/ks-23.0xd68c/0xc0ac", is converted to
   the traditional name as follows.

   (1) The multilingual key 'z-' is added to the name part first.
   (2) The language key 'ko' follows the multilingual key.
   (3) Character substitutes and '.com' follow them.

   converted domain name : z-ko2ds96445o1a50k0s-0203.com

5.2.4.
   When the number of characters is above 45,360 characters and below
   47,952 characters;

   As stated above in 5.2.2, three ASCII characters that use a hyphen
   can extend the representation range to 47,952 (36x37x36)
   multilingual characters. In this case the hyphen itself is assigned
   to '0-'.

5.3. Characters in the plane besides BMP

   As with characters in the BMP, characters in other planes (in
   canonical form) are divided according to language and arranged
   independently in the '36-characters sequence' together with
   alphabets and digits. Therefore, it makes no difference which plane
   the characters are located in.

   Table. Number of US-ASCII characters needed to represent one
          multilingual character (alt = alternative arrangement)

   --------------------------------------------------------------------
   Kind of                    Number of Characters
   Character   0-9 (alt)  10-26 (alt)  27-35 (alt)  36-1296  1297-47952
   --------------------------------------------------------------------
   Native       1  (2)      1   (2)      1   (2)       2         3
   Alphabets    1  (1)      2   (1)      2   (1)       2         2
   Digit        2  (1)      1   (1)      2   (1)       2         2
   --------------------------------------------------------------------

6. TLD (Top level domain)

   The top level domains are limited in number (.com, .org, .net etc.).
   Thus, instead of being given substitutes, they are directly
   translated.

7. Conversion and display

7.1. Converting IDN into the traditional name

   When a user enters an IDN into an application to use an Internet
   service, the conversion program is triggered by the multilingual
   character(s) included in the name. The program then converts the IDN
   into the traditional name by including the 'multilingual key' and
   the 'language key' mentioned above and by replacing each character
   with its 'character substitute', and hands the converted domain name
   over to the application handling the Internet service.
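   The conversion steps in 7.1 can be sketched for the German example
   of section 5.1.1. This is an illustration only, not a definitive
   implementation: the multilingual key 'z-' and the language key 'de'
   are the example values used in this document, the substitute table
   follows the '0a', '0o', '0u', '0b' arrangement suggested there, the
   pairing of each substitute with a German letter other than u-umlaut
   is an assumption, and the function names are hypothetical. TLD
   translation (section 6) is not modeled; the TLD is passed in already
   in US-ASCII.

```python
MULTILINGUAL_KEY = "z-"   # example value from section 5.1.1
LANGUAGE_KEY = "de"       # example language key for German

# Assumed pairing of German letters with the substitutes 0a, 0o, 0u, 0b.
SUBSTITUTES = {
    "\u00e4": "0a",  # a-umlaut (assumed)
    "\u00f6": "0o",  # o-umlaut (assumed)
    "\u00fc": "0u",  # u-umlaut (the pairing given in the example)
    "\u00df": "0b",  # sharp s (assumed)
    "0": "00",       # digit zero
}
NATIVE = {sub: ch for ch, sub in SUBSTITUTES.items()}


def to_traditional(name, tld):
    """Convert the name part of an IDN and attach an ASCII TLD."""
    body = "".join(SUBSTITUTES.get(ch, ch) for ch in name.lower())
    return MULTILINGUAL_KEY + LANGUAGE_KEY + body + "." + tld


def to_idn(domain):
    """Reverse conversion; returns the input unchanged if no key matches."""
    prefix = MULTILINGUAL_KEY + LANGUAGE_KEY
    label, _, tld = domain.partition(".")
    if not label.startswith(prefix):
        return domain
    body = label[len(prefix):]
    out, i = [], 0
    while i < len(body):
        pair = body[i:i + 2]
        if body[i] == "0" and pair in NATIVE:
            out.append(NATIVE[pair])   # two ASCII characters, one native
            i += 2
        else:
            out.append(body[i])        # letters, digits 1-9 and hyphen
            i += 1
    return "".join(out) + "." + tld


converted = to_traditional("gr\u00fcn", "com")
print(converted)                              # z-degr0un.com
print(to_idn(converted) == "gr\u00fcn.com")   # True
```

   A name without the multilingual key and language key is returned as
   it is, which mirrors the rule that traditional names are displayed
   without conversion.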
7.2. Display of IDN

   When a domain name includes the multilingual key, and the language
   key in that name matches the language selected as the user's main or
   subsidiary language, the program performs the reverse of 7.1: it
   deletes the multilingual key and the language key and replaces the
   rest of the ASCII characters with native characters. The IDN is then
   displayed on the user's monitor in the native language.

   But if the domain name does not contain a multilingual key, or the
   language key does not match the language selected, the domain name
   is displayed on the monitor as it is, in US-ASCII, without any
   conversion.

   In other words, traditional US-ASCII names and foreign IDNs are
   displayed in English, and IDNs that belong to the user's native
   language are displayed in the user's language.

8. Foreign language

   When a user logs in to an IDN in a different language zone (e.g., a
   Japanese user tries to log in to a Korean domain) and does not have
   a text editor for that language, he/she types the domain name as it
   is, in US-ASCII, and logs in.

9. Creating 'lc2LD's for IDN

   1) Creating lc2LDs under current gTLDs

   The two keys, that is, the multilingual key and the language key,
   can be replaced by one lc2LD (language code 2nd Level Domain), like
   '.ko.com' for Korean under '.com' or '.ja.net' for Japanese under
   '.net'. Then the examples above will be encoded as follows;

   0x0067/0x0072/0x00fc/0x006e/.com in section 5.1.1.2
      ===> 'gr0un.de.com' instead of 'z-degr0un.com'

   '0xb300/0xd55c/0xbbfc/0xad6d/.0xd68c/0xc0ac/' in section 5.2.3
      ===> '2ds96445o1a5.ko.com' instead of 'z-ko2ds96445o1a5.com'.

   2) When it is impossible for every language to find suitable strings
   of characters under current gTLDs, and if new gTLDs only for IDN,
   such as '.icom' (or '.ico') and '.inet' (or '.ine'), can be created,
   then the new gTLDs will replace the 'multilingual key'.
   In this case the examples above will be encoded as follows;

   0x0067/0x0072/0x00fc/0x006e/.com in section 5.1.1.2
      ===> 'degr0un.icom' instead of 'z-degr0un.com'

   '0xb300/0xd55c/0xbbfc/0xad6d/.0xd68c/0xc0ac/' in section 5.2.3
      ===> 'ko2ds96445o1a5.icom' instead of 'z-ko2ds96445o1a5.com'.

   3) lc2LD under new gTLD

   The lc2LDs under new gTLDs can replace the language key. Then the
   examples will be encoded respectively to 'gr0un.de.icom' and
   '2ds96445o1a5.ko.icom'.

10. References

   [RFC1034] P. Mockapetris, "DOMAIN NAMES - CONCEPTS AND FACILITIES",
             November 1987.

11. Patent information

   A patent application covering most of this method has been filed in
   Korea.

   Application Date: February 12, 2000
   Application No.:  10-2000-0006723
   Applicant:        Deuk-kul Jang (4-1995-085521-2)

12. Author's address

   Deuk-kul Jang
   So-myung Ind.
   Postal address: Kyunggido namyangjushi jingunmyun songnungri 178-6
                   Republic of Korea
   Telephone number; 502-3030-308, 17-266-3030
   Fax. Number     ; 31-573-6849
   E-mail          ; dkjang@smind.co.kr