L2/07-073R

	M. Suignard, Ed.
	Microsoft Corporation
	M. Davis
	Google
	A. Freytag
	ASMUS Inc.
	March 12, 2007

Working Draft

Preparation of Internationalized Domain Names (idnaprep)

Abstract

This document describes how to prepare internationalized domain name (IDN) labels in order to increase the likelihood that name input and name comparison work in ways that make sense for typical users throughout the world.

This document is an input to the process of defining a new string preparation step for Internationalized Domain Names. It should not be construed as an initiative competing with the work represented by "Proposed Issues and Changes for IDNA - An Overview" [IDNABis]. It is merely a public document representing the views of experts in Unicode technology and implementers of IDN (see the Introduction for more details). It may or may not be used in part in a possible revision of IDN. It uses a format similar to that of Internet-Drafts merely for editing convenience.

This document is supplied purely for informational purposes and publication does not imply any endorsement by the Unicode Consortium. As such, it may be updated, replaced, or superseded by other documents at any time. This is not a stable document; it is inappropriate to cite this document as other than a work in progress.

1. Introduction
    1.1. Terminology
    1.2. Using idnaprep in protocols
2. Preparation Overview
3. Idnaprep character repertoire
4. Mapping of Joiner and Non Joiner characters
5. Normalization
6. Combining Marks
7. Bidirectional Characters
8. idnaprep profiles
9. Security Considerations
    9.1. Idnaprep-specific security considerations
    9.2. Generic Unicode security considerations
10. IANA Considerations
11. Acknowledgements
Appendix A. Unicode database references
Appendix B. Idnaprep Unicode 5.0 profile
Appendix C. Case folding
12. References
    12.1. Normative References
    12.2. Informative References
§ Authors' Addresses
§ Intellectual Property and Copyright Statements

Property name	Description
script	The "script" property is a string value associated with each character and is determined by "Scripts.txt"
Arabic	Character with "Arabic" "script" value
Right-Joining	Character with "R" Joining Type as specified by "ArabicShaping.txt"
Transparent	Character with "T" Joining Type as specified by "ArabicShaping.txt"
Left-Joining	Character with "L" Joining Type as specified by "ArabicShaping.txt"
Letter	Character with General_Category value of "Lu", "Ll", "Lt", "Lm", or "Lo" as specified in "UnicodeData.txt"
Combining Mark	Character with General_Category value of "Mc" or "Mn" as specified in "UnicodeData.txt"
Virama	Character with Canonical_Combining_Class value equal to "9" as specified in "UnicodeData.txt"
RCat	Character with Bidi_Class value of "R" or "AL" as specified in "UnicodeData.txt"
LCat	Character with Bidi_Class value of "L" as specified in "UnicodeData.txt"
NSMCat	Character with Bidi_Class value of "NSM" as specified in "UnicodeData.txt"
unassigned	Code point with General_Category value of "Cn" in "UnicodeData.txt"
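Several of the properties in this table can be tested directly against the Unicode Character Database. The following is a minimal sketch, not part of idnaprep itself, using Python's standard unicodedata module. The helper name classify is hypothetical. The module reflects whichever Unicode version ships with the interpreter rather than Unicode 5.0, and it does not expose the "script" property or the Joining Type, so the Arabic, Right-Joining, Transparent, and Left-Joining rows are not covered here.

```python
import unicodedata

# General_Category values taken from the "Letter" and "Combining Mark" rows.
LETTER_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo"}
MARK_CATEGORIES = {"Mc", "Mn"}


def classify(ch):
    """Return the set of table property names that apply to one character.

    Hypothetical helper for illustration only; the results depend on the
    Unicode version bundled with the running Python interpreter.
    """
    names = set()
    category = unicodedata.category(ch)
    bidi = unicodedata.bidirectional(ch)
    if category in LETTER_CATEGORIES:
        names.add("Letter")
    if category in MARK_CATEGORIES:
        names.add("Combining Mark")
    if unicodedata.combining(ch) == 9:  # Canonical_Combining_Class 9
        names.add("Virama")
    if bidi in ("R", "AL"):
        names.add("RCat")
    if bidi == "L":
        names.add("LCat")
    if bidi == "NSM":
        names.add("NSMCat")
    if category == "Cn":
        names.add("unassigned")
    return names
```

For instance, classify("\u094D") (DEVANAGARI SIGN VIRAMA) reports the Combining Mark, Virama, and NSMCat properties together, which shows that the rows of the table are not mutually exclusive.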

Status field	Description
C	common case folding, common mappings shared by both simple and full mappings.
F	full case folding, mappings that cause strings to grow in length. Multiple characters are separated by spaces.
S	simple case folding, mappings to single characters where different from F.
T	special cases.
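The status values above can be illustrated with a small parser. This is a sketch under stated assumptions: it assumes the table describes the status column of the UCD file "CaseFolding.txt", and that each data line has the layout "code; status; mapping; # name". The sample lines are written in that layout for illustration, and the function names are hypothetical. Full folding is built from the C and F entries, simple folding from the C and S entries, and T entries are skipped.

```python
# Sample lines in the assumed "code; status; mapping; # name" layout.
SAMPLE = """\
0041; C; 0061; # LATIN CAPITAL LETTER A
00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S
0049; C; 0069; # LATIN CAPITAL LETTER I
0049; T; 0131; # LATIN CAPITAL LETTER I
1F88; F; 1F00 03B9; # GREEK CAPITAL LETTER ALPHA WITH PSILI AND PROSGEGRAMMENI
1F88; S; 1F80; # GREEK CAPITAL LETTER ALPHA WITH PSILI AND PROSGEGRAMMENI
"""


def parse_case_folding(text):
    """Parse lines into (source character, status, folded string) tuples."""
    entries = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not line:
            continue
        code, status, mapping = [f.strip() for f in line.split(";")[:3]]
        folded = "".join(chr(int(cp, 16)) for cp in mapping.split())
        entries.append((chr(int(code, 16)), status, folded))
    return entries


def build_fold_table(entries, full=True):
    """Select C+F entries for full folding, or C+S entries for simple folding."""
    wanted = {"C", "F"} if full else {"C", "S"}
    return {src: dst for src, status, dst in entries if status in wanted}


def fold(s, table):
    """Fold a string character by character; unmapped characters pass through."""
    return "".join(table.get(ch, ch) for ch in s)


entries = parse_case_folding(SAMPLE)
full_table = build_fold_table(entries, full=True)
simple_table = build_fold_table(entries, full=False)
```

With these tables, full folding expands U+00DF to two characters while simple folding leaves it unchanged, which is the length difference the F and S rows describe.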

[RFC2119]	Bradner, S., “Key words for use in RFCs to Indicate Requirement Levels,” BCP 14, RFC 2119, March 1997.
[UAX15]	Davis, M. and M. Duerst, “Unicode Normalization Forms,” Unicode Standard Annex #15, October 2006.
[UAX9]	Davis, M., “The Bidirectional Algorithm,” Unicode Standard Annex #9, September 2006.
[UCD]	The Unicode Consortium, “Unicode Character Database,” July 2006.
[Unicode]	The Unicode Consortium, “The Unicode Standard Version 5.0,” Addison-Wesley, Reading, MA, October 2006.

[CharModel]	Whistler, K., Davis, M., and A. Freytag, “Character Encoding Model,” Unicode Technical Report #17, September 2004.
[Glossary]	The Unicode Consortium, “Unicode Glossary,” September 2006.
[IDNABidi]	Alvestrand, H. and C. Karp, “An IDNA problem in right-to-left scripts,” Internet-Draft, October 2006.
[IDNABis]	Klensin, J., “Proposed Issues and Changes for IDNA - An Overview,” Internet-Draft, February 2007.
[IDNARepertoire]	Faltstrom, P., “The Unicode Codepoints and IDN,” Internet-Draft, October 2006.
[ISO10646]	International Organization for Standardization, “Information Technology - Universal Multiple-Octet Coded Character Set (UCS),” ISO Standard 10646-1, with amendments 1 and 2, 2003.
[RFC2434]	Narten, T. and H. Alvestrand, “Guidelines for Writing an IANA Considerations Section in RFCs,” BCP 26, RFC 2434, October 1998.
[RFC3454]	Hoffman, P. and M. Blanchet, “Preparation of Internationalized Strings ("stringprep"),” RFC 3454, December 2002.
[RFC3490]	Faltstrom, P., Hoffman, P., and A. Costello, “Internationalizing Domain Names in Applications (IDNA),” RFC 3490, March 2003.
[RFC3491]	Hoffman, P. and M. Blanchet, “Nameprep: A Stringprep Profile for Internationalized Domain Names (IDN),” RFC 3491, March 2003.
[RFC3987]	Duerst, M. and M. Suignard, “Internationalized Resource Identifiers (IRIs),” RFC 3987, January 2005.
[UTR36]	Davis, M. and M. Suignard, “Unicode Security Considerations,” Unicode Technical Report #36, August 2006.
[UTS39]	Davis, M. and M. Suignard, “Unicode Security Mechanisms,” Unicode Technical Standard #39, August 2006.

	Michel Suignard (editor)
	Microsoft Corporation
	One Microsoft Way
	Redmond, WA 98052
	U.S.A.
Phone:	+1 425 882-8080
Email:	michelsu@microsoft.com
URI:	http://www.suignard.com

	Mark Davis
	Google
	U.S.A.
Email:	mark.davis@macchiato.com or mark.davis@google.com

	Asmus Freytag
	ASMUS Inc.
	U.S.A.
Email:	asmus@unicode.org
URI:	http://home.ix.netcom.com/~asmus-inc/
