Proposal for LATIN A WITH DOT ABOVE

ISO
INTERNATIONAL ORGANIZATION FOR STANDARDIZATION
ORGANISATION INTERNATIONALE DE NORMALISATION

ISO/IEC JTC 1/SC 2/WG 2

Universal Multiple-Octet Coded Character Set
(UCS)

ISO/IEC JTC1/SC2/WG2 N1838
Date: 1998-09-02

Title:	Proposal to add the letters LATIN SMALL / CAPITAL LETTER A WITH DOT ABOVE to the BMP
Source:	Mark Davis
Status:	Expert Contribution
Action:	For consideration by JTC1/SC2/WG2

This document contains the proposal summary (ISO/IEC JTC1/SC2/WG2 form N1352) and a full proposal for the encoding of two new characters in the BMP of ISO/IEC 10646.

A. Administrative

1.	Title	Proposal to add LATIN SMALL/CAPITAL LETTER A WITH DOT ABOVE to the BMP
2.	Requester's name	Mark Davis
3.	Requester type	Expert contribution
4.	Submission date	1998-09-02
5.	Requester's reference
6a.	Completion	This is a complete proposal.
6b.	More information to be provided?	No

B. Technical -- General

1a.	New script? Name?	No
1b.	Addition of characters to existing block? Name?	Yes, to Latin. Suggested locations are U+1E9C/U+1E9D. However, the characters could be added at any reasonable place in the BMP.
2.	Number of characters	2
3.	Proposed category	Category A
4.	Proposed level of implementation and rationale	Level 1
5a.	Character names included in proposal?	Yes
5b.	Character names in accordance with guidelines?	Yes
5c.	Character shapes reviewable?	Yes
6a.	Who will provide computerized font?	Mark Davis (if necessary--it is a trivial modification of any font containing U+01E0 and U+01E1)
6b.	Font currently available?	No, but it can be generated quickly
6c.	Font format?	TrueType
7a.	Are references (to other character sets, dictionaries, descriptive texts, etc.) provided?	N/A--See below
7b.	Are published examples (such as samples from newspapers, magazines, or other sources) of use of proposed characters attached?	N/A--See below
8.	Does the proposal address other aspects of character data processing?	Yes

C. Technical -- Justification

1.	Has this proposal been submitted before?	No
2.	Contact with the user community?	N/A--See below
3.	Information on the user community?	N/A--See below
4a.	The context of use for the proposed characters?	N/A--See below
4b.	Reference	N/A--See below
5a.	Proposed characters in current use?	N/A--See below
5b.	Where?	N/A--See below
6a.	Characters should be encoded entirely in BMP?	Yes
6b.	Rationale	Required for efficient normalization of Unicode/10646, as described below.
7.	Should characters be kept in a continuous range?	It would be useful, but not absolutely necessary
8a.	Can the characters be considered a presentation form of an existing character or character sequence?	To the same degree as U+01E0 LATIN CAPITAL LETTER A WITH DOT ABOVE AND MACRON
8b.	Where?	N/A--See below
8c.	Reference	N/A--See below
9a.	Can any of the characters be considered to be similar (in appearance or function) to an existing character?	No
9b.	Where?
9c.	Reference
10a.	Combining characters or use of composite sequences included?	No
10b.	List of composite sequences and their corresponding glyph images provided?	No
11.	Characters with any special properties such as control function, etc. included?	No

D. SC2/WG2 Administrative

To be completed by SC2/WG2

1.	Relevant SC 2/WG 2 document numbers:
2.	Status (list of meeting number and corresponding action or disposition)
3.	Additional contact to user communities, liaison organizations etc.
4.	Assigned category and assigned priority/time frame
5.	Other Comments

E. Proposal

Proposal to add the letters LATIN SMALL/CAPITAL LETTER A WITH DOT ABOVE to the BMP of ISO/IEC 10646-1

While the character A WITH DOT ABOVE may indeed occur in natural languages or academic use, the principal reason for this proposal has to do with the nature of normalization. There has been a great deal of interest in providing complete specifications for different normalized forms of Unicode/10646. (Cf. http://www.unicode.org/unicode/reports/techreports.html)

One of the normalization forms of particular interest is one that normalizes to precomposed forms--for example, that always uses U+00C0 LATIN CAPITAL LETTER A WITH GRAVE instead of the sequence of A with a separate combining grave accent <U+0041, U+0300>.
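The precomposing form described here is what Unicode later standardized as Normalization Form C (NFC). As a quick illustration, assuming Python 3 and its standard `unicodedata` module (neither is part of this proposal), NFC maps the sequence <U+0041, U+0300> to U+00C0, and NFD maps it back:

```python
import unicodedata

decomposed = "A\u0300"  # <U+0041, U+0300>: A followed by COMBINING GRAVE ACCENT
composed = unicodedata.normalize("NFC", decomposed)

assert composed == "\u00C0"  # U+00C0 LATIN CAPITAL LETTER A WITH GRAVE
assert unicodedata.normalize("NFD", composed) == decomposed
print(f"U+{ord(composed):04X}")  # prints U+00C0
```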

Implementations can be particularly efficient if Unicode and 10646 are coded such that whenever a single composed character X is canonically equivalent to a composed character sequence <B, C₁, C₂, ..., Cₙ>, there is another composed character Y that is canonically equivalent to the same sequence without the final combining mark, <B, C₁, C₂, ..., Cₙ₋₁>. For the purposes of this discussion, Y is called the completion character for X. If X does not have a completion character, X is called incomplete. Only characters with two or more combining marks need to be checked for completeness: when n = 1, the shortened sequence is just the base character B.
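The completeness condition can be checked mechanically. The sketch below assumes Python's standard `unicodedata` module and uses hypothetical helper names (`completion_character`, `is_incomplete`, `incomplete_characters`). It takes the canonical sequence <B, C₁, ..., Cₙ> from NFD, drops the final mark, and tests whether NFC turns the remainder into a single character Y.

```python
import unicodedata


def completion_character(x):
    """Return the completion character Y for X, or None if there is none.

    Assumes X has a canonical decomposition; NFD yields <B, C1, ..., Cn>.
    """
    sequence = unicodedata.normalize("NFD", x)
    if len(sequence) < 2:
        return None  # X is not a composed character
    # Drop the final combining mark and try to recompose the rest.
    prefix = unicodedata.normalize("NFC", sequence[:-1])
    return prefix if len(prefix) == 1 else None


def is_incomplete(x):
    """True if X has two or more combining marks but no completion character."""
    sequence = unicodedata.normalize("NFD", x)
    marks = sequence[1:]
    if len(marks) < 2:
        return False  # zero or one mark: nothing to check
    if not all(unicodedata.category(c).startswith("M") for c in marks):
        return False  # not a base character followed by combining marks
    return completion_character(x) is None


def incomplete_characters(limit=0x10000):
    """List incomplete characters below `limit` (the BMP by default)."""
    return [
        chr(cp)
        for cp in range(limit)
        if not 0xD800 <= cp <= 0xDFFF and is_incomplete(chr(cp))
    ]
```

For example, `completion_character("\u01D5")` (U WITH DIAERESIS AND MACRON) returns U+00DC. Note that the Unicode data bundled with current Python versions already includes LATIN CAPITAL LETTER A WITH DOT ABOVE (at U+0226, not at the location suggested in this proposal), so with that data U+01E0 is reported as complete.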

There are only two incomplete characters in 10646:

U+01E0 LATIN CAPITAL LETTER A WITH DOT ABOVE AND MACRON
U+01E1 LATIN SMALL LETTER A WITH DOT ABOVE AND MACRON

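By adding LATIN CAPITAL LETTER A WITH DOT ABOVE and LATIN SMALL LETTER A WITH DOT ABOVE, which would serve as the completion characters for U+01E0 and U+01E1, we can ensure that implementations of normalization can uniformly apply the best algorithms to all text. By not having to check for special cases, the inner loops of the transformations can be as fast as possible.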
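The effect on an inner loop can be sketched with a simplified pairwise composer that folds each combining mark into the character composed so far with a single table lookup. The pair tables and symbolic character names (`A_DOT`, `A_DOT_MACRON`, and so on) are hypothetical stand-ins rather than real code points, and the loop deliberately ignores canonical reordering and blocking rules.

```python
# Hypothetical pair table: (character composed so far, next mark) -> precomposed
# character. Names are symbolic stand-ins, not real code points.
PAIRS_WITHOUT_A_DOT = {
    ("A", "GRAVE"): "A_GRAVE",                        # cf. U+00C0
    ("U", "DIAERESIS"): "U_DIAERESIS",                # cf. U+00DC
    ("U_DIAERESIS", "MACRON"): "U_DIAERESIS_MACRON",  # cf. U+01D5
    # A_DOT_MACRON (cf. U+01E0) exists, but there is no A_DOT, so no pair reaches it.
}

# The same table once A WITH DOT ABOVE exists as a completion character.
PAIRS_WITH_A_DOT = {
    **PAIRS_WITHOUT_A_DOT,
    ("A", "DOT"): "A_DOT",
    ("A_DOT", "MACRON"): "A_DOT_MACRON",
}


def compose(sequence, pairs):
    """Greedily fold each mark into the preceding character via one lookup.

    Simplification: no canonical reordering and no blocking rules.
    """
    result = [sequence[0]]
    for mark in sequence[1:]:
        key = (result[-1], mark)
        if key in pairs:
            result[-1] = pairs[key]  # inner loop: a single table lookup
        else:
            result.append(mark)      # no pair: leave the mark uncomposed
    return result


print(compose(["U", "DIAERESIS", "MACRON"], PAIRS_WITHOUT_A_DOT))  # ['U_DIAERESIS_MACRON']
print(compose(["A", "DOT", "MACRON"], PAIRS_WITHOUT_A_DOT))        # ['A', 'DOT', 'MACRON']
print(compose(["A", "DOT", "MACRON"], PAIRS_WITH_A_DOT))           # ['A_DOT_MACRON']
```

Without a completion character, the simple loop leaves <A, DOT, MACRON> uncomposed, so a correct implementation would need extra special-case logic (such as looking ahead at several marks at once) only for U+01E0 and U+01E1.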

The value of composed characters is fundamentally a product of their usefulness in implementations, since the same text could always be expressed with composed character sequences instead. This is a special case where adding these two characters is of particular value, because it removes the only exceptions to the completeness property described above.

Name and glyph

	LATIN CAPITAL LETTER A WITH DOT ABOVE
	LATIN SMALL LETTER A WITH DOT ABOVE

Unicode Character Properties

XXXX;LATIN CAPITAL LETTER A WITH DOT ABOVE;Lu;0;L;0041 0307;;;;N;;;;YYYY;
YYYY;LATIN SMALL LETTER A WITH DOT ABOVE;Ll;0;L;0061 0307;;;;N;;;XXXX;;XXXX
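These two entries follow the semicolon-separated layout of UnicodeData.txt, with 15 fields per line. A small parser, sketched below with hypothetical names (`FIELDS`, `parse_entry`), makes the fields explicit and checks that the case mappings point at each other and that both letters decompose to a base letter plus U+0307 COMBINING DOT ABOVE. XXXX and YYYY remain the unassigned code point placeholders used above.

```python
# Field names of a UnicodeData.txt line, in order (15 semicolon-separated fields).
FIELDS = [
    "code", "name", "general_category", "canonical_combining_class",
    "bidi_class", "decomposition", "decimal_digit", "digit", "numeric",
    "bidi_mirrored", "unicode_1_name", "iso_comment",
    "simple_uppercase", "simple_lowercase", "simple_titlecase",
]

ENTRIES = """\
XXXX;LATIN CAPITAL LETTER A WITH DOT ABOVE;Lu;0;L;0041 0307;;;;N;;;;YYYY;
YYYY;LATIN SMALL LETTER A WITH DOT ABOVE;Ll;0;L;0061 0307;;;;N;;;XXXX;;XXXX
"""


def parse_entry(line):
    """Split one UnicodeData.txt line into a field-name -> value dict."""
    values = line.split(";")
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))


capital, small = (parse_entry(line) for line in ENTRIES.splitlines())

# Case mappings point at each other; titlecase of the small letter is the capital.
assert capital["simple_lowercase"] == small["code"]
assert small["simple_uppercase"] == capital["code"]
assert small["simple_titlecase"] == capital["code"]
# Both decompose to a base letter followed by U+0307 COMBINING DOT ABOVE.
assert capital["decomposition"] == "0041 0307"
assert small["decomposition"] == "0061 0307"
```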
