L2/00-258

Draft for Discussion at UTC meeting


ISO
INTERNATIONAL ORGANIZATION FOR STANDARDIZATION
ORGANISATION INTERNATIONALE DE NORMALISATION

ISO/IEC JTC 1/SC 2/WG 2

Universal Multiple-Octet Coded Character Set
(U C S)

ISO/IEC JTC1/SC2/WG2 N----
Date: 2000-08-09

 

Title: Proposal for addition of ZERO WIDTH WORD JOINER

Source: Unicode Technical Committee

Status: Liaison

Action: For consideration by JTC1/SC2/WG2

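The codepoint U+FEFF serves two very different purposes: it is a signature (the byte order mark) at the start of a text, and it is the character ZERO WIDTH NO-BREAK SPACE (ZWNBSP).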

It is clear in retrospect that this was a grave mistake. If U+FEFF only had the semantics of a signature codepoint, it could be freely deleted from text without affecting the interpretation of the rest of the text. This matters in practice: carelessly appending files together, for example, can leave a signature codepoint in the middle of text. Unfortunately, U+FEFF also has significance as a character. As a ZWNBSP, it indicates that line breaks are not allowed between the adjoining characters. Thus U+FEFF does affect the interpretation of text and cannot be freely deleted. The overloading of semantics for this codepoint has caused innumerable problems for programs, not least in terms of the overall comprehensibility of Unicode/10646.
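The concatenation problem described above can be illustrated with a minimal Python sketch. The helper names naive_concat and interior_feff_positions are hypothetical, chosen only for this illustration. The sketch shows that a program finding U+FEFF in the middle of a text cannot tell from the codepoint alone whether it is a stray signature or an intended ZWNBSP.

```python
from typing import List

SIGNATURE = "\ufeff"  # U+FEFF: a signature at the start of a text, ZWNBSP elsewhere


def naive_concat(*texts: str) -> str:
    """Append texts together without inspecting their signatures (hypothetical helper)."""
    return "".join(texts)


def interior_feff_positions(text: str) -> List[int]:
    """Return the index of every U+FEFF that is not the first codepoint of the text."""
    return [i for i, ch in enumerate(text) if ch == SIGNATURE and i > 0]


# Two texts, each beginning with a signature.
first = SIGNATURE + "abc"
second = SIGNATURE + "def"

# After careless concatenation, the second signature sits in the middle of the
# text, where it is indistinguishable from a deliberate ZWNBSP.
combined = naive_concat(first, second)
stray = interior_feff_positions(combined)
```

Here stray identifies the position of the second file's signature, but nothing in the data distinguishes it from a ZWNBSP that an author placed there to prevent a line break.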

To ameliorate this situation, the UTC has approved the addition of a new character at U+2060, ZERO WIDTH WORD JOINER. This character would have the same semantics as U+FEFF in all cases, except that it cannot be used as a signature. The goal is to move implementations to this new character over the next few years, discouraging the use of U+FEFF as ZWNBSP. At some point, the use of U+FEFF as a ZWNBSP can be deprecated, leaving U+FEFF reserved for use only as a signature. This will simplify the programming model for Unicode/10646 significantly, and decrease the opportunity for error.
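One way an implementation might carry out the migration described above is sketched below in Python. The function name migrate_zwnbsp is hypothetical, and the sketch rests on two assumptions that this proposal does not itself specify: that a leading U+FEFF is always a signature, and that every later U+FEFF was intended as a ZWNBSP. The second assumption is exactly what cannot be verified for existing data, so a stray signature left by file concatenation would also be converted.

```python
SIGNATURE = "\ufeff"    # U+FEFF, kept only in its role as a signature
WORD_JOINER = "\u2060"  # U+2060, the proposed replacement for ZWNBSP


def migrate_zwnbsp(text: str) -> str:
    """Replace non-initial U+FEFF with U+2060 (hypothetical migration helper).

    Assumption: a leading U+FEFF is a signature and is preserved.
    Assumption: every other U+FEFF was meant as ZWNBSP.
    """
    if text.startswith(SIGNATURE):
        # Preserve the signature, convert the remainder.
        return SIGNATURE + text[1:].replace(SIGNATURE, WORD_JOINER)
    return text.replace(SIGNATURE, WORD_JOINER)


# A signed text with one interior U+FEFF used to forbid a line break.
migrated = migrate_zwnbsp("\ufeffa\ufeffb")
```

Once text has been migrated in this way, any U+FEFF remaining after the first codepoint can be treated as a misplaced signature, which is the simplification of the programming model that the proposal aims for.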

The UTC urges WG2 to also approve this character for addition to ISO 10646.