Re: help on the notion of Zero Width Space

From: John H. Jenkins (jenkins@apple.com)
Date: Fri Nov 07 1997 - 15:05:15 EST


On 11/7/97 11:15 AM, Kaiying Yang (kaiying@ccl.umist.ac.uk) wrote:

>Can anybody tell me what the code point FEFF ZERO WIDTH NO-BREAK SPACE is
>designed for? Can it be used as a marker or word delimiter in
>text? Presumably, plain Unicode text should be able to store the
>information of this code point.
>

The original intent of U+FEFF was to serve as the "byte-order mark." Its
byte-swapped counterpart, U+FFFE, is explicitly *not* a valid Unicode
character. So if you see text starting with or containing U+FEFF, you know
you've got the right byte order; if you see text starting with or
containing U+FFFE, you know you've got the wrong one and need to byte-swap
all the Unicode values you're dealing with.
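To make the byte-order logic concrete, here is a minimal Python sketch. It assumes the reader first interprets each pair of bytes as a big-endian 16-bit value; if the first value comes out as 0xFFFE, the stream must have been written in the other order, so every unit is swapped. The function names are hypothetical, and the sketch works on raw 16-bit code units only (no surrogate handling).

```python
def read_utf16_units(data: bytes) -> list:
    """Read 16-bit code units, byte-swapping them if the BOM appears as U+FFFE."""
    if len(data) % 2:
        raise ValueError("UTF-16 data must have an even number of bytes")
    # Assumption: start by reading each unit big-endian.
    units = [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]
    if units and units[0] == 0xFFFE:
        # U+FFFE is not a valid character, so the whole stream is in the
        # opposite byte order: swap the two bytes of every unit.
        units = [((u & 0xFF) << 8) | (u >> 8) for u in units]
    return units


def strip_bom(units: list) -> list:
    """Drop a leading U+FEFF once the byte order has been settled."""
    return units[1:] if units and units[0] == 0xFEFF else units
```

For example, `read_utf16_units("\ufeffAb".encode("utf-16-le"))` and the same call on the big-endian encoding both yield `[0xFEFF, 0x41, 0x62]`, and `strip_bom` then leaves just the text.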

As part of the merger between Unicode and ISO/IEC 10646, it was given the
additional meaning (and name) of zero-width no-break space. Its use is
documented in The Book (The Unicode Standard), p. 6-131, which says:

"As ZERO WIDTH NO-BREAK SPACE, U+FEFF behaves like U+00A0 NO-BREAK SPACE
in that it indicates the absence of word boundaries; however, the former
has no width. For example, this character can be inserted after the
fourth character in the text 'base+delta' to indicate that there should be
no line break between the 'e' and the '+'."
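A small Python sketch of the 'base+delta' example may help. The line-break rule here is a deliberately toy, hypothetical one (allow a break on either side of '+', except next to U+FEFF); real line breaking follows the much richer Unicode line-breaking algorithm. Later versions of Unicode (3.2 onward) recommend U+2060 WORD JOINER for this job, so the constant is the only thing you would change.

```python
ZWNBSP = "\ufeff"  # ZERO WIDTH NO-BREAK SPACE


def insert_after(text: str, index: int, ch: str = ZWNBSP) -> str:
    """Insert ch after the first `index` characters of text (hypothetical helper)."""
    return text[:index] + ch + text[index:]


def break_opportunities(text: str) -> list:
    """Toy rule: a break is allowed on either side of '+', but never
    next to U+FEFF, which glues its neighbours together."""
    breaks = []
    for i in range(1, len(text)):
        before, after = text[i - 1], text[i]
        if ZWNBSP in (before, after):
            continue  # no break adjacent to the zero-width no-break space
        if before == "+" or after == "+":
            breaks.append(i)
    return breaks
```

Without the marker, `break_opportunities("base+delta")` returns `[4, 5]` (before and after the '+'). With `insert_after("base+delta", 4)`, the break between 'e' and '+' disappears and only the one after the '+' remains, while the visible text is unchanged because U+FEFF has no width.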

=====
John H. Jenkins
jenkins@apple.com
tseng@blueneptune.com
http://www.blueneptune.com/~tseng



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:37 EDT