Re: behaviour of ZWNBSP (was Re: Unicode and Kermit)

From: Kenneth Whistler (kenw@sybase.com)
Date: Mon Aug 16 1999 - 17:28:11 EDT


Peter,

> > So ap<ZWSP>ple will count as two words, but ap<ZWNBSP>ple
> > counts as one. Now, we are actually talking about word
> > selection and arrow keying, but this is related to line
> > breaking, so looking at UTR#14 Line Breaking Properties
> > (http://www.unicode.org/unicode/reports/tr14/) might be useful.
>
> Your examples make sense, given the assumed behaviour of
> ZWNBSP. But where is this behaviour defined?

I don't think it is fully defined anywhere. The Implementation Guidelines
suggest that for word boundaries one should "break between letters and
non-letters." Beyond that it gets fairly complex, and probably has
to be guided lexically on a per-language basis, if one were to get
exactly correct behavior.
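
As a rough illustration of the "break between letters and non-letters"
rule, here is a minimal Python sketch. The function name naive_words is
hypothetical, and counting only the runs of letters as words is an
assumption of this sketch, not something the guidelines spell out.

```python
import unicodedata


def is_letter(ch):
    # General categories starting with "L" (Lu, Ll, Lt, Lm, Lo) are letters.
    return unicodedata.category(ch).startswith("L")


def naive_words(text):
    """Split text into maximal runs of letters; any non-letter ends a word."""
    words = []
    current = ""
    for ch in text:
        if is_letter(ch):
            current += ch
        elif current:
            # A non-letter after at least one letter closes the current word.
            words.append(current)
            current = ""
    if current:
        words.append(current)
    return words


# Both zero-width characters are non-letters, so this naive rule splits
# at either of them.
print(len(naive_words("ap\u200bple")))  # ZWSP
print(len(naive_words("ap\ufeffple")))  # ZWNBSP
```

Under this rule alone, ZWSP and ZWNBSP behave identically, which is one
reason the rule by itself is not enough.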

> Yes, in terms of
> line breaking, it is defined in UTR14. By virtue of the NB in
> ZWNBSP, I wouldn't expect lines to break there. But it isn't a
> given that word selection should have the same behaviour. I
> would expect that
>
> X<ZWSP>Y
> Y<ZWSP>Z
> X<ZWNBSP>Y
> Y<ZWNBSP>Z
>
> should all have a word count of 2, by virtue of the SP in
> *both* ZWSP and ZWNBSP.

Don't be misled by the name. ZWNBSP is *not* a space character, by
any definition.

U+0020 SPACE is general category Zs and bidi category WS.

U+200B ZWSP is general category Zs and bidi category BN (boundary neutral).

U+FEFF ZWNBSP is general category Cf and bidi category BN.
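
These properties can be checked against whatever Unicode data a given
system ships. A small sketch using Python's standard unicodedata module
follows; the helper name describe is hypothetical. Note that the output
reflects the Unicode version bundled with the interpreter, and later
versions of the standard reclassified U+200B as Cf, so the value
reported for ZWSP may differ from the one listed above.

```python
import unicodedata


def describe(ch):
    """Return (code point label, general category, bidi category) for ch."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.category(ch),
        unicodedata.bidirectional(ch),
    )


# SPACE, ZERO WIDTH SPACE, ZERO WIDTH NO-BREAK SPACE
for ch in (" ", "\u200b", "\ufeff"):
    print(describe(ch))
```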

Elements of general category "Cf" are generally ignorable for
the determination of boundaries, except where they also have the property
of non-breaking (as for ZWNBSP) -- in which case they *do* interact
with line-breaking.
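
To make the "ignorable for boundaries" idea concrete, here is a hedged
Python sketch of a word splitter that skips format characters instead
of breaking at them. The names is_ignorable and words_ignoring_format
are hypothetical. Treating ZWSP as a break while skipping every other
Cf character is an assumption of this sketch, and ZWSP is singled out
by code point because its general category depends on the Unicode
version in use.

```python
import unicodedata

ZWSP = "\u200b"  # assumption: always treated as a word break in this sketch


def is_ignorable(ch):
    # Format characters (Cf), such as ZWNBSP, are skipped for word
    # boundaries. ZWSP is excluded explicitly so it still breaks.
    return ch != ZWSP and unicodedata.category(ch) == "Cf"


def words_ignoring_format(text):
    """Split into runs of letters, skipping ignorable format characters."""
    words = []
    current = ""
    for ch in text:
        if is_ignorable(ch):
            continue  # neither part of the word nor a boundary
        if unicodedata.category(ch).startswith("L"):
            current += ch
        elif current:
            words.append(current)
            current = ""
    if current:
        words.append(current)
    return words


print(len(words_ignoring_format("ap\u200bple")))  # ZWSP still breaks
print(len(words_ignoring_format("ap\ufeffple")))  # ZWNBSP is skipped
```

Note that this sketch drops the skipped characters from the returned
words. It covers word boundaries only and says nothing about
line-breaking, where ZWNBSP does have an effect.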
 
--Ken


