From: Javier SOLA (email@example.com)
Date: Fri Jan 30 2009 - 02:02:06 CST
It is a little more complex.
- Until last May, ZWSP had always been defined as a word boundary. Its
use to separate syllables is incorrect, because it breaks up the words
and so prevents the correct functioning of spell-checkers, search engines or
- In 2003 its classification in Unicode was changed from a spacing
character to a format character. Because the character was not explicitly
added to the word-boundary algorithm, all software that used the Unicode
tables directly stopped working for functions that relied on this
character as a word boundary (such as spell-checking).
- On 22nd May 2008 an erroneous erratum was added to Unicode 5.0 and 5.1
eliminating the word-boundary property of the character, leaving only
the line-breaking property. Unfortunately, this erratum has NOT YET BEEN
- In August 2008 I presented a proposal to the UTC to revert to the
original word-boundary property of the character.
- At the last meeting of the UTC, a public review issue was published.
It reads as follows:
The Unicode Technical Committee is considering changing the Word_Break
property value for ZWSP from the value WB=Format to the value WB=Other
(WB=XX). The effect of this would be to have the ZWSP act as a
word-separator in the default word break algorithm, and is consistent
with its usage in Thai, Lao, and other scripts that don't use spaces for
The issue should be resolved at the next UTC meeting, which should also
ensure that the ERRATUM is withdrawn.
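
For anyone who wants to see the practical effect, here is a minimal
Python sketch. It is only an illustration, not how any particular
spell-checker or search engine works. The helper name split_on_zwsp and
the Thai sample string are my own assumptions for the example. It shows
that U+200B is reported as a format character, that generic whitespace
splitting therefore ignores it, and that software has to treat it
explicitly as a separator to recover the words.

```python
import unicodedata

ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE


def split_on_zwsp(text):
    """Hypothetical helper: treat ZWSP as an explicit word separator."""
    return [word for word in text.split(ZWSP) if word]


# Two Thai words joined by an invisible ZWSP (sample text is illustrative).
sample = "ภาษา" + ZWSP + "ไทย"

# General category of U+200B: a format character, not a space separator.
print(unicodedata.category(ZWSP))

# Whitespace-based splitting does not see ZWSP as a boundary.
print(len(sample.split()))

# Explicit handling recovers the two words.
print(split_on_zwsp(sample))
```

Any process that relies only on whitespace rules or on the format
classification will see the sample as a single unbroken string, which is
the failure described in the points above.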
Doug Ewell wrote:
> ɹɐzlnƃ ɟıʇɐ <atif dot gulzar at gmail dot com> wrote:
>> I have checked and could not find any Unicode character for word
>> separator (zero width space as WORD separator). This character/code
>> is needed for languages where space is not used as word separator.
>> The available zero width characters are incapable to address this
>> issue. e.g.
>> U+200B Zero Width Space: This character is intended for line break
>> control (In Lao language lines can be broken at syllable levels, Lao
>> uses U+200B to mark syllable boundaries).
> According to Section 11.1 on Thai in TUS 5.0 (p. 376), and Section
> 16.2 on layout controls (p. 535), U+200B ZERO WIDTH SPACE is the right
> character for marking word boundaries in languages like Thai which
> don't use visible spaces between words. I don't see why this would be
> different for Lao.
> Doug Ewell * Thornton, Colorado, USA * RFC 4645 * UTN #14
> http://www.alvestrand.no/mailman/listinfo/ietf-languages