RE: behaviour of ZWNBSP (was Re: Unicode and Kermit)

From: Reynolds, Gregg (greynolds@datalogics.com)
Date: Mon Aug 16 1999 - 18:42:56 EDT


> -----Original Message-----
> From: peter_constable@sil.org [mailto:peter_constable@sil.org]
> Sent: Monday, August 16, 1999 4:21 PM
> To: Unicode List
> Subject: Re: behaviour of ZWNBSP (was Re: Unicode and Kermit)
>
> Actually, what I have in mind is hypothetical - I don't know if
> this would ever arise, and I can't think of any specific
> examples from Thai or another language that would qualify:
>
> In the English string "Mr. Smith", I might prefer not to have a
> line break between the words "Mr." and "Smith". Of course, we
> have NBSP for that purpose. Suppose this scenario, however: I
> have a corpus of data for a language that, like Thai, is
> written without visible spaces between all words, and that I am
> using ZWSP to delimit any word boundaries not delimited by SP,
> PS, etc. I have, however, certain word pairs that, like "Mr.
> Smith", I don't want to break across a line. It seemed obvious
> that ZWNBSP is exactly what is needed.
>
> In other words, ZWNBSP is to ZWSP what NBSP is to SP, but
> useful mostly for writing systems where not all word boundaries
> are overtly indicated with visible space.
>
This occurs in Arabic. The particle 'wa' (U+0648) (very naively translated
as 'and', but much more complex than that) is a distinct 'word', and not
(grammatically) a clitic, but is usually written as if it were; that is, it
is followed by space (since it doesn't connect with a following letter), but
not by "word spacing", so that it looks as if it is part of a longer word.
Linebreaks don't occur between it and the following word.
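
To make the analogy concrete, here is a minimal Python sketch of a toy line
breaker. It assumes, purely for illustration, that SP and ZWSP are the only
break opportunities and that NBSP and ZWNBSP never allow a break. The names
BREAK_AFTER and segments() are hypothetical, and the Arabic sample (wa
followed by ZWNBSP and an arbitrary word) is just one possible way of
encoding the case described above, not an established convention.

```python
# Toy model only: real line breaking involves many more character classes.
SP, NBSP = "\u0020", "\u00A0"        # visible space / no-break space
ZWSP, ZWNBSP = "\u200B", "\uFEFF"    # zero-width space / zero-width no-break space

# Assumption: a line may break only at these two characters.
BREAK_AFTER = {SP, ZWSP}


def segments(text):
    """Split text into runs that must stay together on one line."""
    out, cur = [], []
    for ch in text:
        if ch in BREAK_AFTER:
            # Break opportunity: close the current run, drop the delimiter.
            if cur:
                out.append("".join(cur))
                cur = []
        else:
            # NBSP and ZWNBSP fall through here, so they glue their
            # neighbours into the same unbreakable run.
            cur.append(ch)
    if cur:
        out.append("".join(cur))
    return out


# English: NBSP keeps "Mr." and "Smith" together; SP allows a break.
english = "Mr." + NBSP + "Smith" + SP + "arrived"

# Thai-like text with no visible spaces: ZWSP marks a word boundary that
# may break, ZWNBSP marks one that may not ("ab", "cd", "ef" are stand-ins).
thai_like = "ab" + ZWSP + "cd" + ZWNBSP + "ef"

# Arabic: wa (U+0648) glued to the following word by ZWNBSP.
arabic = "\u0648" + ZWNBSP + "\u0643\u062A\u0627\u0628"

for sample in (english, thai_like, arabic):
    print([s.encode("unicode_escape").decode("ascii") for s in segments(sample)])
```

In this model the Arabic sample comes out as a single run, which matches the
observation that no linebreak occurs between wa and the word after it.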

-gregg
(still working on a detailed reference for Arabic encoding and typesetting,
but beleaguered by computer gremlins.)


