From: Magda Danish (Unicode) (email@example.com)
Date: Tue Mar 11 2003 - 13:28:28 EST
Please make sure to copy Vladimiriranorus@online.ru on your reply.
> -----Original Message-----
> From: Vladimir Ivanov [mailto:firstname.lastname@example.org]
> Sent: Tuesday, March 11, 2003 6:22 AM
> To: Magda Danish (Unicode)
> Subject: ZWNJ & Persian Collation
> Dear Magda,
> Excuse me for bothering you again, but my message was rejected by some server
> on its way to email@example.com. May I ask you to publish my question
> below? Thank you, Vladimir.
> Sorting Persian words with a utility based on version 3.1.1 of a tailored
> Allkeys Table http://www.unicode.org/reports/tr10/#AllKeys, I’ve encountered
> a problem that affects the lexicographical order of the words in a
> To my mind, ZWNJ (zero width non-joiner) U+200C (also found among MS Word
> Special Characters/No-width Optional Break) was invented to prevent
> connection of Arabic letters within a word.
> It is used in Persian to show the morphemic boundary in compound words like
> خانه‌داری xānedāri ‘household’. The latter consists of the word خانه xāne
> ‘house’ + verb stem دار dār ‘hold’ + suffix ی ‘i’. It can be written
> like xāne + ZWNJ + dāri. There are thousands of words with similar structure in
> Persian, Dari, Tajik and neighboring languages.
> It is clearly seen that there are letters on both sides of ZWNJ within the
> word boundaries. Placing ZWNJ on an edge of the word doesn’t make sense in
> Persian. From this point of view ZWNJ should be treated as a special
> character rather than a delimiter.
> But in the Allkeys Table it is placed on line #68, well before other popular
> delimiters: HORIZONTAL TABULATION line #192, LINE FEED line #193,
> CARRIAGE RETURN line #196, SPACE line #197 etc.
> Such an ordering gives wrong sorting results for Persian dictionaries:
> compound words like خانه‌داری xānedāri ‘household’ appear in the list before
> their components like خانه xāne ‘house’.
> I’ve solved this problem for myself by placing ZWNJ somewhere after the
> delimiters, but what are the theoretical reasons for putting it before them?
> In order to get what? In what languages?
> Is it a Persian-specific problem or a global one? Are there languages where
> ZWNJ marks a word boundary?
> By the way, the sorting algorithm built into MS Windows puts compound words
> with ZWNJ AFTER their simple components. So in this respect it acts on
> principles different from those of the Allkeys Table.
> Thank you,
> Vladimir Ivanov
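
The ordering described in the quoted message can be reproduced with a small sketch. It assumes, and the message does not say this, that the utility compares whole dictionary lines (headword, TAB, gloss) character by character, using each character's position in the table as its weight. Only the line numbers 68, 192, 193, 196 and 197 come from the message. The letter weights, the helper name make_key and the new position 198 for ZWNJ are hypothetical and chosen for illustration only. This is not the Unicode Collation Algorithm itself.

```python
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER

# Table positions quoted in the message (line numbers in the Allkeys table).
TABLE_RANK = {ZWNJ: 68, "\t": 192, "\n": 193, "\r": 196, " ": 197}


def make_key(ranks):
    """Build a sort key that maps each character to its table position."""
    def key(line):
        # Hypothetical: characters not listed above (the letters) sort
        # after every listed character, in code point order.
        return [ranks.get(ch, 1000 + ord(ch)) for ch in line]
    return key


xane = "\u062e\u0627\u0646\u0647"                      # xane 'house'
xanedari = xane + ZWNJ + "\u062f\u0627\u0631\u06cc"    # xane + ZWNJ + dari

# Assumed entry layout: headword, TAB, gloss.
entries = [xane + "\thouse", xanedari + "\thousehold"]

# ZWNJ (68) is compared against TAB (192) at the fifth character,
# so the compound wins and sorts first.
as_shipped = sorted(entries, key=make_key(TABLE_RANK))

# The workaround from the message: move ZWNJ after the delimiters.
moved = dict(TABLE_RANK)
moved[ZWNJ] = 198  # hypothetical position just after SPACE
after_fix = sorted(entries, key=make_key(moved))

print([e.split("\t")[1] for e in as_shipped])
print([e.split("\t")[1] for e in after_fix])
```

Under these assumptions the first list puts 'household' before 'house', and the second list restores the expected dictionary order. If the utility instead compared bare headwords without any trailing delimiter, the shorter word would sort first in both cases, so the effect depends on the delimiter taking part in the comparison.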