From: Mark Davis (firstname.lastname@example.org)
Date: Tue May 10 2005 - 12:42:43 CDT
By default in UCA, the ZWJ and ZWNJ are completely ignorable. So any two
strings that only differ by those characters will sort next to one another.
(The one exception to complete ignorability is that they can block
200C ; [.0000.0000.0000.0000] # [200C] ZERO WIDTH NON-JOINER
200D ; [.0000.0000.0000.0000] # [200D] ZERO WIDTH JOINER
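To make "completely ignorable" concrete, here is a minimal Python sketch. It uses a toy, primary-level-only weight table. The weights for KA and SSA are hypothetical. Only the virama's 0x1853 primary is taken from the DUCET entry for U+094D in this message. ZWJ and ZWNJ produce no weights at all, so the conjunct, explicit-virama (ZWNJ) and half-form (ZWJ) spellings of KSSA get identical keys. This is a sketch of the idea, not an implementation of UCA.

```python
# Toy primary-weight table (hypothetical values, except U+094D, whose
# primary 0x1853 matches the DUCET entry quoted in this message).
# None marks a completely ignorable character: it contributes no weight.
PRIMARY = {
    "\u0915": 0x1800,  # DEVANAGARI LETTER KA (hypothetical weight)
    "\u0937": 0x1840,  # DEVANAGARI LETTER SSA (hypothetical weight)
    "\u094D": 0x1853,  # DEVANAGARI SIGN VIRAMA
    "\u200C": None,    # ZERO WIDTH NON-JOINER: completely ignorable
    "\u200D": None,    # ZERO WIDTH JOINER: completely ignorable
}

def primary_key(s: str) -> tuple[int, ...]:
    """Return the non-ignorable primary weights of s, in order."""
    return tuple(PRIMARY[ch] for ch in s if PRIMARY[ch] is not None)

conjunct = "\u0915\u094D\u0937"           # KA + VIRAMA + SSA (conjunct ksha)
nonconjunct = "\u0915\u094D\u200C\u0937"  # KA + VIRAMA + ZWNJ + SSA
half_form = "\u0915\u094D\u200D\u0937"    # KA + VIRAMA + ZWJ + SSA

# All three spellings reduce to the same key, so they sort together.
print(primary_key(conjunct) == primary_key(nonconjunct) == primary_key(half_form))
```

The all-zero weights mean the three spellings also tie at the secondary and tertiary levels in real UCA. Without an identical-level comparison, a sort treats them as equal and keeps them adjacent.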
The viramas (halants), on the other hand, are given primary weights,
typically at the very end of each script, after the vowels. For example:
094D ; [.1853.0020.0002.094D] # DEVANAGARI SIGN VIRAMA
That means that the ordering is as follows, where C1..Cn are consonants
(with default vowel), V1..Vm are vowels, and X is the virama/halant:
C1 X C1
C1 X C1 V1
...
C1 X C1 Vm
...
C1 X Cn
C1 X Cn V1
...
C1 X Cn Vm
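A small Python sketch can reproduce this ordering by sorting symbolic words on primary weights alone. It assumes a hypothetical layout in which consonants come first, then vowel signs, then the virama X at the very end of the script, as described above. C1..C3 and V1..V2 stand in for C1..Cn and V1..Vm. The sketch also includes some words without a virama, to show where the virama sequences fall relative to them.

```python
# Hypothetical primary weights mirroring the layout described above:
# consonants (with default vowel), then vowels, then the virama X last.
CONSONANTS = ["C1", "C2", "C3"]   # stand-ins for C1..Cn
VOWELS = ["V1", "V2"]             # stand-ins for V1..Vm
WEIGHT = {sym: i + 1 for i, sym in enumerate(CONSONANTS + VOWELS + ["X"])}

def key(word: tuple[str, ...]) -> list[int]:
    """Primary-level key: weights compared left to right; a prefix sorts first."""
    return [WEIGHT[sym] for sym in word]

# Deliberately unordered input.
words = [
    ("C1", "X", "C2"),
    ("C1", "V2"),
    ("C1",),
    ("C1", "X", "C1", "V1"),
    ("C1", "C2"),
    ("C1", "X"),
    ("C1", "X", "C1"),
    ("C1", "V1"),
]

ordered = sorted(words, key=key)
for w in ordered:
    print(" ".join(w))
```

Because X carries the highest primary, every C1 X ... sequence sorts after all the C1 + vowel and C1 + consonant sequences. ZWJ and ZWNJ contribute no weights, so inserting either after an X would not move a word from its position.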
This may be tailored on a per-language basis in CLDR, such as
----- Original Message -----
From: "N. Ganesan" <email@example.com>
To: "Unicode List" <firstname.lastname@example.org>
Sent: Tuesday, May 10, 2005 08:10
Subject: Collating nonconjunct and conjunct forms of words
> In Indian languages, ZWJ or ZWNJ are used
> to produce conjunct and nonconjunct forms
> of identical words.
> Interestingly, identical words appear in
> conjunct form in some places in a book while
> nonconjunct forms of the same words appear elsewhere
> in that particular book. In Tamil, this situation
> exists for Sanskrit loan words.
> Also, Islamic names that were written with a
> conjunct ksha in the first half of the 20th century
> are now universally written with a nonconjunct ksha.
> Linguistically, it makes sense to place the
> identical words next to each other when sorting
> a book's words, if that book has the same word
> in both conjunct and nonconjunct forms at different places.
> How does Unicode treat collation of conjunct
> and nonconjunct forms of identical words?
> Are they next to each other? Since North Indian
> languages probably have this situation many times,
> is there any general rule or policy?
> N. Ganesan