Re: Collating nonconjunct and conjunct forms of words

From: Markus Scherer (markus.icu@gmail.com)
Date: Tue May 10 2005 - 11:58:25 CDT


    Unicode has a companion standard, the Unicode Collation Algorithm
    (UCA, http://www.unicode.org/reports/tr10/). It provides a default
    sort order for all of Unicode and also allows tailorings, with which
    you can change how any character or sequence of characters sorts.
    You could, for example, make ZWJ (U+200D) and ZWNJ (U+200C)
    ignorable, if they are not already, so that sequences sort the same
    whether or not they contain these characters.
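
    A minimal sketch of that effect in Python, not an implementation of
    the UCA: it drops ZWJ and ZWNJ before a plain code point comparison,
    so it only mimics what making them ignorable would do. The helper
    name ignorable_sort_key and the Devanagari sample strings are
    assumptions for illustration; a real tailoring would be done in a
    UCA implementation such as ICU.

```python
# Hypothetical helper, not part of any UCA library: it only mimics the
# effect of treating ZWJ and ZWNJ as ignorable during comparison.
ZWJ = "\u200d"   # ZERO WIDTH JOINER
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER
IGNORABLE = {ZWJ, ZWNJ}


def ignorable_sort_key(text: str) -> str:
    """Return text with ZWJ/ZWNJ removed, for use as a sort key."""
    return "".join(ch for ch in text if ch not in IGNORABLE)


# Assumed sample data: Devanagari KA + VIRAMA + SSA (conjunct form) and
# the same sequence with ZWNJ after the virama (nonconjunct form).
conjunct = "\u0915\u094d\u0937"
nonconjunct = "\u0915\u094d\u200c\u0937"
other = "\u0916"  # KHA, an unrelated entry

words = [other, nonconjunct, conjunct]

# Both forms produce the same key, so they sort next to each other.
# This compares by code point, not by UCA weights.
sorted_words = sorted(words, key=ignorable_sort_key)
```

    The two forms get identical keys, so they end up adjacent in the
    sorted list. Ordering among the remaining characters here is by
    code point, not the UCA default order.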

    markus

    On 5/10/05, N. Ganesan <naa.ganesan@gmail.com> wrote:
    > How does Unicode treat collation of conjunct
    > and nonconjunct forms of identical words?
    > Are they next to each other? Since North Indian
    > languages have possibly this situation many times,
    > any general rule or policy?

    -- 
    Opinions expressed here may not reflect my company's positions unless
    otherwise noted.
    

