Re: Different Indic strings due to presence of ZWJ...

From: Mark Davis (
Date: Thu Nov 04 2004 - 08:55:20 CST


    The ZWJ and ZWNJ are normally ignored in collation (including matching and
    searching); you'll see that if you look at the data for the Unicode
    Collation Algorithm.
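
    As a rough illustration of the point above, the Python sketch below
    compares the two spellings of the Hindi word from the quoted message,
    first code point by code point and then after dropping ZWJ and ZWNJ.
    The helper names strip_joiners and loosely_equal are hypothetical, and
    removing the two characters outright is only a simplification of
    "ignored in collation"; it is not an implementation of the Unicode
    Collation Algorithm.

```python
ZWJ = "\u200d"   # ZERO WIDTH JOINER
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER


def strip_joiners(text: str) -> str:
    """Return text with every ZWJ and ZWNJ removed (hypothetical helper)."""
    return text.replace(ZWJ, "").replace(ZWNJ, "")


def loosely_equal(a: str, b: str) -> bool:
    """Compare two strings while ignoring ZWJ and ZWNJ (hypothetical helper)."""
    return strip_joiners(a) == strip_joiners(b)


# HA, VOWEL SIGN I, NA, VIRAMA (halant), DA, VOWEL SIGN II
without_zwj = "\u0939\u093f\u0928\u094d\u0926\u0940"
# The same word with a ZWJ inserted after the halant
with_zwj = "\u0939\u093f\u0928\u094d\u200d\u0926\u0940"

# A plain string comparison sees two different strings.
plain_match = without_zwj == with_zwj

# Ignoring the joiners makes them compare equal.
loose_match = loosely_equal(without_zwj, with_zwj)

print(plain_match, loose_match)
```

    Note that this blanket approach would also treat the KA + halant + TA
    conjunct as equal with and without the ZWJ, which is exactly the kind of
    sequence where the joiner changes the rendered form.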

    However, this raises an interesting issue. There are a few particular
    sequences where there is a semantic difference caused by the presence of
    these characters. Those should be added to the collation sequences in the
    Unicode CLDR database (


    ----- Original Message -----
    From: "Bob Eaton" <>
    To: <>
    Sent: Thursday, November 04, 2004 04:58
    Subject: Different Indic strings due to presence of ZWJ...

    > A question has come up recently about two similar-looking words that
    > fail to match in a 'string comparison' due to the presence of the ZWJ.
    > For example, the two strings /हिन्दी/ and
    > /हिन्‍दी/ are identical to look at, but they differ in that
    > the latter has a ZWJ between the न्‍ and the द.
    > The problem is that certain “half-consonant” + “full-consonant”
    > conjuncts require the ZWJ in order to prevent a ‘full conjunct’ form
    > from occurring (e.g. /क्त/ as /क्‍त/). The only way to prevent
    > the full conjunct form is to insert the ZWJ.
    > But in the “न्‍ plus द” case, there is no further conjunct form beyond
    > that. So the text renders the same with and without the ZWJ.
    > This means that ultimately, the ZWJ is unnecessary in some cases of
    > “half-plus-full” conjuncts, but it is necessary in others (i.e.
    > /क्‍त/).
    > The keyboard I use has a key to press to get ‘half-consonants’ (i.e. by
    > inserting both the halant and the ZWJ, since that is what is required in
    > the “harder” case). The problem is I use it also to get the half-न even
    > though in that particular case, it isn’t necessary. But a colleague is
    > using the 'halant-only' key, since the ZWJ is not technically necessary in
    > this case. The result is that the software thinks the two strings are
    > different.
    > Having the software think this is, first of all, a real hassle, since
    > users can’t tell the difference between the two and won’t know why the
    > software thinks they're different.
    > So I have two questions:
    > 1) What does your keyboard do in this respect when typing "half+full"
    > consonant conjuncts? Do you only use the ZWJ where it is absolutely
    > necessary (i.e. /क्‍त/, but not /न्द/)?
    > 2) If different, what do you think it ought to do?
    > Thanks,
    > Bob

    This archive was generated by hypermail 2.1.5 : Thu Nov 04 2004 - 08:57:32 CST