Different Indic strings due to presence of ZWJ...

From: Bob Eaton (pete_dembrowski@hotmail.com)
Date: Thu Nov 04 2004 - 06:58:12 CST


    A question has come up recently about two similar-looking words that don't
    match in a string comparison because one of them contains a ZWJ (zero width
    joiner, U+200D).

    For example, the two strings /हिन्दी/ and
    /हिन्‍दी/ are identical to look at, but they differ in that
    the latter has a ZWJ between the न्‍ and the द.
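
    To make the difference concrete, here is a minimal Python sketch that builds
    both spellings from explicit code points and compares them. The variable
    names and the describe() helper are illustrative only, and the code point
    sequences are my reading of the two strings above.

```python
import unicodedata

ZWJ = "\u200D"  # ZERO WIDTH JOINER

# Both spellings written as explicit code points so the difference is visible.
without_zwj = "\u0939\u093F\u0928\u094D\u0926\u0940"      # ha, i, na, halant, da, ii
with_zwj = "\u0939\u093F\u0928\u094D\u200D\u0926\u0940"   # same, with ZWJ after the halant


def describe(text):
    """Return one 'U+XXXX NAME' entry per code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


# A plain string comparison sees two different sequences.
print(without_zwj == with_zwj)  # False

# Unicode normalization does not help here: NFC leaves the ZWJ in place.
nfc_equal = (unicodedata.normalize("NFC", without_zwj)
             == unicodedata.normalize("NFC", with_zwj))
print(nfc_equal)  # False

for entry in describe(with_zwj):
    print(entry)
```

    The listing for the second string shows the extra U+200D sitting between the
    halant (U+094D) and the following consonant, which is the only difference
    between the two.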

    The problem is that certain “half-consonant” + “full-consonant”
    conjuncts require the ZWJ in order to prevent a ‘full conjunct’ form
    from occurring (e.g. /क्त/ as /क्‍त/). The only way to prevent
    the full conjunct form is to insert the ZWJ.

    But in the “न्‍ plus द” case, there is no fuller conjunct form
    than that. So the sequences with and without the ZWJ give the same presentation.

    This means that ultimately, the ZWJ is unnecessary in some cases of
    “half-plus-full” conjuncts, but it is necessary in others (i.e.

    The keyboard I use has a key to press to get ‘half-consonants’ (i.e. by
    inserting both the halant and the ZWJ, since that is what is required in the
    “harder” case). The problem is I use it also to get the half-न even
    though in that particular case, it isn’t necessary. But a colleague is
    using the 'halant-only' key, since the ZWJ is not technically necessary in
    this case. The result is that the software thinks the two strings are different.

    Having the software treat the two strings as different is a real hassle,
    since most users can’t tell the difference between the two and won’t know
    why the software thinks they're different.
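
    One possible workaround on the software side, sketched below in Python, is
    to compare strings through a "loose" key that drops the ZWJ. This is only an
    assumption about how an application might handle it, not a recommendation
    from any standard, and the names loose_key() and loosely_equal() are made
    up for this example.

```python
ZWJ = "\u200D"  # ZERO WIDTH JOINER


def loose_key(text: str) -> str:
    """Return text with every ZWJ removed, for matching purposes only."""
    return text.replace(ZWJ, "")


def loosely_equal(a: str, b: str) -> bool:
    """Compare two strings while ignoring any ZWJ in either one."""
    return loose_key(a) == loose_key(b)


hindi_plain = "\u0939\u093F\u0928\u094D\u0926\u0940"      # halant only
hindi_zwj = "\u0939\u093F\u0928\u094D\u200D\u0926\u0940"  # halant + ZWJ

kta_full = "\u0915\u094D\u0924"        # ka, halant, ta (full conjunct)
kta_half = "\u0915\u094D\u200D\u0924"  # ka, halant, ZWJ, ta (half form)

print(hindi_plain == hindi_zwj)              # False: exact comparison
print(loosely_equal(hindi_plain, hindi_zwj)) # True: ZWJ ignored
print(loosely_equal(kta_full, kta_half))     # True: also matches, see note
```

    The last line shows the cost of this approach: it also matches the two
    spellings that do look different on screen, so it is only suitable for
    searching and matching, not for deciding what to store.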

    So I have two questions:
    1) What does your keyboard do in this respect when typing "half+full"
    consonant conjuncts? Do you only use the ZWJ where it is absolutely
    necessary (i.e. in /क्‍त/, but not in /न्द/)?

    2) If different, what do you think it ought to do?


