RE: Suggestions in Unicode Indic FAQ

From: Marco Cimarosti (marco.cimarosti@essetre.it)
Date: Wed Jan 29 2003 - 08:39:20 EST


    Keyur Shroff wrote:
    > In the FAQ
    > http://www.unicode.org/faq/indic.html#16
    >
    > It is mentioned that the following are equivalent
    >
    > ISCII             Unicode
    > KA halant INV     KA virama ZWJ
    > RA halant INV     RAsup (i.e., repha)

    The last line is really bizarre! I would agree that it is plain wrong...

    What is supposed to appear in the "Unicode" column is the Unicode *encoding*
    equivalent to the <RA halant INV> in the "ISCII" column. But "RAsup (i.e.,
    repha)" is the description of a *glyph*.

    > In fact there is no way in Unicode to produce RAsup directly,
    > i.e., without using base consonant. [...]

    I agree. This issue has been raised several times, and several viable
    solutions have been proposed, but I don't recall Unicode "officials" ever
    even acknowledging the problem.

    Still, it has probably been noted and discussed. I hope to see an
    official solution in TUS 4.0.

    > SUGGESTION-3:
    >
    > Use of the SPACE character as a consonant may create a problem for
    > a state machine that finds language/syllable boundaries.
    > In fact we need a codepoint for one invisible consonant
    > (similar to INV in ISCII) in Unicode, which can solve
    > this problem with Unicode.
    >
    > After inclusion of INV character the following can be recommended.
    >
    > ISCII             Unicode
    > KA halant INV     KA virama INV
    > RA halant INV     RA virama INV (i.e., repha)
    > INV halant RA     INV virama RA (RAsub)

    Why not represent INV with a double ZWJ? E.g.:

            ISCII             Unicode
            KA halant INV     KA virama ZWJ ZWJ
            RA halant INV     RA virama ZWJ ZWJ (i.e., repha)
            INV halant RA     ZWJ ZWJ virama RA (RAsub)
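
To make the proposed table concrete, here is a minimal Python sketch that spells out the code points involved. It assumes Devanagari (ISCII covers several scripts, so the script choice is only an illustration), and the `TOKENS` table and both function names are hypothetical helpers, not part of any standard or library. It shows only the suggested encoding; it says nothing about how a given display engine would render the result.

```python
# Illustrative code points. Devanagari is an assumption made for this sketch.
KA = "\u0915"      # DEVANAGARI LETTER KA
RA = "\u0930"      # DEVANAGARI LETTER RA
VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA (the ISCII "halant")
ZWJ = "\u200D"     # ZERO WIDTH JOINER

# Hypothetical token table for the convention suggested above: INV -> ZWJ ZWJ.
TOKENS = {"KA": KA, "RA": RA, "halant": VIRAMA, "INV": ZWJ + ZWJ}


def iscii_to_unicode(tokens):
    """Map a list of ISCII-style token names to a Unicode string."""
    return "".join(TOKENS[t] for t in tokens)


def code_points(s):
    """Render a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in s)


if __name__ == "__main__":
    rows = (
        ["KA", "halant", "INV"],
        ["RA", "halant", "INV"],
        ["INV", "halant", "RA"],
    )
    for row in rows:
        print(" ".join(row), "->", code_points(iscii_to_unicode(row)))
```

Each row of the table maps to four code points, with the two ZWJs standing where ISCII has INV.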

    This has the advantage that the most common sequences will also work on
    old display engines implemented *before* the double-ZWJ convention was
    introduced.

    E.g., the sequence "KA virama ZWJ ZWJ" also works on an old engine, for the
    simple reason that the first ZWJ is enough to do the work, and the second
    ZWJ is invisible.
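
The backward-compatibility argument can be checked at the character-data level with a small Python sketch. It again assumes Devanagari code points, and `extra_is_only_format` is a hypothetical helper. It verifies only that the new sequence is the old one plus format (Cf) characters; whether a particular engine really draws nothing for the extra ZWJ is outside what this code can show.

```python
import unicodedata

# Devanagari code points, assumed for illustration only.
KA, VIRAMA, ZWJ = "\u0915", "\u094D", "\u200D"

old_form = KA + VIRAMA + ZWJ        # sequence that existing engines already handle
new_form = KA + VIRAMA + ZWJ + ZWJ  # sequence under the suggested double-ZWJ convention


def extra_is_only_format(old, new):
    """True if `new` is `old` followed only by format (Cf) characters."""
    tail = new[len(old):]
    return new.startswith(old) and all(
        unicodedata.category(c) == "Cf" for c in tail
    )


if __name__ == "__main__":
    print(extra_is_only_format(old_form, new_form))
```

By contrast, a newly assigned INV code point would be unknown to an old engine, which is the white-box case.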

    Of course, an old engine will still display a <RA[eyelash]> for <RA virama
    ZWJ ZWJ>, but that is no worse than displaying <RA+virama> followed by a
    white box, which is what would happen with your new INV character.

    _ Marco


