Re: Indic Devanagari Query

From: Keyur Shroff (
Date: Wed Jan 29 2003 - 03:44:05 EST



    I forgot to reply to the implementation query. My reply is inline.

    --- Aditya Gokhale <> wrote:
    > 2. Implementation Query -
    > In an implementation where I need to send / process Hindi, Marathi
    > and Sanskrit data, how do I differentiate between languages (Hindi,
    > Marathi and Sanskrit). Say for example, I am writing a translation
    > engine, and I want to translate a document having Hindi, Marathi and
    > Sanskrit Text in it, how do I know from the code points between 0x0900
    > and 0x097F, that the data under perusal is Hindi / Marathi / Sanskrit ?
    > I would suggest that we should give different code pages for Marathi,
    > Hindi and Sanskrit. May be current code page of Devanagari can be traded
    > as Hindi and two new code pages for Marathi and Sanskrit be added. This
    > could solve these issues. If there is any better way of solving this, any
    > one suggest.

    Rather than changing the encoding standard, you can best solve this
    problem in your application. Unicode encodes scripts, not languages, so
    the code points alone cannot tell you which language a Devanagari text
    is written in. You can use tags in your text to mark the language.
    Unicode itself provides tag characters for this purpose, but their use
    is strongly discouraged. Instead, use a markup language such as XML or
    HTML to mark language boundaries. Then parse your text, identify the
    language boundaries, and process each run according to its language.
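
    As a rough illustration of the tagging approach, here is a minimal
    Python sketch. It assumes the document is well-formed XML that marks
    language with the standard xml:lang attribute; the element names (doc,
    span), the function name language_runs and the sample sentences are
    made up for this example and are not part of any standard.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the predefined xml: prefix under this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def language_runs(xml_text, default_lang="und"):
    """Return a list of (language, text) pairs in document order.

    Hypothetical helper: a child element inherits its parent's language
    unless it carries its own xml:lang attribute.
    """
    runs = []

    def walk(element, inherited_lang):
        lang = element.get(XML_LANG, inherited_lang)
        if element.text and element.text.strip():
            runs.append((lang, element.text.strip()))
        for child in element:
            walk(child, lang)
            # Text after a child's closing tag belongs to the parent.
            if child.tail and child.tail.strip():
                runs.append((lang, child.tail.strip()))

    walk(ET.fromstring(xml_text), default_lang)
    return runs


# Illustrative sample only: one Hindi, one Marathi, one Sanskrit sentence.
sample = (
    '<doc xml:lang="hi">'
    'यह हिंदी है। '
    '<span xml:lang="mr">हे मराठी आहे.</span> '
    '<span xml:lang="sa">इदं संस्कृतम् अस्ति।</span>'
    '</doc>'
)

for lang, text in language_runs(sample):
    print(lang, text)
```

    Each (language, text) pair can then be handed to the translation step
    for that language. The language codes hi, mr and sa are the usual
    two-letter codes for Hindi, Marathi and Sanskrit.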

    If you don't want to use tags in your text, you can predict the language
    with a heuristic based on properties that differ among the three
    languages. In that case your processing is divided into two phases: the
    first applies the heuristic to identify language boundaries in the plain
    text, and the second processes the text for translation. Beware that
    such heuristic processing will not always give accurate results. Hence
    the use of tags is recommended.
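
    To make the two-phase idea concrete, here is a small Python sketch of
    one possible heuristic: counting common function words that are typical
    of each language. The word lists, the function names guess_language and
    label_sentences, and the sentence-splitting rule are all assumptions
    chosen for illustration. A real system would need far larger word lists
    or a statistical model, and it will still guess wrong on some input.

```python
import re
from collections import Counter

# Tiny, illustrative marker-word lists (assumed, far from complete).
MARKERS = {
    "hi": {"है", "हैं", "और", "में", "का", "की", "नहीं"},
    "mr": {"आहे", "आहेत", "आणि", "मध्ये", "नाही", "हे"},
    "sa": {"अस्ति", "सन्ति", "च", "इति", "एव", "अपि"},
}

# Runs of Devanagari characters, excluding the danda signs U+0964/U+0965.
TOKEN = re.compile(r"[\u0900-\u0963\u0966-\u097F]+")

# Phase 1 boundary rule (assumed): sentences end at danda or . ? !
SENTENCE_END = re.compile(r"[।॥.?!]+")


def guess_language(sentence):
    """Return 'hi', 'mr' or 'sa', or None when there is no clear winner."""
    counts = Counter()
    for token in TOKEN.findall(sentence):
        for lang, words in MARKERS.items():
            if token in words:
                counts[lang] += 1
    ranked = counts.most_common(2)
    if not ranked:
        return None
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # tie: refuse to guess
    return ranked[0][0]


def label_sentences(text):
    """Phase 1: split plain text into sentences and guess each language."""
    labelled = []
    for sentence in SENTENCE_END.split(text):
        sentence = sentence.strip()
        if sentence:
            labelled.append((guess_language(sentence), sentence))
    return labelled


# Phase 2 (translation) would consume these labelled sentences.
text = "यह हिंदी है। हे मराठी आहे. इदं संस्कृतम् अस्ति।"
for lang, sentence in label_sentences(text):
    print(lang, sentence)
```

    Sentences that contain none of the marker words, or that tie between
    two languages, come back as None. That is exactly the kind of case
    where explicit tags are more reliable.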



    This archive was generated by hypermail 2.1.5 : Wed Jan 29 2003 - 04:41:05 EST