RE: Indic Devanagari Query

From: Marco Cimarosti (marco.cimarosti@essetre.it)
Date: Wed Jan 29 2003 - 05:22:59 EST

    Aditya Gokhale wrote:
    > Hello Everybody,
    > I had few query regarding representation of Devanagari
    > script in Unicode

    All your questions are FAQs, so I'll just reference the entries that
    answer them.

    > (Code page - 0x0900 - 0x097F). Devanagari is a writing
    > script, is used in Hindi, Marathi and Sanskrit languages. I
    > have following questions -

    Unicode has no code pages:
            http://www.unicode.org/faq/basic_q.html#18

    > 1. In Marathi and Sanskrit language two characters glyphs of
    > 'la' and 'sha' are represented differently as shown in the
    > image below -
    > (First glyph is 'la' and second one is 'sha')
    > as compared to Hindi where these character glyphs are
    > represented as shown in the image below -
    > (First glyph is 'la' and second one is 'sha')

    Unicode encodes (abstract) characters, not glyphs:
            http://www.unicode.org/faq/han_cjk.html#3

    (This FAQ is in the Chinese/Japanese/Korean section because the question
    is more often raised for Chinese ideographs.)
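
    To make the character/glyph distinction concrete, here is a minimal
    sketch in Python using the standard unicodedata module. The helper name
    describe() is only illustrative. It shows that 'la' and 'sha' each have a
    single code point, whatever shape a Hindi or Marathi font draws for them:

```python
import unicodedata

# One code point per abstract character. The glyph shape (Hindi style or
# Marathi style) is chosen by the font, not by the encoded text.
LA = "\u0932"
SHA = "\u0936"


def describe(ch: str) -> str:
    """Return the code point and Unicode character name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


for ch in (LA, SHA):
    print(describe(ch))
```

    The text stays the same when the font changes; only the rendering differs.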

    > In the same script code page, how do I use these two
    > different Glyphs, to represent the same character ? Is there
    > any way by which I can do it in an Open type font and Free
    > type font implementation ?

    Unicode's requirements for fonts:
            http://www.unicode.org/faq/font_keyboard.html#1

    A few links to OpenType stuff:
            http://www.unicode.org/faq/font_keyboard.html#4

    > 2. Implementation Query -
    > In an implementation where I need to send / process
    > Hindi, Marathi and Sanskrit data, how do I differentiate
    > between languages (Hindi, Marathi and Sanskrit). Say for
    > example, I am writing a translation engine, and I want to
    > translate a document having Hindi, Marathi and Sanskrit Text
    > in it, how do I know from the code points between 0x0900 and
    > 0x097F, that the data under perusal is Hindi / Marathi / Sanskrit ?

    What you need here is some sort of language tagging:
            http://www.unicode.org/faq/languagetagging.html
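
    As a rough illustration of what tagging could look like, the sketch below
    keeps the language outside the text, as metadata on each run. TaggedRun
    and runs_for() are hypothetical names made up for this example, and the
    tags "hi", "mr" and "sa" are the ISO 639-1 codes for Hindi, Marathi and
    Sanskrit. A real system would more likely carry the tag in its markup or
    protocol (for instance an xml:lang attribute):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedRun:
    """A stretch of text plus the language it is written in (hypothetical type)."""
    lang: str  # language tag such as "hi", "mr" or "sa"
    text: str


def runs_for(runs, lang):
    """Return the text of every run tagged with the given language."""
    return [run.text for run in runs if run.lang == lang]


# All three runs use the same Devanagari code points; only the tag differs.
document = [
    TaggedRun("hi", "\u0939\u093F\u0928\u094D\u0926\u0940"),
    TaggedRun("mr", "\u092E\u0930\u093E\u0920\u0940"),
    TaggedRun("sa", "\u0938\u0902\u0938\u094D\u0915\u0943\u0924\u092E\u094D"),
]

for run in document:
    print(run.lang, run.text)
```

    A translation engine would then dispatch on run.lang rather than trying
    to guess the language from the code points.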

    > I would suggest that we should give different code pages
    > for Marathi, Hindi and Sanskrit. May be current code page of
    > Devanagari can be traded as Hindi and two new code pages for
    > Marathi and Sanskrit be added. This could solve these issues.
    > If there is any better way of solving this, any one suggest.

    Characters are encoded per script, not per language:
            http://www.unicode.org/faq/basic_q.html#17
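
    A small sketch of what "per script" means in practice. The function
    scripts_used() is a hypothetical helper, and taking the first word of the
    character name as the script is only a heuristic that happens to work for
    Devanagari. Sample words in Hindi, Marathi and Sanskrit all report the
    same script, so the code points alone cannot tell the languages apart:

```python
import unicodedata


def scripts_used(text: str) -> set:
    """Collect the first word of each character name as a rough script label."""
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in text
        if not ch.isspace()
    }


SAMPLES = {
    "Hindi": "\u0939\u093F\u0928\u094D\u0926\u0940",
    "Marathi": "\u092E\u0930\u093E\u0920\u0940",
    "Sanskrit": "\u0938\u0902\u0938\u094D\u0915\u0943\u0924\u092E\u094D",
}

for language, word in SAMPLES.items():
    print(language, sorted(scripts_used(word)))
```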

    > 3. Character codes for jna, shra, ksh -
    >
    > In Sanskrit and Marathi jna, shra and ksh are considered as
    > separate characters and not ligatures. How do we take care of
    > this ? Can I get over all views on the matter from the group
    > ? In my opinion they should be given different code points in
    > the specific language code page.
    > Please find below the character glyphs -

    Unicode encodes Indic analytically:
            http://www.unicode.org/faq/indic.html#17
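
    For example, each of the three conjuncts you mention is stored as
    consonant + virama + consonant, and the font forms the ligature at display
    time. The sketch below spells out the sequences; the dictionary keys and
    the spell_out() helper are just illustrative names:

```python
import unicodedata

VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA (halant)

# Each conjunct is a sequence of existing code points, not a new character.
CONJUNCTS = {
    "jna": "\u091C" + VIRAMA + "\u091E",   # JA + VIRAMA + NYA
    "ksh": "\u0915" + VIRAMA + "\u0937",   # KA + VIRAMA + SSA
    "shra": "\u0936" + VIRAMA + "\u0930",  # SHA + VIRAMA + RA
}


def spell_out(text: str) -> list:
    """Return the Unicode character name of every code point in text."""
    return [unicodedata.name(ch) for ch in text]


for label, sequence in CONJUNCTS.items():
    print(label, sequence, spell_out(sequence))
```

    Software that needs to treat these as single letters (for sorting or
    cursor movement, say) can do so by recognising the sequences, without any
    change to the encoding.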

    > thanks,

    For more details about Devanagari in Unicode, see Chapter 9 of the Standard:
            http://www.unicode.org/uni2book/ch09.pdf

    _ Marco


