RE: Indic Devanagari Query

From: Marco Cimarosti (marco.cimarosti@essetre.it)
Date: Wed Jan 29 2003 - 05:22:59 EST

Next message: Keyur Shroff: "Re: Indic Devanagari Query"

Previous message: Aditya Gokhale: "Re: Indic Devanagari Query"
Maybe in reply to: Aditya Gokhale: "Indic Devanagari Query"
Next in thread: Keyur Shroff: "Suggestions in Unicode Indic FAQ"
Reply: Keyur Shroff: "Suggestions in Unicode Indic FAQ"

Aditya Gokhale wrote:
> Hello Everybody,
> I had few query regarding representation of Devanagari
> script in Unicode

All your questions are FAQs, so I'll just reference the entries that
answer them.

> (Code page - 0x0900 - 0x097F). Devanagari is a writing
> script, is used in Hindi, Marathi and Sanskrit languages. I
> have following questions -

Unicode has no code pages:
http://www.unicode.org/faq/basic_q.html#18

> 1. In Marathi and Sanskrit language two characters glyphs of
> 'la' and 'sha' are represented differently as shown in the
> image below -
> (First glyph is 'la' and second one is 'sha')
> as compared to Hindi where these character glyphs are
> represented as shown in the image below -
> (First glyph is 'la' and second one is 'sha')

Unicode encodes (abstract) characters, not glyphs:
http://www.unicode.org/faq/han_cjk.html#3

(This FAQ is in the Chinese/Japanese/Korean section because it is more often
raised for Chinese ideograms.)
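
A minimal sketch in Python (not taken from the FAQ) of what this means for
'la' and 'sha': each letter has a single code point, whichever language the
text is in and whichever glyph shape the font draws. The helper name
`describe` is made up for this illustration.

```python
import unicodedata


def describe(text):
    """Return (code point, character name) pairs for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


# 'la' and 'sha' are one abstract character each. The Hindi-style and
# Marathi-style shapes are glyph variants chosen by the font, not by the
# encoded text.
for code_point, name in describe("\u0932\u0936"):
    print(code_point, name)
```

The encoded text is the same for Hindi, Marathi and Sanskrit; only the font
(or a language-sensitive feature inside the font) decides which shape appears.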

> In the same script code page, how do I use these two
> different Glyphs, to represent the same character ? Is there
> any way by which I can do it in an Open type font and Free
> type font implementation ?

Unicode's requirements for fonts:
http://www.unicode.org/faq/font_keyboard.html#1

A few links to OpenType stuff:
http://www.unicode.org/faq/font_keyboard.html#4

> 2. Implementation Query -
> In an implementation where I need to send / process
> Hindi, Marathi and Sanskrit data, how do I differentiate
> between languages (Hindi, Marathi and Sanskrit). Say for
> example, I am writing a translation engine, and I want to
> translate a document having Hindi, Marathi and Sanskrit Text
> in it, how do I know from the code points between 0x0900 and
> 0x097F, that the data under perusal is Hindi / Marathi / Sanskrit ?

What you need here is some sort of language tagging:
http://www.unicode.org/faq/languagetagging.html
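
A rough sketch in Python of the idea, under the assumption that the language
is carried next to the text rather than inside it. The `TaggedText` class and
both helper functions are hypothetical names for this illustration; "hi" and
"mr" are the ISO 639 codes for Hindi and Marathi.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedText:
    lang: str  # language tag carried alongside the text, e.g. "hi", "mr", "sa"
    text: str  # the Devanagari text itself


def in_devanagari_block(ch):
    """True if ch lies in the Devanagari block U+0900..U+097F."""
    return 0x0900 <= ord(ch) <= 0x097F


def group_by_language(runs):
    """Group text runs by their language tag, not by their code points."""
    grouped = {}
    for run in runs:
        grouped.setdefault(run.lang, []).append(run.text)
    return grouped


# The same word is used in both Hindi and Marathi. Its code points are
# identical in the two runs, so only the tag can tell them apart.
word = "नमस्कार"
runs = [TaggedText("hi", word), TaggedText("mr", word)]

print(all(in_devanagari_block(ch) for ch in word))
print(group_by_language(runs))
```

A translation engine would then dispatch on `run.lang`; inspecting the code
points only tells it that the script is Devanagari.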

> I would suggest that we should give different code pages
> for Marathi, Hindi and Sanskrit. Maybe the current code page of
> Devanagari can be treated as Hindi and two new code pages for
> Marathi and Sanskrit be added. This could solve these issues.
> If there is any better way of solving this, any one suggest.

Characters are encoded "per script", not "per language":
http://www.unicode.org/faq/basic_q.html#17

> 3. Character codes for jna, shra, ksh -
>
> In Sanskrit and Marathi jna, shra and ksh are considered as
> separate characters and not ligatures. How do we take care of
> this ? Can I get over all views on the matter from the group
> ? In my opinion they should be given different code points in
> the specific language code page.
> Please find below the character glyphs -

Unicode encodes Indic analytically:
http://www.unicode.org/faq/indic.html#17
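
As a small illustration in Python (a sketch, not part of the FAQ), the three
conjuncts are each written as consonant + virama + consonant. The dictionary
keys follow the transliterations used in the question, and `code_points` is a
helper name invented here.

```python
VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA (halant)

# Each conjunct is a sequence of existing characters, not a new code point.
CONJUNCTS = {
    "ksh": "\u0915" + VIRAMA + "\u0937",   # KA + VIRAMA + SSA
    "jna": "\u091C" + VIRAMA + "\u091E",   # JA + VIRAMA + NYA
    "shra": "\u0936" + VIRAMA + "\u0930",  # SHA + VIRAMA + RA
}


def code_points(text):
    """List the code points of text in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


for name, sequence in CONJUNCTS.items():
    print(name, sequence, " ".join(code_points(sequence)))
```

Whether a sequence is drawn as a single conjunct shape is up to the font and
rendering engine; software that wants to treat jna, shra or ksh as one unit
(for sorting or cursor movement, say) can match these sequences.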

> thanks,

For more details about Devanagari in Unicode, see Chapter 9 of the Standard:
http://www.unicode.org/uni2book/ch09.pdf

_ Marco

