Unicode Technical Note #18

TAM to Unicode Conversion

Version	1
Authors	P.Chellappan (chellappan@vsnl.com)
Date	23 September 2004
This Version	http://www.unicode.org/notes/tn18/tn18-1.html
Previous Version	none
Latest Version	http://www.unicode.org/notes/tn18/

Summary

TAM is the official monolingual Tamil encoding scheme of the Government of Tamil Nadu, which has the largest Tamil-speaking population in the world. A vast amount of Tamil text in digital libraries, online newspapers, magazines and other sources is available today in this encoding scheme. As Unicode is fast becoming the encoding of choice, there is a need to convert TAM-encoded text to Unicode.

TAM is a glyph encoding scheme, while Unicode is a character encoding scheme. Hence the relationship between a Tamil letter as encoded in TAM and the same letter as encoded in Unicode may be one-to-one, one-to-many, many-to-one or many-to-many.
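The Unicode side of this relationship can be illustrated with a short Python sketch. It uses only the standard `unicodedata` module. The sample letters and the helper name `code_points` are chosen here for illustration and do not come from the note.

```python
import unicodedata

# Sample Tamil letters as Unicode strings (chosen for illustration only).
samples = {
    "ka": "\u0b95",        # bare consonant: one code point
    "ke": "\u0b95\u0bc6",  # consonant + vowel sign E: two code points
    "ko": "\u0b95\u0bca",  # consonant + vowel sign O: two code points
}

def code_points(text):
    """Return a 'U+XXXX NAME' description for each code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]

for label, text in samples.items():
    print(label, code_points(text))

# Vowel sign O canonically decomposes into vowel sign E followed by
# vowel sign AA, i.e. it has two visual parts.
print(code_points(unicodedata.normalize("NFD", "\u0bca")))
```

Unicode stores the vowel sign after the consonant in logical order, even though vowel sign E is drawn to the left of the consonant. A glyph-ordered encoding would typically store the pieces in visual order instead, which is one way the one-to-many and many-to-many cases can arise.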

This note is split into two parts. The first part describes, in simple C-like pseudocode, how to determine the sequence of TAM codes that makes up a Tamil letter. The second part provides a cross-mapping table to convert this sequence into the corresponding Unicode string.
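The two-part approach (segment the input into letter-sized glyph sequences, then look each sequence up in a table) can be sketched in Python as follows. This is not the note's algorithm or table, which are in the PDF body. The glyph codes below are hypothetical values invented for the sketch and are not real TAM code values; the names `MAPPING` and `convert`, the assumption that input arrives as byte values, and the greedy longest-match strategy are likewise choices made here.

```python
# HYPOTHETICAL glyph codes, invented for this sketch. NOT real TAM values.
GLYPH_KA = 0x80       # consonant "ka"
GLYPH_SIGN_E = 0x81   # left-side vowel sign part (as in E and O)
GLYPH_SIGN_AA = 0x82  # right-side vowel sign AA

# Cross-mapping table: glyph sequence (visual order) -> Unicode string
# (logical order). The entries cover 1-to-1 and many-to-many cases.
MAPPING = {
    (GLYPH_KA,): "\u0b95",
    (GLYPH_KA, GLYPH_SIGN_AA): "\u0b95\u0bbe",
    (GLYPH_SIGN_E, GLYPH_KA): "\u0b95\u0bc6",
    (GLYPH_SIGN_E, GLYPH_KA, GLYPH_SIGN_AA): "\u0b95\u0bca",
}
MAX_LEN = max(len(key) for key in MAPPING)

def convert(data: bytes) -> str:
    """Convert glyph-coded bytes to Unicode using greedy longest match."""
    out = []
    i = 0
    while i < len(data):
        # Try the longest candidate sequence first, then shorter ones.
        for n in range(min(MAX_LEN, len(data) - i), 0, -1):
            key = tuple(data[i:i + n])
            if key in MAPPING:
                out.append(MAPPING[key])
                i += n
                break
        else:
            # No table entry: pass ASCII through, flag anything else.
            b = data[i]
            out.append(chr(b) if b < 0x80 else "\ufffd")
            i += 1
    return "".join(out)
```

For example, `convert(bytes([GLYPH_SIGN_E, GLYPH_KA, GLYPH_SIGN_AA]))` yields the two code points U+0B95 U+0BCA: three glyph codes in visual order become two characters in logical order. A real converter would replace the placeholder table with the note's cross-mapping table.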

Status

This document is a Unicode Technical Note. It is supplied purely for informational purposes and publication does not imply any endorsement by the Unicode Consortium. For general information on Unicode Technical Notes, see http://www.unicode.org/notes/.

The body of this note is contained in the file "tam_to_unicode.pdf".

© 2004 P.Chellappan. This publication is protected by copyright, and permission must be obtained from the author and Unicode, Inc. prior to any reproduction, modification, or other use not permitted by the Terms of Use.

Use of this publication is governed by the Unicode Terms of Use. The authors, contributors, and publishers have taken care in the preparation of this publication, but make no express or implied representation or warranty of any kind and assume no responsibility or liability for errors or omissions or for consequential or incidental damages that may arise therefrom. This publication is provided “AS-IS” without charge as a convenience to users.

Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the United States and other countries.
