From: Aditya Gokhale (
Date: Wed Jan 29 2003 - 05:18:28 EST

        Thanks for the reply. I will check the points you mentioned, as far as
    the font issues are concerned. We all know how jna, shra and ksh are formed
    in Unicode and ISCII, but the point I wanted to make was that if we have to
    sort / search / process data in Devanagari script, then we have to keep
    track of at least three characters for each of them, not one. This becomes
    tedious, though not impossible. If a single code point were present, it
    would be very easy to process.
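        To illustrate the bookkeeping I mean, here is a minimal Python sketch.
    The names CONJUNCTS and split_units are made up for this example and the
    table covers only the three conjuncts discussed here; it is not a complete
    Devanagari segmenter, only a picture of the extra tracking involved.

```python
# Each conjunct is stored as three code points: consonant + virama + consonant.
VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA

# Hypothetical lookup table, limited to the three conjuncts under discussion.
CONJUNCTS = {
    "jna": "\u091C" + VIRAMA + "\u091E",   # JA + VIRAMA + NYA
    "ksha": "\u0915" + VIRAMA + "\u0937",  # KA + VIRAMA + SSA
    "shra": "\u0936" + VIRAMA + "\u0930",  # SHA + VIRAMA + RA
}


def split_units(text):
    """Split text into units, treating each known conjunct as one unit."""
    units = []
    i = 0
    while i < len(text):
        for seq in CONJUNCTS.values():
            if text.startswith(seq, i):
                # Three code points are consumed as a single logical unit.
                units.append(seq)
                i += len(seq)
                break
        else:
            # No conjunct starts here, so take one code point.
            units.append(text[i])
            i += 1
    return units


# The word "jnana": JA, VIRAMA, NYA, VOWEL SIGN AA, NA.
word = "\u091C\u094D\u091E\u093E\u0928"
print(len(word))               # number of code points
print(len(split_units(word)))  # number of units once jna is grouped
```

    The search or sort routine has to do this kind of grouping itself before
    it can treat jna as one letter, which is the tedious part.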
        With regard to "predict language by using some heuristic", in my
    opinion it is a very risky solution, at least when I don't have much
    information at stage one of my application. I am running an OCR engine on
    a Devanagari page and then, based on the formatting, tagging the language.
    So I think tagging, as I am doing right now, is the better solution. I also
    agree with the views expressed by Asmus Freytag that if we go on including
    all the 6000 languages, it will be practically impossible to
    cross-correlate these 'code pages'.


