Re: Indic Devanagari Query

From: John Cowan (jcowan@reutershealth.com)
Date: Wed Jan 29 2003 - 07:13:56 EST

    Keyur Shroff scripsit:

    > Sentiments are attached to cultures, which may vary from one geographical
    > area to another. So when one of the many languages sharing a script
    > dominates the entire encoding for that script, other groups of
    > people may feel that their language has not been represented properly in
    > the encoding.

    Indeed, they may have such beliefs, but those beliefs are based on two
    incorrect notions: that what the charts show is normative, and that the
    codepoint is the proper unit of processing.
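
    To illustrate the second point, here is a minimal Python sketch. The
    helper names are hypothetical, and skipping combining marks is only a
    rough stand-in for real text segmentation (it is not the Unicode
    grapheme cluster rules and does not handle conjuncts). It shows a
    Devanagari syllable that is two codepoints but one unit to a reader.

```python
import unicodedata


def count_code_points(text: str) -> int:
    """Number of Unicode code points in the string."""
    return len(text)


def count_non_mark_code_points(text: str) -> int:
    """Rough approximation of user-perceived units: skip combining marks.

    Assumption: any code point whose general category starts with "M"
    (Mn, Mc, Me) attaches to the preceding base character.
    """
    return sum(
        1 for ch in text if not unicodedata.category(ch).startswith("M")
    )


# DEVANAGARI LETTER KA followed by DEVANAGARI VOWEL SIGN I
ki = "\u0915\u093F"
print(count_code_points(ki), count_non_mark_code_points(ki))
```

    The two counts differ even for this simple syllable, so neither the
    chart order nor the raw codepoint count says much about how a language
    treats its text.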

    > In Unicode many characters have been given codepoints regardless of the
    > fact that the same character could have been rendered through some compose
    > mechanism.

    In every case this was done for backward compatibility with existing
    encodings. No new codepoints of this type will be added in future.

    > That is why the text should be normalized to either pre-composed or
    > de-composed character sequence before going for further processing in
    > operations like searching and sorting.

    The Unicode Collation Algorithm already makes allowance for these points:
    canonically equivalent sequences are meant to sort alike. It will be quite
    typical to tailor the algorithm to take language-specific rules into account.
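
    For the searching half of the question, a small Python sketch of
    comparing text under canonical equivalence may help. It uses the
    standard unicodedata module; the function names are hypothetical, and
    this is plain normalization, not the collation algorithm or any
    language tailoring.

```python
import unicodedata


def canonically_equal(a: str, b: str) -> bool:
    """True if the two strings are canonically equivalent.

    Both sides are brought to the same normalization form (NFD here)
    before comparison, so precomposed and decomposed spellings match.
    """
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


def contains_normalized(haystack: str, needle: str) -> bool:
    """Naive substring search over NFD-normalized text.

    Caveat: a bare base letter will also match inside a decomposed
    letter-plus-nukta sequence; a real matcher would check boundaries.
    """
    h = unicodedata.normalize("NFD", haystack)
    n = unicodedata.normalize("NFD", needle)
    return n in h


# U+0929 (NNNA) versus U+0928 (NA) + U+093C (NUKTA)
print(canonically_equal("\u0929", "\u0928\u093C"))
# U+0958 (QA) versus U+0915 (KA) + U+093C (NUKTA)
print(canonically_equal("\u0958", "\u0915\u093C"))
```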

    > Also, processing of text often depends on the smallest addressable
    > unit of the language. Again, as discussed in earlier e-mails, this may vary
    > from one language to another in the same script. Consider a case where a
    > language processor/application wants to count the number of characters in
    > some text in order to find the number of keystrokes required to input the text.

    This will not work without knowledge of the keyboard layout in any case.
    Entering a non-ASCII Latin-1 character on the Windows U.S. keyboard takes
    5 keystrokes (Alt plus a four-digit code), but the result is represented
    by one or two Unicode characters.
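
    A minimal Python sketch of that mismatch follows. The layout table and
    function name are hypothetical: the table has a single illustrative
    entry, and every other character is assumed to cost one keystroke.

```python
import unicodedata

# Hypothetical layout table: keystrokes needed per character.
# Assumption: U+00E9 is typed as Alt + 0 2 3 3 on a U.S. Windows keyboard.
HYPOTHETICAL_LAYOUT = {"\u00E9": 5}


def keystrokes(text: str, layout: dict, default: int = 1) -> int:
    """Estimate keystrokes by looking up each NFC character in the layout.

    Characters missing from the table are assumed to cost `default`.
    """
    composed = unicodedata.normalize("NFC", text)
    return sum(layout.get(ch, default) for ch in composed)


e_acute = "\u00E9"
print(len(unicodedata.normalize("NFC", e_acute)))  # code points, composed
print(len(unicodedata.normalize("NFD", e_acute)))  # code points, decomposed
print(keystrokes(e_acute, HYPOTHETICAL_LAYOUT))    # keystrokes per the table
```

    The answer comes from the layout table, not from the length of the
    string in either normalization form.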

    -- 
    Henry S. Thompson said, / "Syntactic, structural,               John Cowan
    Value constraints we / Express on the fly."     jcowan@reutershealth.com
    Simon St. Laurent: "Your / Incomprehensible     http://www.reutershealth.com
    Abracadabralike / schemas must die!"            http://www.ccil.org/~cowan
    

