From: John Cowan (firstname.lastname@example.org)
Date: Wed Jan 29 2003 - 07:13:56 EST
Keyur Shroff scripsit:
> Sentiments are attached to cultures, which may vary from one geographical
> area to another. So when one of the many languages falling under the same
> script dominates the entire encoding for the script, other groups of
> people may feel that their language has not been represented properly in
> the encoding.
Indeed, they may have such beliefs, but those beliefs are based on two
incorrect notions: that what the charts show is normative, and that the
codepoint is the proper unit of processing.
> In Unicode many characters have been given codepoints regardless of the
> fact that the same character could have been rendered through some compose
In every case this was done for backward compatibility with existing
encodings. No new codepoints of this type will be added in future.
> That is why the text should be normalized to either pre-composed or
> de-composed character sequence before going for further processing in
> operations like searching and sorting.
The Unicode Collation Algorithm already allows for this: canonically
equivalent sequences, precomposed or decomposed, collate identically.
It will be quite typical to tailor the algorithm to take language-specific
rules into account.
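
As a rough illustration of both points, here is a minimal Python sketch. It
uses only the standard unicodedata module; it is not an implementation of the
collation algorithm. The two key functions (german_key, swedish_key) are
hypothetical toy tailorings invented for this example, assuming that German
dictionary order files "ä" with "a" while Swedish treats "å", "ä", "ö" as
separate letters after "z".

```python
import unicodedata

precomposed = "\u00e9"    # LATIN SMALL LETTER E WITH ACUTE, one codepoint
decomposed = "e\u0301"    # "e" followed by COMBINING ACUTE ACCENT


def nfd(text: str) -> str:
    """Return the canonical decomposition (NFD) of text."""
    return unicodedata.normalize("NFD", text)


# Codepoint-by-codepoint comparison sees two different strings...
raw_equal = precomposed == decomposed
# ...but after normalization they are the same sequence.
normalized_equal = nfd(precomposed) == nfd(decomposed)


def german_key(word: str):
    """Toy tailoring (assumption): ignore combining marks at the first level."""
    base = "".join(c for c in nfd(word) if not unicodedata.combining(c))
    return (base, word)   # the original word only breaks ties


SWEDISH_ORDER = "abcdefghijklmnopqrstuvwxyz\u00e5\u00e4\u00f6"


def swedish_key(word: str):
    """Toy tailoring (assumption): a-ring, a-umlaut, o-umlaut sort after z."""
    composed = unicodedata.normalize("NFC", word.lower())
    return [SWEDISH_ORDER.index(c) if c in SWEDISH_ORDER else len(SWEDISH_ORDER)
            for c in composed]


words = ["zebra", "\u00e4pple", "apple"]
german_sorted = sorted(words, key=german_key)
swedish_sorted = sorted(words, key=swedish_key)
```

The same three words come out in a different order under each key, which is
the kind of language-specific difference a real tailoring would express.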
> Also, many times processing of text depends on the smallest addressable
> unit of that language. Again as discussed in earlier e-mails this may vary
> from one language to another in the same script. Consider a case when a
> language processor/application wants to count the number of characters in
> some text in order to find the number of keystrokes required to input the text.
This will not work without knowledge of the keyboard layout in any case.
Entering a Latin-1 character on the Windows U.S. keyboard takes five keystrokes
(Alt plus a four-digit code), yet the result is represented by one or two
Unicode characters.
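
A small Python sketch of why counting codepoints does not give a keystroke
count. The two layout tables are hypothetical and deliberately simplified
(they ignore modifier keys such as Shift); the cost of 5 for "é" follows the
Alt-code case above, and the dedicated-key layout is an assumption made for
contrast.

```python
import unicodedata

text = "caf\u00e9"   # "café" with a precomposed e-acute

# The codepoint count depends on the normalization form, not on the typist.
nfc_length = len(unicodedata.normalize("NFC", text))
nfd_length = len(unicodedata.normalize("NFD", text))

# Hypothetical per-character keystroke costs; anything unlisted costs 1.
ALT_CODE_LAYOUT = {"\u00e9": 5}        # Alt + four digits
DEDICATED_KEY_LAYOUT = {"\u00e9": 1}   # assumed layout with its own e-acute key


def keystrokes(text: str, layout: dict) -> int:
    """Sum the assumed keystroke cost of each character in NFC form."""
    composed = unicodedata.normalize("NFC", text)
    return sum(layout.get(ch, 1) for ch in composed)


alt_code_count = keystrokes(text, ALT_CODE_LAYOUT)
dedicated_count = keystrokes(text, DEDICATED_KEY_LAYOUT)
```

The same text has two codepoint lengths and two keystroke counts, and none of
the four numbers can be derived from the others without the layout table.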
-- 
John Cowan
email@example.com
http://www.reutershealth.com
http://www.ccil.org/~cowan
Henry S. Thompson said, / "Syntactic, structural,
Value constraints we / Express on the fly."
Simon St. Laurent: "Your / Incomprehensible
Abracadabralike / schemas must die!"