RE: Arabic and Hindi digits, what to store ?

From: Addison Phillips [wM] (aphillips@webmethods.com)
Date: Sun Jan 27 2002 - 00:09:03 EST


Dear Isam,

Generally, when storing numeric data, the answer is "neither". Use a numeric type (like int, long, float, etc.) and only convert the numbers at display time.
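
As a rough illustration only (Python is an arbitrary choice here, and the names to_arabic_indic and ASCII_TO_ARABIC_INDIC are made up for this sketch), storing the value as a number and choosing the digit shapes at display time might look like this:

```python
# Map ASCII digits 0-9 to ARABIC-INDIC DIGIT ZERO..NINE (U+0660..U+0669).
ASCII_TO_ARABIC_INDIC = {ord("0") + i: 0x0660 + i for i in range(10)}

def to_arabic_indic(value: int) -> str:
    """Render a stored integer with Arabic-Indic digit shapes for display."""
    # Only the digits are remapped; a leading minus sign is left untouched.
    return str(value).translate(ASCII_TO_ARABIC_INDIC)

stored = 2002                       # what goes into storage: a plain number
shown = to_arabic_indic(stored)     # what the user sees, decided at display time
```

The stored value never changes; only the rendering does. A real application would more likely rely on its platform's locale-aware number formatter than on a hand-written table like this one.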

If you mean "should I change numeric characters in textual data (strings)", then the answer depends on your application. In most cases, it is a bad idea to change a user's textual data because you typically cannot recover the initial state of the data later (when you might need it). Users may be surprised to see their data mutating.

Instead, you can use the Unicode character database, digit folding, and normalization to analyze the data at runtime (for example, to retrieve the numeric value of a string) while leaving the stored text untouched.
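
A minimal sketch of that kind of runtime analysis, assuming Python and its standard unicodedata module (the function name numeric_value is hypothetical):

```python
import unicodedata

def numeric_value(text: str) -> int:
    """Compute the integer value of a string of decimal digits in any script."""
    if not text:
        raise ValueError("empty string has no numeric value")
    value = 0
    for ch in text:
        # unicodedata.decimal() looks up the decimal digit value from the
        # Unicode character database; it raises ValueError for non-digits.
        value = value * 10 + unicodedata.decimal(ch)
    return value

original = "\u0662\u0660\u0660\u0662"   # Arabic-Indic digits for 2002
number = numeric_value(original)        # the user's string is not modified
```

The user's text stays exactly as entered; the number is derived from it only when needed.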

Of course, in some applications you may need/prefer to pre-process the data instead of preserving the original string. Or you may need to create relationships (as in a database) that require you to process the data in this way (so that matches match). A combination of digit-folding (to ASCII) and Unicode Form C normalization works pretty well. *Careful* processing using Form KC can also be useful sometimes (see link below). Again: if you're processing values that are strictly numeric, make them into typed objects!
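
One way the folding-plus-Form-C combination could be sketched, again assuming Python's unicodedata module (fold_digits and match_key are names invented for this example, not an established API):

```python
import unicodedata

def fold_digits(text: str) -> str:
    """Replace every Unicode decimal digit with its ASCII equivalent."""
    out = []
    for ch in text:
        # decimal() returns the default (None) for anything that is not
        # a decimal digit, so other characters pass through unchanged.
        digit = unicodedata.decimal(ch, None)
        out.append(str(digit) if digit is not None else ch)
    return "".join(out)

def match_key(text: str) -> str:
    """Build a comparison key: digits folded to ASCII, then Form C applied."""
    return unicodedata.normalize("NFC", fold_digits(text))

# Two spellings of the same value produce the same key.
same = match_key("\u0664\u0662") == match_key("42")
```

Note that a character such as SUPERSCRIPT TWO (U+00B2) is left alone by this sketch, whereas Form KC would turn it into a plain "2", which is one reason Form KC needs care.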

The other common use of numbers is in dates: parsing the date into a date data type (much like I just recommended for numbers) makes a lot of sense, especially in locales (such as many of the Arabic locales) in which you may wish to display the same date value using more than one calendar, or a variant calendar.
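
For dates, a sketch along the same lines might be the following. It assumes Python, a year-month-day input separated by hyphens, and the hypothetical helper names parse_ymd and ASCII_TO_ARABIC_INDIC; it covers only the Gregorian calendar, since other calendars would need a library beyond the standard one.

```python
from datetime import date

# Map ASCII digits 0-9 to ARABIC-INDIC DIGIT ZERO..NINE (U+0660..U+0669).
ASCII_TO_ARABIC_INDIC = {ord("0") + i: 0x0660 + i for i in range(10)}

def parse_ymd(text: str) -> date:
    """Parse 'year-month-day' written with decimal digits of any script."""
    year, month, day = text.split("-")
    # Python's int() accepts any Unicode decimal digits, not just ASCII.
    return date(int(year), int(month), int(day))

# Input typed with Arabic-Indic digits becomes a typed date value ...
stored = parse_ymd("\u0662\u0660\u0660\u0662-\u0660\u0661-\u0662\u0667")

# ... which can then be displayed with whichever digit shapes are wanted.
ascii_form = stored.isoformat()
arabic_indic_form = stored.isoformat().translate(ASCII_TO_ARABIC_INDIC)
```

Once the value is a date object, the choice of digits (and, with suitable library support, of calendar) is purely a display decision.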

Some useful links, especially the last:

http://www.unicode.org/unicode/reports/tr15/
http://www.w3.org/TR/WD-charreq
http://www.w3.org/TR/charmod/#sec-Normalization
http://www.w3.org/TR/1999/WD-unicode-xml-19990928/#Compatibility

I hope that helps.

Best Regards,

Addison

Addison P. Phillips
Globalization Architect / Manager, Globalization Engineering
webMethods, Inc. | The Business Integration Company
432 Lakeside Drive, Sunnyvale, California, USA
+1 408.962.5487 (phone) +1 408.210.3569 (mobile)
-------------------------------------------------
Internationalization is an architecture. It is not a feature.

-----Original Message-----
From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org] On Behalf Of Isam Bayazidi
Sent: Saturday, January 26, 2002 5:50 PM
To: unicode@unicode.org
Subject: Arabic and Hindi digits, what to store ?

Hi all ..
        I have a quick question: we are developing several Arabic-enabled software
products, and adding Arabic support to already existing ones, and one of the issues
that we faced is: should we store the numbers in their Hindi format or in ASCII?
We know that showing them in whatever look is a matter of preference, but
what we are asking is what would be the better action: to store the digits
displayed in Hindi in their Hindi encodings, or to use the Arabic digits defined
in ASCII (the first 128 places of ISO)?

-- 
Yours,
Isam Bayazidi
Amman - Jordan
====================================================
 Think Linux + Think Arabic = Think www.arabeyes.org
====================================================


