I'm wondering if someone can help me.
I am trying to convert some data coming from an IBM iSeries system, encoded in EBCDIC (Arabic CCSID 420), to UTF-16. The original data consists of a date, some Arabic text and a currency amount, all stored (as is usual for Arabic EBCDIC data on iSeries) in visual sequence, e.g.:
01/10/12 TXET CIBARA 3,902.07
I am successfully converting this to UTF-16 using ICU and then applying bidi processing, which produces UTF-16 data with the Arabic text stored in logical sequence:
01/10/12 ARABIC TEXT 3,902.07
However, when this is displayed in Notepad, what the user sees is:
01/10/12 3,902.07 TXET CIBARA
In other words, the Arabic is displayed correctly (RTL) but the currency amount has moved to the left of it. This is not acceptable to my users, who require the order of the items (date, text, amount) on the page to remain unchanged.
I am guessing this is happening because of this logic, described by the Unicode bidi spec
http://unicode.org/reports/tr9/#BD1:
Quote:
Examples. A list of numbers separated by neutrals and embedded in a directional run will come out in the run’s order.
Storage: he said "THE VALUES ARE 123, 456, 789, OK".
Display: he said "KO ,789 ,456 ,123 ERA SEULAV EHT".
In this case, both the comma and the space between the numbers take on the direction of the surrounding text (uppercase = right-to-left), ignoring the numbers. The commas are not considered part of the number because they are not surrounded on both sides by digits (see Section 3.3.3, Resolving Weak Types).
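To see which characters the algorithm treats as strong, weak or neutral, here is a minimal sketch in Python (standard library only) that lists the bidi class of each field. The sample line and the Arabic word in it are placeholders I made up for illustration, not my real data, and `classes_per_field` is just a hypothetical helper name.

```python
import unicodedata

# Placeholder line in logical order; the Arabic word is made up for illustration.
SAMPLE = "01/10/12 \u0646\u0635 3,902.07"

def bidi_classes(text):
    """Return (character, bidi class) pairs for every character in text."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]

def classes_per_field(text):
    """Map each space-separated field to the sorted set of bidi classes it contains."""
    return {
        field: sorted({unicodedata.bidirectional(ch) for ch in field})
        for field in text.split(" ")
    }

if __name__ == "__main__":
    # ascii() keeps the output readable on consoles without Arabic fonts.
    for field, classes in classes_per_field(SAMPLE).items():
        print(ascii(field), classes)
```

The amount contains only weak classes (EN for the digits, CS for the comma and full stop), the Arabic letters are AL, and the separating spaces are WS. If I read the spec correctly, that leaves the amount with no strong left-to-right character of its own, so its placement depends on the Arabic text before it.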
My question is: can I override this logic somehow, e.g. through the use of format codes, to ensure that the data items (date, Arabic text, currency amount) appear in the correct sequence, but the Arabic is still displayed correctly? I have so far failed to find any set of format codes that makes this happen.
Thanks in advance for any assistance you can offer.