RE: Word reversal from Abobe to Word from Murray Sargent on 2013-02-07 (Unicode Mail List Archive)

From: Murray Sargent <murrays_at_exchange.microsoft.com>
Date: Fri, 8 Feb 2013 05:35:13 +0000

In this simple RTF, Word takes the \fN pretty seriously. You need to specify a charset with the desired directionality. Word has more sophisticated RTF to handle directionality, but without it, you need to define the \fN correctly. The idea is that you can overrule the directionality by claiming the script has the reverse directionality. This enables Word to write RTF that represents an LRO...PDF embedding.
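
As a rough illustration of the point about \fN, here is a small Python sketch that assembles the kind of minimal RTF quoted further down in this thread, with a \fonttbl entry that gives \f0 a right-to-left charset. The function name rtf_with_rtl_font and its parameters are made up for this note. It assumes a single-byte codepage (cp1255) for the \'XX fallback bytes and code points below 0x8000, and it is not a general RTF writer.

```python
def rtf_with_rtl_font(text, charset=177, codepage="cp1255", font="Arial"):
    """Build a minimal RTF document whose \\f0 is declared with the given charset.

    Hypothetical helper: assumes every character fits in one byte of
    `codepage` and has a code point below 0x8000 (so \\uN stays positive).
    """
    # Each character becomes \uN followed by its one-byte \'XX fallback.
    run = "".join(
        "\\u%d \\'%02X" % (ord(ch), ch.encode(codepage)[0]) for ch in text
    )
    return (
        "{\\rtf1{\\fonttbl{\\f0\\fswiss\\fcharset%d %s;}}\n" % (charset, font)
        + "\\pard\\plain\\ql\\f0\\fs20 {\\fs40 " + run + "} }"
    )


# The four letters from the thread, in logical order.
doc = rtf_with_rtl_font("\u05e7\u05d5\u05d3\u05de")
print(doc)
```

Passing a different charset value only changes the \fcharsetN declaration. Whether Word then lays the run out left-to-right or right-to-left is Word's decision, as described above.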

Murray

-----Original Message-----
From: Asmus Freytag [mailto:asmusf_at_ix.netcom.com]
Sent: Thursday, February 7, 2013 9:28 PM
To: Murray Sargent
Cc: Dreiheller, Albrecht; Raymond Mercier; unicode_at_unicode.org
Subject: Re: Word reversal from Abobe to Word

How come I'm not surprised to see the problem traced to an RTF format incompatibility? Trying to figure out which parts of the RTF spec to support when is nearly impossible...

A./

On 2/7/2013 8:08 AM, Murray Sargent wrote:
> If you include a {\fonttbl...} entry that defines \f0 as a Hebrew
> font (\fcharset177), Word displays it correctly. For example, include
> {\fonttbl{\f0\fswiss\fcharset177 Arial;}}
>
> as in
>
> {\rtf1{\fonttbl{\f0\fswiss\fcharset177 Arial;}}
> \pard\plain\ql\f0\fs20 {\fs40 \u1511 \'F7\u1493 \'E5\u1491 \'E3\u1502
> \'EE} }
>
> This displays as קודמ
>
> Murray
>
> -----Original Message-----
> From: unicode-bounce_at_unicode.org [mailto:unicode-bounce_at_unicode.org]
> On Behalf Of Dreiheller, Albrecht
> Sent: Thursday, February 7, 2013 7:33 AM
> To: Raymond Mercier; unicode_at_unicode.org
> Subject: RE: Word reversal from Abobe to Word
>
>
> Raymond,
>
>> If I have a Hebrew text displayed in Adobe Acrobat I can select part
>> of it and can paste it into Word. The trouble is that while
>> individual characters are correctly displayed the order is reversed.
>> Thus if I have
>> in Acrobat
>> קודמ (meaning 'prior')
>> when pasted into Word I get
>> םדוק
> The Windows clipboard is a "multi-channel" medium, i.e. several different data formats may be supplied at the same time by the sending application.
> The receiving application may choose one of these formats.
>
> Using a clipboard debugging tool, I see that Word fills up to 18
> formats, like
>   000D Unicode Text (10 Bytes)
>   C090 Rich Text Format (5815 Bytes)
>   C10E HTML Format (3641 Bytes),
> whereas Adobe fills only 6 formats, e.g.
>   000D Unicode Text (11 Bytes)
>   C090 Rich Text Format (178 Bytes)
>
> In both cases, the Unicode Text format contains the sequence
> U+05E7, U+05D5, U+05D3, U+05DE in logical order.
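
For reference, a short Python sketch of that logical-order sequence. It assumes the clipboard's Unicode Text is UTF-16LE with a two-byte NUL terminator, which would account for the 10 bytes reported for Word. The 11 bytes reported for Adobe are not explained by this.

```python
# The four code points listed above, in logical (typing) order.
word = "\u05e7\u05d5\u05d3\u05de"
code_points = ["U+%04X" % ord(ch) for ch in word]

# Assumption: UTF-16LE text followed by a two-byte NUL terminator.
payload = word.encode("utf-16-le") + b"\x00\x00"

print(code_points)
print(len(payload))  # 4 characters * 2 bytes + 2-byte terminator = 10
```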
>
> When "paste" is used in Word, a high level format is preferred by default, so I suppose the RTF format is the problem here.
>
> Word creates an RTF sequence like
> {\ltrch\fcs1 \af220\afs40\alang1033 \rtlch\fcs0 \f220\fs40\lang1037
> \langnp1033\langfenp2052\insrsid13502069\charrsid6162033
> \'f7\'e5\'e3\'ee}}
>
> N.B. \'f7\'e5\'e3\'ee is the CP1255 byte sequence for the Hebrew word above.
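
The CP1255 claim is easy to check with Python's built-in cp1255 codec. This is only a verification sketch, and the variable names are arbitrary.

```python
# The four bytes from the \'xx escapes in Word's RTF.
raw = bytes.fromhex("f7e5e3ee")

# Decode them as Windows-1255 (Hebrew).
word = raw.decode("cp1255")

print(["U+%04X" % ord(ch) for ch in word])
```

The decoded string is the same U+05E7, U+05D5, U+05D3, U+05DE sequence found in the Unicode Text format.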
>
> Adobe produces this RTF sequence:
> \pard\plain\ql\f0\fs20 {\fs40 \u1511 \'F7\u1493 \'E5\u1491 \'E3\u1502 \'EE}
> which is the right character sequence, but seems to be misunderstood by Word.
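
To see what that run encodes, here is a hedged Python sketch that pulls the \uN values and their \'XX fallback bytes out of the Adobe fragment. It assumes the RTF default of one fallback byte per \uN (\uc1) and handles nothing beyond this pattern. The names PAIR and decode_pairs are invented for the example.

```python
import re

# The character run from Adobe's RTF, copied from the message above.
ADOBE_RUN = r"{\fs40 \u1511 \'F7\u1493 \'E5\u1491 \'E3\u1502 \'EE}"

# \uN, an optional delimiting space, then one \'XX fallback byte.
PAIR = re.compile(r"\\u(-?\d+) ?\\'([0-9A-Fa-f]{2})")


def decode_pairs(rtf):
    """Return (text from \\uN values, raw fallback bytes) for a simple run."""
    chars, fallback = [], bytearray()
    for num, hexbyte in PAIR.findall(rtf):
        n = int(num)
        # \uN is a signed 16-bit value; negative numbers wrap around.
        chars.append(chr(n + 0x10000 if n < 0 else n))
        fallback.append(int(hexbyte, 16))
    return "".join(chars), bytes(fallback)


text, fallback = decode_pairs(ADOBE_RUN)
print(["U+%04X" % ord(ch) for ch in text])
print(fallback.hex())
```

Both the \uN values and the fallback bytes (read as CP1255) give the same four letters in logical order, so the character data itself is consistent.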
>
> A workaround is to use the Word command "Paste Special..." (it might be necessary to add it via "Customize"), and then choose "Unformatted Unicode Text" from the format list.
>
> Albrecht.
Received on Thu Feb 07 2013 - 23:39:25 CST
