Re: hebrew font conversion

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Sun May 22 2005 - 13:18:48 CDT

    You almost certainly used a legacy font that mapped Hebrew letter glyphs
    onto symbols or ISO-8859-1 characters, all of which have strong or weak
    LTR directionality. For this reason, the BiDi algorithm never reordered
    the text of these old documents.

    When you replace those codepoints with real Hebrew codepoints, Word
    applies the BiDi algorithm when rendering them, so the visual order is
    now reversed. In other words, your text is effectively stored in "visual"
    order rather than the "logical" order that Unicode expects.
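    To see the difference in directionality, here is a small sketch in
    Python (not a Word macro) that prints the BiDi class of each character.
    The string "ABC!" is only a stand-in for whatever your legacy font
    actually used, which I don't know; bidi_classes is a hypothetical helper
    name.

```python
import unicodedata

def bidi_classes(text):
    """Return (character, BiDi class) pairs for each character in text."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]

# Stand-in for legacy text: Latin letters are strong LTR (class "L").
legacy = "ABC!"
# The same glyphs re-encoded as ALEF, BET, GIMEL: strong RTL (class "R").
hebrew = "\u05d0\u05d1\u05d2!"

print(bidi_classes(legacy))
print(bidi_classes(hebrew))
```

    The exclamation mark has a neutral class in both strings, so where it
    ends up depends on the strong characters around it.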

    So you will have to undo what the BiDi algorithm now does:
    - This means not only reversing the Hebrew letters,
    - but also handling characters with weak or neutral directionality (such as
    punctuation), which are now swapped as well,
    - and possibly mirrored.
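    The three points above can be sketched roughly as follows, in Python
    rather than as a Word macro, just to show the logic. This is a
    simplified sketch and not the full BiDi algorithm: it assumes each line
    is a right-to-left paragraph stored in visual order, it only knows the
    few ASCII bracket pairs listed in MIRROR (my own hand-picked subset), and
    it keeps digit runs and single Latin words readable but does not handle
    multi-word Latin phrases or signs like "%" attached to numbers. The
    names visual_to_logical, MIRROR, LTR_CLASSES and JOINERS are
    hypothetical.

```python
import unicodedata

# Hand-picked subset of mirrored pairs (assumption: ASCII brackets only).
MIRROR = {"(": ")", ")": "(", "[": "]", "]": "[",
          "{": "}", "}": "{", "<": ">", ">": "<"}
LTR_CLASSES = {"L", "EN", "AN"}   # Latin letters and digits
JOINERS = {"CS", "ES"}            # separators such as "." ":" "-" inside numbers

def visual_to_logical(line):
    """Convert one visually ordered RTL line to logical order (sketch)."""
    # Step 1: reverse the whole line and swap mirrored brackets.
    chars = [MIRROR.get(ch, ch) for ch in reversed(line)]
    out = []
    i, n = 0, len(chars)
    while i < n:
        if unicodedata.bidirectional(chars[i]) in LTR_CLASSES:
            # Step 2: find the end of this LTR run; a single separator is
            # kept inside the run when it sits between two LTR characters.
            j = i + 1
            while j < n:
                cls = unicodedata.bidirectional(chars[j])
                if cls in LTR_CLASSES:
                    j += 1
                elif (cls in JOINERS and j + 1 < n and
                      unicodedata.bidirectional(chars[j + 1]) in LTR_CLASSES):
                    j += 2
                else:
                    break
            # Step 3: put the run back in its original left-to-right order.
            out.extend(reversed(chars[i:j]))
            i = j
        else:
            out.append(chars[i])
            i += 1
    return "".join(out)

ALEF, BET, GIMEL = "\u05d0", "\u05d1", "\u05d2"
# Visual order as typed with the legacy font: "(" GIMEL BET ALEF ")"
print(visual_to_logical("(" + GIMEL + BET + ALEF + ")"))
```

    In a real document this would have to be applied line by line as the
    text was visually laid out, which is probably why a plain StrReverse()
    breaks in tables and on wrapped lines.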

    To know exactly what to do, you have to study what the BiDi algorithm does,
    and then adapt the encoding so that the standard BiDi reordering (and
    mirroring) produces the correct visual order and character orientation.
    The conversion will sometimes require inserting BiDi control characters to
    prevent these weakly directional characters from being reordered or
    mirrored.
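    As one example of such a control (my assumption about a case you may
    hit, not something I have checked in your documents): when a Hebrew
    sentence ends with punctuation inside a left-to-right paragraph, the
    punctuation takes the paragraph direction and lands on the wrong side.
    Appending U+200F RIGHT-TO-LEFT MARK after it keeps it with the Hebrew
    text. A minimal Python sketch, with the hypothetical helper name
    pin_trailing_punctuation:

```python
import unicodedata

RLM = "\u200f"  # RIGHT-TO-LEFT MARK, an invisible strong RTL character
STRONG_RTL = {"R", "AL"}
NEUTRAL_OR_SEPARATOR = {"ON", "WS", "CS", "ES", "ET"}

def pin_trailing_punctuation(text):
    """Append RLM when neutral characters trail a strong RTL character."""
    tail_seen = False
    for ch in reversed(text):
        cls = unicodedata.bidirectional(ch)
        if cls in NEUTRAL_OR_SEPARATOR:
            # Still inside the trailing run of punctuation or spaces.
            tail_seen = True
            continue
        if cls in STRONG_RTL and tail_seen:
            # Punctuation follows Hebrew/Arabic: enclose it between RTL characters.
            return text + RLM
        # Anything else (Latin, digits, no trailing punctuation): leave as is.
        return text
    return text

shalom = "\u05e9\u05dc\u05d5\u05dd"
print(repr(pin_trailing_punctuation(shalom + "!")))
```

    Text that already ends with an RLM is returned unchanged, so the
    function can be run more than once on the same text.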

    (Be careful about the effect of mirroring: the BiDi algorithm changes the
    orientation of some characters, such as parentheses, so if you just swap
    characters, the parentheses may look wrong: you will have to change their
    orientation by replacing each codepoint with the corresponding mirrored
    character.)
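    To find out which characters in a text are affected by mirroring, you
    can check the Bidi_Mirrored property. A short Python sketch (the helper
    name mirrored_chars is hypothetical); note that Python's standard
    library only tells you whether a character is mirrored, not what its
    mirrored counterpart is, so the replacement table still has to come
    from elsewhere:

```python
import unicodedata

def mirrored_chars(text):
    """Return the distinct characters in text that have Bidi_Mirrored=Yes."""
    return sorted({ch for ch in text if unicodedata.mirrored(ch)})

# Parentheses and angle brackets are mirrored; letters and "!" are not.
print(mirrored_chars("a(b)<c!"))
```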

    There are tools that do this, notably for Hebrew and Arabic: that is, they
    convert text from visual to logical encoding order. But I don't know of one
    that works with Word documents, so you may need to write a conversion
    macro...

    ----- Original Message -----
    From: Raymond Mercier
    To: unicode@unicode.org
    Sent: Sunday, May 22, 2005 6:45 PM
    Subject: hebrew font conversion

    [This is really a question for the Hebrew Computing Forum, but I have tried
    there and drew a blank.]
    The problem is that I composed many documents in Word using an ad hoc Hebrew
    font, and wish to convert to Unicode.
    When I run a macro that exchanges the old codepoints for the U+Hebrew
    points, the characters in each word are reversed. I have tried to cure this
    by writing another macro using StrReverse(). Sometimes this works, but
    there are problems - especially with tables.
    Does anyone have experience of this, and/or a solution?
    I will have the same problem with Arabic Word docs.


