Unicode Arabic Rendering Problem

From: Mete Kural (metekural@yahoo.com)
Date: Fri Feb 28 2003 - 13:05:07 EST

    Hello Folks,

    I wanted to ask a question to those of you who have
    Unicode Arabic knowledge. We have this website
    http://www.quranreader.org where we are trying to
    display the text of the Quran with accurately encoded
    Unicode text rather than the traditional images. Some
    of the characters in the Quran aren't rendered
    correctly. We are letting the browser use its
    default Unicode font on the website, which I think is
    Times New Roman Unicode for the newer versions of
    Internet Explorer. If we used a high-quality Unicode
    font for Arabic, would this solve the problem? Or is
    this a bigger problem that has to do with the
    rendering engine provided by the operating system?

    I would like to give you an example. In Arabic, when
    a Lam and an Alef occur together, they are rendered
    as a single combined shape instead of their regular
    forms. It looks roughly like this:

     \ /
      \/
      /\
      \/
    Figure 1
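
A minimal sketch of how the Lam-Alef combination above relates to the encoded text, using Python's standard `unicodedata` module. It assumes the usual model in which the text stores the two letters separately and the combined shape is produced at display time; the presentation-form code point U+FEFB is used here only to show that it decomposes back to those two letters. The helper names `decompose` and `describe` are made up for this illustration.

```python
import unicodedata

LAM = "\u0644"                # ARABIC LETTER LAM
ALEF = "\u0627"               # ARABIC LETTER ALEF
LAM_ALEF_ISOLATED = "\uFEFB"  # compatibility presentation form of the ligature


def decompose(text):
    """Apply compatibility decomposition (NFKD) to the text."""
    return unicodedata.normalize("NFKD", text)


def describe(text):
    """List each code point in the text with its Unicode character name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


# The presentation form decomposes to the two plain letters.
print(describe(LAM_ALEF_ISOLATED))
print(describe(decompose(LAM_ALEF_ISOLATED)))
```

The second line printed lists U+0644 and U+0627, that is, the same two letters that would normally be stored in the text, with the ligature shape left to the font and rendering engine.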

    In the Quran, there is sometimes this combination of
    characters: Lam-Hamza-Alef.
    In such a case, the Lam and Alef are still rendered
    the way they would be had there been no hamza in
    between, and the hamza is simply placed above the
    middle of the Lam-Alef shape, which looks roughly
    like this:

      c
     \ /
      \/
      /\
      \/
    Figure 2

    Note that this is different from the case
    illustrated in Figure 3, where the hamza is directly
    above the alef and not "in between" lam and alef.

    c
     \ /
      \/
      /\
      \/
    Figure 3

    So there is a subtle difference that the hamza is not
    directly above the alef but rather in between the alef
    and the lam. I am attaching a small gif file named
    "Sample.gif" that demonstrates the subtle
    difference in the positioning of the hamza. Attached
    are two words from the Quran. Look for the second word
    where the hamza is in between the alef and the lam
    instead of directly above the alef.

    When we encode this case with this combination of
    Unicode characters: 0644-0627-0621
    in Internet Explorer, instead of showing it like
    Figure 2, it completely separates all the letters and
    shows them like this:

    | |
    | |
    | C \__/

    which is totally wrong.
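
As a rough way to inspect the sequence quoted above, the following sketch prints the name, general category, and canonical combining class of each code point, using Python's standard `unicodedata` module. The fourth code point, U+0654, is not part of the sequence in this message; it is added purely for comparison, as an assumption of this sketch, because it is a combining mark while U+0621 is a standalone letter. Whether it is the right encoding for this Quranic form is not something this message settles. The helper name `properties` is made up for the illustration.

```python
import unicodedata

# The sequence as quoted in the message: 0644-0627-0621.
SEQUENCE = "\u0644\u0627\u0621"

# Assumption for comparison only: a combining hamza, not used in the message.
HAMZA_ABOVE = "\u0654"


def properties(ch):
    """Return (code point, name, general category, combining class)."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.category(ch),
        unicodedata.combining(ch),
    )


for ch in SEQUENCE + HAMZA_ABOVE:
    print(properties(ch))
# ('U+0644', 'ARABIC LETTER LAM', 'Lo', 0)
# ('U+0627', 'ARABIC LETTER ALEF', 'Lo', 0)
# ('U+0621', 'ARABIC LETTER HAMZA', 'Lo', 0)
# ('U+0654', 'ARABIC HAMZA ABOVE', 'Mn', 230)
```

All three code points in the quoted sequence are base letters (category `Lo`, combining class 0), so a renderer has no character-level indication that the hamza should sit above the Lam-Alef shape. That observation may help narrow down which of the three possibilities below applies.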

    Which one do you think is the problem here?

    1) We are not encoding this combination of characters
    in the correct way.
    2) This is a font-related problem.
    3) This is a bigger problem for which the rendering
    engine on the operating system has to be modified.

    Thank you very very much,
    Mete Kural



    Sample.gif
