Re: Novice question

From: Antoine Leca
Date: Tue Mar 23 2004 - 06:43:12 EST


    Hi John,

    John Snow wrote:
    > I am speaking to a client regarding their website being translated
    > into a number of languages including Bengali, Urdu and Punjabi, which I
    > am told are not very well supported by Unicode.

    This is not true. These languages have been supported by Unicode since its
    first public version, more than 10 years ago.

    However, they use so-called complex scripts, which are not easy to "write"
    with a computer. The fact that they are mainly used outside the major
    industrial countries did not help in this respect, as you might guess. As a
    result, it is still quite difficult to have a readable web site written in
    these languages *and* viewable without impairment.

    Urdu is written with the so-called Arabic script, but in a way ("style")
    that is visually quite distinct from the Arabic you might have already seen
    (Urdu is written in Nastaliq style, as opposed to the Naskh style usually
    used for the Arabic language). This usually requires distinct fonts, which
    are harder to find and much less widely available than Naskh fonts. Of
    course, your potential clients will probably have the fonts, but the key
    point here is the potential diversity you may encounter. As I understand
    things (I did not test all the browsers that Edward indicated), the latest
    versions of the browsers should be able to work, provided the correct fonts
    are installed (for example, Arial Unicode MS from Office XP is not
    sufficient here: it only has Naskh style). The other solutions outside
    Unicode (using the so-called "font hacks") are *not* going to give you
    better results, I would guess. Furthermore, the future of such solutions is
    bleak, since everyone is moving toward Unicode.
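
    To illustrate that the Nastaliq/Naskh difference is a matter of font and
    not of encoding, here is a minimal Python sketch. The sample word and the
    helper name in_arabic_block are my own choices for illustration; the block
    range U+0600..U+06FF is the basic Arabic block of Unicode.

```python
import unicodedata

# Sample text only: the word "Urdu" written in Urdu (alef, reh, dal, waw).
urdu = "\u0627\u0631\u062F\u0648"

def in_arabic_block(ch):
    """True if the character lies in the basic Arabic block (U+0600..U+06FF)."""
    return 0x0600 <= ord(ch) <= 0x06FF

# Print each code point with its official Unicode character name.
for ch in urdu:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# The encoded text says nothing about Naskh or Nastaliq: that is up to the font.
all_arabic = all(in_arabic_block(ch) for ch in urdu)
```

    The same code points are sent whichever style the reader's font uses, so
    the page itself does not need to change when better fonts become available.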

    Regarding Punjabi, this is probably the easiest. Depending on the country,
    Punjabi is written either with the Arabic script (this is called Shahmukhi;
    it can use Naskh style) or in a dedicated script (called Gurmukhi), which is
    not as easy to write as Latin, but is not overwhelmingly complex either. The
    key point here is that this is not a #1 priority given the "market". As a
    result, I am positive that recent versions of IE are able to display it
    (with Arial Unicode MS as font). With other browsers, well, things are
    progressing, but there is still work to be done, particularly on
    "alternative" OSes such as Linux and the like, since the OS currently has no
    native support for this script. On the other hand, MacOS has had Gurmukhi
    support for many years, so I expect fewer problems on this platform (but I
    cannot be positive, my own Mac is far too old). In Gecko this is something
    which is under development; I have not checked recently whether it works.
    In Opera, it does work provided the OS supports it (in fact, in Opera in
    general there is nothing specific about any of these scripts, provided you
    are using Unicode and not some "font hack", and provided of course there is
    OS support).
    So in general, the outlook for Punjabi is quite good (but keep in mind you
    may have two versions, one in Shahmukhi for "Pakistan", one in Gurmukhi for
    "India").
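
    If the two Punjabi versions need to be told apart automatically, a rough
    sketch in Python could look at the code points. The function name
    punjabi_script is hypothetical, and the heuristic is only an assumption:
    Arabic-block characters indicate the Arabic script in general, not
    Shahmukhi specifically.

```python
def punjabi_script(text):
    """Guess which script a piece of Punjabi text uses, from its code points."""
    # Gurmukhi block: U+0A00..U+0A7F; basic Arabic block: U+0600..U+06FF.
    gurmukhi = sum(0x0A00 <= ord(c) <= 0x0A7F for c in text)
    arabic = sum(0x0600 <= ord(c) <= 0x06FF for c in text)
    if gurmukhi == 0 and arabic == 0:
        return "unknown"
    return "Gurmukhi" if gurmukhi >= arabic else "Shahmukhi"

# Sample words only: "Punjabi" written in each script.
in_gurmukhi = "\u0A2A\u0A70\u0A1C\u0A3E\u0A2C\u0A40"
in_shahmukhi = "\u067E\u0646\u062C\u0627\u0628\u06CC"

print(punjabi_script(in_gurmukhi))
print(punjabi_script(in_shahmukhi))
```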

    I left Bengali for the end, despite its being among the 5 most spoken
    languages in the world. Bengali is much more complex to write. Font
    availability is scarce (for example, I do not have access to any font of
    release quality). Microsoft is definitely working in this area, so things
    will get better on the Windows platform within the next few years
    (alternatively, you can read this as saying there is still work to be done).
    On Linux, there is a group of enthusiasts who are working hard to get
    results; due to the way things are set up, their results are likely to be
    reusable on MacOS too.
    Depending on the size of your project, going for a "font hack" or other
    similar solutions such as iPlugin from CDAC might be better alternatives in
    the short term. In the long term, of course, as above, Unicode is the
    correct choice.

    Again, please note that Unicode is not the "culprit" here: Unicode is a
    standard to encode texts. As such, it just works for these languages. And it
    is definitely the standard that all browsers follow to interpret the
    "texts" they are sent. However, making this display correctly on screen is
    another matter. And this is where you may encounter difficulties
    (particularly with Bengali).
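
    The split between encoding and display can be sketched in Python with a
    Bengali conjunct. The sample (the letters KA and SSA joined by a virama) is
    my own choice for illustration; it is stored as three code points, while a
    suitable font and rendering engine are expected to draw it as one shape.

```python
import unicodedata

# Sample only: a Bengali conjunct made of KA + VIRAMA + SSA.
conjunct = "\u0995\u09CD\u09B7"

# Encoding side: three ordinary code points with well-defined names.
names = [unicodedata.name(ch) for ch in conjunct]
for ch, name in zip(conjunct, names):
    print(f"U+{ord(ch):04X} {name}")

# The text survives a UTF-8 round trip unchanged (3 bytes per code point here).
encoded = conjunct.encode("utf-8")
decoded = encoded.decode("utf-8")
print(len(encoded), decoded == conjunct)
```

    Nothing in this code deals with how the conjunct looks: that part is left
    to the fonts and to the OS or browser, which is where the difficulties are.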

    Hope this helps,

