New to Unicode

From: Michael Hall
Date: Thu Jul 20 2006 - 23:50:35 CDT


    I am developing a multilingual website. After considering various
    options, I've gone with a subdomain for each language
    (IT, FR, DE, JP, KR, ZH). It is not as much work as it sounds, as there
    is a lot of shared PHP code, much of the language text is stored in
    arrays in UTF-8 text files, and I make use of symbolic links. I'm
    developing on Fedora 3 and hosting on a Linux/Apache platform.

    My first question: UTF-8 encoding seems to be working fine for all
    languages at the moment, but am I heading for trouble with the CJK
    languages in particular? Is Unicode really viable for websites in CJK
    languages?
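
    To make "working fine" concrete, here is a minimal sketch of the kind
    of check I mean. It is written in Python for brevity rather than the
    site's PHP, and the sample strings and the helper name
    utf8_round_trip are my own assumptions, not anything from the site.
    It only shows that the text survives encoding and decoding; it says
    nothing about browsers, fonts or search engines.

```python
# Hypothetical sample strings (my own, not taken from the site).
SAMPLES = {
    "JP": "日本語",
    "KR": "한국어",
    "ZH": "中文",
}


def utf8_round_trip(text):
    """Encode text as UTF-8, decode it again, and report the byte count."""
    encoded = text.encode("utf-8")
    decoded = encoded.decode("utf-8")
    return decoded == text, len(encoded)


if __name__ == "__main__":
    for lang, text in SAMPLES.items():
        ok, size = utf8_round_trip(text)
        # Print only ASCII so the output does not depend on the terminal.
        print(lang, "round-trips:", ok, "bytes:", size)
```

    Each character in these samples takes three bytes in UTF-8, so the
    files are larger than they would be in a legacy double-byte encoding,
    but nothing is lost in the round trip.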

    I realise that members of this list will probably be more upbeat about
    Unicode than some, but at the end of the day my client just wants as
    many users as possible to see and read his webpages. That means Internet
    Explorer of course (I'm using Firefox for development).

    Also, we're interested in search engines picking up and indexing the
    text, particularly Google and Baidu. Is UTF-8 a good choice for this?

    One final question if I may. Does anyone know whether search engines
    make any sense out of text encoded as character entities? The browser
    certainly does, but I'm wondering what a search engine sees. Do they
    interpret the entities or just see numbers?
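
    To illustrate what I mean by "interpret the entities", here is a
    minimal sketch in Python (again not the site's PHP). The sample
    string and the helper name decode_entities are hypothetical. It
    shows what a program that decodes numeric character references ends
    up with; whether a given search engine actually does this step is
    exactly what I don't know.

```python
import html


def decode_entities(markup_text):
    """Turn character references such as &#20013; into real characters."""
    return html.unescape(markup_text)


if __name__ == "__main__":
    # Hypothetical page fragment: two Chinese characters written as
    # decimal numeric character references.
    entity_text = "&#20013;&#25991;"
    decoded = decode_entities(entity_text)

    # A tool that skips this step sees only the 16 ASCII characters above.
    print(len(entity_text), len(decoded))

    # Show the code points and the UTF-8 bytes of the decoded text.
    print([hex(ord(ch)) for ch in decoded])
    print(decoded.encode("utf-8"))
```

    After decoding, the entity form and the raw UTF-8 form are the same
    two characters (U+4E2D and U+6587), so the difference only matters
    to software that never decodes the references.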



    This archive was generated by hypermail 2.1.5 : Fri Jul 21 2006 - 10:48:26 CDT