From: Michael Hall (info@mondoseo.com)
Date: Thu Jul 20 2006 - 23:50:35 CDT
I am developing a multilingual website. After considering various
options, I've gone with a subdomain for each language
(IT, FR, DE, JP, KR, ZH). It is not as much work as it sounds, as there
is a lot of shared PHP code, most of the translated text is stored in
arrays in UTF-8 text files, and symbolic links take care of the rest.
I'm working on Fedora 3 and hosting on a Linux/Apache platform.
My first question: UTF-8 encoding seems to be working fine for all
languages at the moment, but am I heading for trouble with the CJK
languages in particular? Is Unicode really viable for websites in CJK
languages?
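To make the concern concrete, here is a minimal sketch in Python (not
the site's PHP code; the sample strings and the choice of Shift_JIS as
the legacy comparison are my own assumptions for illustration). It
checks that Japanese, Korean and Chinese text all survive a UTF-8
round trip, and that a single legacy encoding cannot cover all three.

```python
# Sample strings are illustrative only, not taken from the real site.
samples = {"ja": "日本語", "ko": "한국어", "zh": "中文"}

# Every sample survives encoding to UTF-8 bytes and decoding back.
round_trip_ok = all(
    text.encode("utf-8").decode("utf-8") == text
    for text in samples.values()
)

# A legacy Japanese encoding cannot represent Korean Hangul at all.
try:
    samples["ko"].encode("shift_jis")
    legacy_covers_korean = True
except UnicodeEncodeError:
    legacy_covers_korean = False

print(round_trip_ok, legacy_covers_korean)
```

This only shows that the encoding itself is lossless for CJK text; it
says nothing about fonts or browser behaviour, which is the part I am
unsure about.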
I realise that members of this list will probably be more upbeat about
Unicode than some, but at the end of the day my client just wants as
many users as possible to see and read his webpages. That means Internet
Explorer, of course (I'm using Firefox for development).
Also, we're interested in search engines picking up and indexing the
text, particularly Google and Baidu. Is UTF-8 a good choice for this?
One final question, if I may. Does anyone know whether search engines
make any sense of text encoded as character entities (for example
&#20013; rather than the character itself)? The browser certainly does,
but I'm wondering what a search engine sees. Do they interpret the
entities or just see numbers?
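As an illustration of what "interpreting the entities" would mean, here
is a small Python sketch using the standard library's html.unescape. It
is only an assumption about the kind of decoding step an indexer might
apply, not a description of what any search engine actually does. The
sample word is my own.

```python
import html

# The same Chinese word written two ways: as raw characters and as
# decimal numeric character references.
raw = "中文"
as_entities = "&#20013;&#25991;"

# Decoding the references yields the original characters.
decoded = html.unescape(as_entities)

# Without decoding, the markup is just ASCII digits and punctuation.
is_plain_ascii = as_entities.isascii()

print(decoded == raw, is_plain_ascii)
```

A crawler that skips the decoding step would index only the ASCII
digits, which is exactly the case I am worried about.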
Thanks
Mick