Re: New to Unicode

From: Kenneth Whistler
Date: Fri Jul 21 2006 - 14:19:19 CDT


    Mick Hall asked:

    > My first question is that while UTF-8 encoding seems to be working fine
    > for all languages at the moment, am I heading for trouble with CJK
    > languages in particular?


    > Is Unicode really viable for websites in CJK
    > languages?


    > Also, we're interested in search engines picking up and indexing the
    > text. Particularly Google and Baidu. Is UTF-8 a good choice for this?

    Yes, particularly if your pages are all clearly labelled by charset
    and language.
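    The charset label matters because a consumer (a browser or an indexer)
    decodes the raw bytes with whatever charset the page declares. The
    sketch below is illustrative, not how Google or Baidu actually work.
    It reads the charset parameter from a Content-Type header using the
    standard library's email.message.Message.get_content_charset(). When
    no label is present, it falls back to ISO-8859-1, the default that
    HTTP/1.1 (RFC 2616) assigned to unlabelled text/* bodies. The function
    name decode_body is hypothetical.

```python
from email.message import Message

# HTTP/1.1 (RFC 2616) defaulted unlabelled text/* bodies to ISO-8859-1.
DEFAULT_CHARSET = "iso-8859-1"

def decode_body(content_type: str, body: bytes) -> str:
    """Hypothetical helper: decode a page body using its declared charset."""
    msg = Message()
    msg["Content-Type"] = content_type
    # Returns the charset parameter lowercased, or the fallback if absent.
    charset = msg.get_content_charset(DEFAULT_CHARSET)
    return body.decode(charset)

cjk_page = "北京".encode("utf-8")

# Correctly labelled: the UTF-8 bytes decode back to the original text.
print(decode_body("text/html; charset=UTF-8", cjk_page))  # 北京

# Unlabelled: the same bytes are misread as ISO-8859-1 and come out garbled.
print(repr(decode_body("text/html", cjk_page)))
```

    The unlabelled case is exactly the trouble a clear charset label
    avoids. A language label (for example an HTML lang attribute) is a
    separate hint and is not modelled here.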

    > One final question if I may. Does anyone know whether search engines
    > make any sense out of text encoded as character entities?

    They had better, or else they are processing HTML nonconformantly:
    an HTML parser must resolve character references to the characters
    they stand for.

    Try it. I just did a Google search for "Tällöin kustannusten" and turned
    up all kinds of Finnish pages: some defaulting to 8859-1, some
    explicitly labelled 8859-1, some explicitly labelled UTF-8.
    Most of the 8859-1 pages simply use 8859-1 characters, but this one:

    labelled 8859-1, uses numeric entities for all non-ASCII characters.

    It works fine, I think.
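    Here is why the three kinds of pages can all match the same query.
    Once the bytes are decoded with the declared charset and character
    references are resolved, every variant yields the same character
    string. In HTML, a numeric reference such as &#228; names a Unicode
    code point (U+00E4, "ä") no matter which charset the page is labelled
    with. The sketch below uses Python's html.unescape() as a stand-in for
    whatever a search engine's parser does internally. That substitution
    is an assumption, and the page bodies are made up for illustration.

```python
import html

phrase = "Tällöin kustannusten"

# Hypothetical page bodies, one per labelling style seen in the search results.
latin1_body = phrase.encode("iso-8859-1")      # labelled 8859-1, raw 8859-1 bytes
utf8_body = phrase.encode("utf-8")             # labelled UTF-8, raw UTF-8 bytes
ncr_body = b"T&#228;ll&#246;in kustannusten"   # labelled 8859-1, pure ASCII plus NCRs

# Step 1: decode the bytes with the declared charset.
# Step 2: resolve character references, as a conforming HTML parser must.
texts = [
    html.unescape(latin1_body.decode("iso-8859-1")),
    html.unescape(utf8_body.decode("utf-8")),
    html.unescape(ncr_body.decode("iso-8859-1")),
]

print(all(t == phrase for t in texts))  # True
```

    The byte sequences differ in all three cases, but the text to be
    indexed is identical. That is why entity-encoded pages should be just
    as searchable as pages using raw characters.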


    This archive was generated by hypermail 2.1.5 : Fri Jul 21 2006 - 14:26:54 CDT