Re: New to Unicode

From: Kenneth Whistler (kenw@sybase.com)
Date: Fri Jul 21 2006 - 14:19:19 CDT


    Mick Hall asked:

    > My first question is that while UTF-8 encoding seems to be working fine
    > for all languages at the moment, am I heading for trouble with CJK
    > languages in particular?

    No.

    > Is Unicode really viable for websites in CJK
    > languages?

    Yes.

    > Also, we're interested in search engines picking up and indexing the
    > text. Particularly Google and Baidu. Is UTF-8 a good choice for this?

    Yes. Particularly if your pages are all clearly labelled by charset
    and language.
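
    As a rough illustration of what "clearly labelled by charset and
    language" could look like, here is a minimal Python sketch. The
    function name build_page and the sample strings are hypothetical,
    not anything from Mick's site; the point is only that the declared
    charset matches the bytes actually sent and that the lang attribute
    is present.

```python
import html


def build_page(title: str, body_text: str, lang: str) -> bytes:
    """Return a minimal HTML page, encoded as UTF-8, with charset and language labels.

    build_page is a hypothetical helper used only for this illustration.
    """
    html_text = (
        "<!DOCTYPE html>\n"
        # The lang attribute labels the language of the page content.
        f'<html lang="{html.escape(lang)}">\n'
        "<head>\n"
        # The in-document charset label; it must match the real byte encoding.
        '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">\n'
        f"<title>{html.escape(title)}</title>\n"
        "</head>\n"
        f"<body><p>{html.escape(body_text)}</p></body>\n"
        "</html>\n"
    )
    # Encode with the same charset that the label declares.
    return html_text.encode("utf-8")


# The HTTP header a server would normally send along with the body.
CONTENT_TYPE_HEADER = "Content-Type: text/html; charset=UTF-8"

# Sample CJK text (illustrative only).
page = build_page("示例", "中文、日本語、한국어", "zh")
```

    The same bytes carry Chinese, Japanese and Korean text without any
    change of encoding; only the lang value would differ per page.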

    >
    > One final question if I may. Does anyone know whether search engines
    > make any sense out of text encoded as character entities?

    They had better, or else they are processing HTML nonconformantly.

    Try it. I just did a Google search for "Tällöin kustannusten" and turned up
    all kinds of Finnish pages -- some defaulting to ISO 8859-1, some
    explicitly labelled 8859-1, some explicitly labelled UTF-8.
    Most of the 8859-1 pages simply use literal 8859-1 characters, but this one:

    http://www.mandinka.org/Public/FI/

    labelled 8859-1, uses numeric entities for all non-ASCII
    characters.

    It works fine, I think.
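
    To see why a conformant HTML processor ends up with the same text
    either way, here is a small Python sketch. It uses the standard
    library's html.unescape; the helper to_numeric_entities is a
    hypothetical name made up for this example, and the search phrase
    is the one used above. Whether a given search engine behaves this
    way is something to test, not something this snippet proves.

```python
import html

# The search phrase, as literal characters.
phrase = "Tällöin kustannusten"


def to_numeric_entities(text: str) -> str:
    """Replace every non-ASCII character with a decimal numeric character reference.

    Hypothetical helper for illustration; the result is pure ASCII, so it
    survives any ASCII-compatible charset label.
    """
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in text)


# Form 1: numeric entities, as on the page that uses them for all non-ASCII.
entity_form = to_numeric_entities(phrase)

# Form 2: literal characters stored as ISO 8859-1 bytes.
latin1_bytes = phrase.encode("iso-8859-1")

# Form 3: literal characters stored as UTF-8 bytes.
utf8_bytes = phrase.encode("utf-8")

# A conformant processor decodes the bytes per the declared charset and
# resolves character references; all three forms yield the same text.
from_entities = html.unescape(entity_form)
from_latin1 = latin1_bytes.decode("iso-8859-1")
from_utf8 = utf8_bytes.decode("utf-8")

all_equal = from_entities == from_latin1 == from_utf8 == phrase
```

    The byte sequences differ in all three cases, which is why the
    charset label matters for the literal forms but not for the
    entity form.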

    --Ken


