Re: New to Unicode

From: Doug Ewell
Date: Tue Jul 25 2006 - 02:00:53 CDT


    Michael Hall <info at mondoseo dot com> wrote:

    > (en)

    My point was that you are using a *mixture* of language codes and
    country codes, which is almost certainly not what you want to do in this
    context. "it", "fr", and "de" could be either country codes or language
    codes, but "kr" and "jp" can only be country codes (*), and "zh" can
    only be a language code.

    (*) Yes, I know about Kanuri. I don't think Michael is setting up a
    Kanuri-language site.

    Your template of is impeccable, but the choice of
    codes is not consistent. I would choose "ko" and "ja" (thus disagreeing
    with Philippe).
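    To make the mixture visible, here is a minimal Python sketch. It assumes
    two deliberately tiny, hand-picked subsets of ISO 639-1 (language) and
    ISO 3166-1 alpha-2 (country) codes; the set names and helper functions
    are hypothetical, and a real site should check against the full,
    current code lists.

```python
# Hypothetical, deliberately tiny subsets of the real code lists,
# just large enough for this example.
ISO_639_1 = {"it", "fr", "de", "ko", "ja", "zh", "kr"}  # "kr" is Kanuri
ISO_3166_1 = {"it", "fr", "de", "kr", "jp", "cn"}

def classify(code: str) -> set:
    """Return which kinds of code a two-letter string could be."""
    code = code.lower()
    kinds = set()
    if code in ISO_639_1:
        kinds.add("language")
    if code in ISO_3166_1:
        kinds.add("country")
    return kinds

# The mixture from the original question.
site_codes = ["it", "fr", "de", "kr", "jp", "zh"]

# A consistent alternative: language codes only, with "ko" and "ja"
# replacing the country codes "kr" and "jp".
COUNTRY_TO_LANGUAGE = {"kr": "ko", "jp": "ja"}

def as_language_codes(codes: list) -> list:
    """Map the mixed list onto language codes only."""
    return [COUNTRY_TO_LANGUAGE.get(c, c) for c in codes]

if __name__ == "__main__":
    for c in site_codes:
        print(c, sorted(classify(c)))
    print(as_language_codes(site_codes))
```

    "jp" comes out as a country code only and "zh" as a language code only,
    which is exactly the inconsistency; "kr" shows up as both only because of
    Kanuri.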

    > This whole area is very murky and it is very difficult to find
    > reliable information on how search engines correlate information
    > including domain names, a document's self-identified
    > encoding/language, and the IP address of the server (geolocation) to
    > eventually categorise a site by country/language and give weight to
    > its content.

    Usually, the only time you want to categorize Web sites by geographical
    location is when they truly have location-specific material. For
    example, you might be offering a store locator service that shows only
    the retail stores in a given country, or technical support where
    customers in different countries should call different phone numbers, or
    a product whose pricing and availability depend on the buyer's
    location. That is different from having a French-language site that is
    equally useful to French speakers in France, Canada, or Mauritius. This
    is what I meant by understanding which localization problem you are
    trying to solve.
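    One way to express that rule in code, as a minimal Python sketch: add a
    region subtag only when the content really depends on location. The
    Page class, its fields, and language_tag() are hypothetical names for
    illustration, not part of any library.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Page:
    language: str             # e.g. "fr"
    region: Optional[str]     # e.g. "CA"; only meaningful for location-specific content
    region_specific: bool     # store locator, support phone numbers, pricing, etc.

def language_tag(page: Page) -> str:
    """Return a language tag, adding a region only for location-specific content."""
    if page.region_specific and page.region:
        return f"{page.language}-{page.region.upper()}"
    # Content equally useful to speakers anywhere: language alone.
    return page.language
```

    A general French article served from a Canadian server still gets
    "fr", while a Canadian store locator page gets "fr-CA".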

    > Regarding the zh_CN/zh_TW/zh_HK issue, we have to take into account
    > the fact that the vastly bigger market is zh_CN, so if it is necessary
    > to be more specific than zh, we'll go for zh_CN.

    Which is fine, but Peter's point was that you should use "zh-Hans" (or
    "zh_Hans" if necessary) rather than "zh-CN" or "zh_CN". The combination
    "zh-CN" literally means "Chinese as spoken in mainland China," but in
    fact that is not a useful distinction: are we talking about Mandarin,
    Cantonese, Wu, or what? All are spoken in the PRC. If what you really
    want to convey is "Simplified Chinese, not Traditional" then use
    "zh-Hans" instead. Again, know what problem you are solving.
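    The difference is easier to see once a tag is split into its subtags.
    Below is a minimal Python sketch of a parser for the common
    language[-script][-region] shape only; parse_tag() is a hypothetical
    helper, not a full BCP 47 validator (it ignores extlang, variants,
    extensions, and private-use subtags), and it treats "_" as "-" so that
    POSIX-style names like "zh_CN" are accepted.

```python
import re

def parse_tag(tag: str) -> dict:
    """Split a simple language tag into language, script, and region subtags."""
    parts = re.split(r"[-_]", tag)
    result = {"language": parts[0].lower(), "script": None, "region": None}
    rest = parts[1:]
    # A four-letter subtag right after the language is a script (e.g. Hans, Hant).
    if rest and re.fullmatch(r"[A-Za-z]{4}", rest[0]):
        result["script"] = rest[0].title()
        rest = rest[1:]
    # A two-letter or three-digit subtag is a region (e.g. CN, TW, 419).
    if rest and re.fullmatch(r"[A-Za-z]{2}|[0-9]{3}", rest[0]):
        result["region"] = rest[0].upper()
        rest = rest[1:]
    if rest:
        raise ValueError(f"unsupported subtags in {tag!r}: {rest}")
    return result

if __name__ == "__main__":
    for t in ["zh-Hans", "zh_CN", "zh-Hant-TW"]:
        print(t, parse_tag(t))
```

    "zh-Hans" carries a script and no region, while "zh-CN" carries a region
    and says nothing at all about which script the text is written in.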

    Doug Ewell
    Fullerton, California, USA

    This archive was generated by hypermail 2.1.5 : Tue Jul 25 2006 - 02:04:42 CDT