Re: New to Unicode

From: Michael Hall
Date: Mon Jul 24 2006 - 18:54:17 CDT

    Actually, I have moved from this scheme: (en)

    to this scheme: (en)

    We simply wanted a clean break between which languages were on which
    (sub)domains. We believe the search engines weren't doing a good job of
    figuring out which language our site was in (it was in 4 languages, now
    7). Only the English domain was really being indexed properly, even
    though our non-English pages were correctly identified for language and
    encoding. With the same content and far smaller result sets in
    non-English languages, I would have expected us to do better in
    non-English than English. So anyway, we are now trying one language/one

    This whole area is very murky and it is very difficult to find reliable
    information on how search engines correlate information including domain
    names, a document's self-identified encoding/language, and the IP
    address of the server (geolocation) to eventually categorise a site by
    country/language and give weight to its content.

    Organising local domains (.it etc.) involves additional expense and
    issues such as requiring the domain registrant to be a local resident.

    I took my lead on the language/country codes from what Google uses in
    its URLs for non-English languages. To my mind, what the subdomains are
    called doesn't matter so much. In fact, I would have gone for
    descriptive subdomain names like: (it) (fr) (de)

    except that it would break down with CJK languages in particular. I
    don't know where multilingual URLs are at right now, but we aren't in a
    position to play on the bleeding edge.
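
    For reference, a minimal sketch of how a non-ASCII host name label is
    carried in ASCII form under IDNA. It assumes Python's built-in "idna"
    codec, which implements the older IDNA 2003 rules; the label used is
    only an illustration and is not one of the site's subdomains.

```python
# Sketch only: convert a hypothetical CJK subdomain label to the
# ASCII-compatible ("xn--") form that DNS actually stores, and back.
label = "中文"  # hypothetical label, chosen for illustration

ascii_form = label.encode("idna")       # bytes starting with b"xn--"
round_trip = ascii_form.decode("idna")  # back to the original label

print(ascii_form)
print(round_trip)
```

    Whether browsers and search engines of the day handle such names well
    is a separate question that this sketch does not answer.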

    I don't think users remembering the domain name is an issue in our case.
    We are more interested in users finding our sites in Google and Baidu
    than remembering the domain name. The client sells tours and doesn't get
    return business; not because the tours are no good, it's just the
    nature of the industry!

    Regarding the zh_CN/zh_TW/zh_HK issue, we have to take into account the
    fact that the vastly bigger market is zh_CN, so if it is necessary to be
    more specific than zh, we'll go for zh_CN.

    >>>I am developing a multilingual website. After considering various
    >>>options, I've gone with a subdomain for each language
    >>Just as a side note, the standard language codes for Japanese and Korean
    >>are JA and KO, not JP and KR.
    > For subdomain names, he can choose whatever codes he likes and wants
    > within his own domain name; this has no impact on the applications.
    > Note that domains in the TLDs use country codes, not language codes, so for
    > consistency (if he also applied for ccTLD domains) he may have simplified
    > his setting, so that people can connect either to or
    > or even indifferently
    > (his setting depends on the way the server maps domain names to actual
    > resources, and it may be inconvenient to use different codes for ccTLDs
    > and subdomains.)

    > The good question is then about which URL his users will remember more
    > easily, and which one he wants to advertise and make available in his server
    > configuration. If he wants to target Japanese users, they are used to seeing "jp"
    > in domain names, so it seems logical to use "" (or
    > if available, or another localized domain name in ".jp"), as many users will
    > expect "jp" and not "ja" in URLs. Country codes are much better known than
    > language codes.
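
    The point about the server mapping domain names to resources can be
    sketched as a small lookup that accepts both the country-style labels
    (jp, kr, cn) and the language codes (ja, ko, zh) and resolves each to
    one language tag. This is an illustration under assumptions, not the
    site's actual configuration: the table contents, the function name
    language_for_host, the default language and the example.com host
    names are all hypothetical. The tags use the hyphenated form (zh-CN)
    rather than the locale-style zh_CN.

```python
# Hypothetical table: leftmost host label -> language tag.
# Both country-style and language-style labels are accepted.
SUBDOMAIN_TO_LANG = {
    "en": "en",
    "it": "it",
    "fr": "fr",
    "de": "de",
    "jp": "ja",     # country code label, Japanese content
    "ja": "ja",     # language code label, same content
    "kr": "ko",
    "ko": "ko",
    "cn": "zh-CN",
    "zh": "zh-CN",  # assumed default for a bare "zh"
}

DEFAULT_LANG = "en"  # assumed fallback for unknown labels such as "www"


def language_for_host(host: str) -> str:
    """Return the language tag for the leftmost label of a host name."""
    hostname = host.split(":", 1)[0]          # drop an optional port
    label = hostname.split(".", 1)[0].lower() # leftmost label, lowercased
    return SUBDOMAIN_TO_LANG.get(label, DEFAULT_LANG)


print(language_for_host("jp.example.com"))
print(language_for_host("ja.example.com"))
print(language_for_host("www.example.com"))
```

    With a table like this, the label shown to users and the language
    code used internally do not have to be the same string.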
