RE: VOA- utf-8, lang="en" (Re: BBC.co.uk languages ...)

From: Don Osborn (dzo@bisharat.net)
Date: Tue Apr 14 2009 - 13:44:46 CDT


    Thanks Mark, I can see why. I know how smaller sites can miss this (and just recently mentioned it in regard to two Fula sites: one in Pulaar of Mauritania, with no language designation on most pages and ar-SA on one section, and the other, based in Belgium, listing lang="en-GB"). However, I was a bit astonished to see that a major site like VOA appeared to have totally disregarded the issue (or else they consider the page frame in which the local content sits to always be in English; but I see no lang= attributes other than the "en" ones, so in any event they failed to add proper language tags).
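    The check described above (looking for lang= attributes in a page's markup) can be sketched with Python's standard html.parser module. This is a minimal illustration, not a tool used by anyone in this thread; the names LangCollector and collect_lang_attributes are hypothetical, and the sample HTML is made up. It lists every element that declares lang (or xml:lang), in document order:

```python
from html.parser import HTMLParser


class LangCollector(HTMLParser):
    """Hypothetical helper: record lang / xml:lang values declared on elements."""

    def __init__(self):
        super().__init__()
        self.tags = []  # (tag name, language tag) pairs, in document order

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases attribute names, so "LANG" is also caught.
        for name, value in attrs:
            if name in ("lang", "xml:lang") and value is not None:
                self.tags.append((tag, value))
    # Self-closing tags (e.g. <br/>) are routed to handle_starttag by the
    # default handle_startendtag, so they need no separate handling.


def collect_lang_attributes(html_text):
    """Return the (tag, lang) pairs declared anywhere in html_text."""
    parser = LangCollector()
    parser.feed(html_text)
    parser.close()
    return parser.tags


sample = '<html lang="en"><body><div lang="ha">Labarai</div><p>Hello</p></body></html>'
print(collect_lang_attributes(sample))
# [('html', 'en'), ('div', 'ha')]
```

    A page tagged only with lang="en" on the root element, like the VOA pages described here, would return just [('html', 'en')] even when its body text is in another language.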


    Don


    From: Mark Davis [mailto:mark.edward.davis@gmail.com]
    Sent: Tuesday, April 14, 2009 1:14 PM
    To: Donald Z. Osborn
    Cc: A12n tech support; a12n-policy@bisharat.net; Unicode Mailing List
    Subject: Re: VOA- utf-8, lang="en" (Re: BBC.co.uk languages ...)


    FYI, at Google we essentially ignore the language setting in web pages, because it is too often missing or wrong to be useful.

    Mark

    On Tue, Apr 14, 2009 at 07:23, Donald Z. Osborn <dzo@bisharat.net> wrote:

    Thanks to all for the feedback on this topic. It sounds like the choice of whether to use utf-8 is mainly one of policy (or lack of same) and not of technical constraints?

    It is interesting on this point to contrast BBC with VOA,* which has all of its language pages in utf-8.

    On the other hand, while BBC uses the lang= attribute in its page markup to indicate the main language of each page, VOA pages are apparently all lang="en".

    Like BBC, VOA ASCIIfies Hausa Boko orthography. It also has no text in Amharic or Tigrinya (among the non-Latin scripts), only audio reached from an English-language "Horn" page.

    Like BBC, it groups the similar languages Kinyarwanda and Kirundi on a single page (with text in one, the other, both, or something in between). It would be interesting to know what exactly the language of that page's text content is. BBC codes its page "rw" (for Kinyarwanda), not "rn" (for Kirundi), even though the page is shared by both languages. But as already noted, VOA incorrectly uses lang="en" everywhere.

    * http://www.voa.gov (click on Languages) or
    http://www.voanews.com/english/screen_map.cfm




    This archive was generated by hypermail 2.1.5 : Tue Apr 14 2009 - 13:46:46 CDT