Re: [A12n-policy] Re: VOA- utf-8, lang="en" (Re: BBC.co.uk languages ...)

From: Mark Davis (mark.edward.davis@gmail.com)
Date: Tue Apr 14 2009 - 20:07:25 CDT


    What I'm saying is that with the current actual state of affairs in the
    world -- not what we wish it would be, but what it is -- you can't depend
    on content being tagged for language at all, let alone tagged correctly.
    Thus if you are a consuming program for public web pages, and you care
    about languages X, Y, and Z (meaning that you process them differently),
    then you should be prepared to heuristically detect X, Y, and Z.

       - If you don't process language W (differently), you don't need to detect
       W, or
       - If you are working with a closed known set of pages where the language
       is known to be correctly tagged, you could depend on the tag instead of
       using detection.
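
    A minimal sketch of that decision in Python, for illustration only. The
    stopword lists, the function names, and the tags_trusted flag are all
    hypothetical; a real detector would use a much richer model (character
    n-gram statistics, for example) than counting a handful of common words.

```python
import re

# Hypothetical stopword lists, one per language the program handles
# differently. These are illustrative, not a real detection model.
STOPWORDS = {
    "en": {"the", "and", "of", "to", "is", "in", "that", "it"},
    "fr": {"le", "la", "les", "et", "de", "est", "un", "une", "que"},
    "de": {"der", "die", "das", "und", "ist", "nicht", "ein", "eine", "zu"},
}


def detect_language(text, candidates):
    """Return the candidate with the most stopword hits, or None if none match."""
    if not candidates:
        return None
    # Split into runs of letters (Unicode-aware), ignoring digits and underscores.
    words = re.findall(r"[^\W\d_]+", text.lower())
    scores = {lang: sum(w in STOPWORDS[lang] for w in words) for lang in candidates}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def effective_language(text, declared_tag, handled, tags_trusted=False):
    """Pick the language to process a page as."""
    # Closed, known set of pages with correct tags: rely on the tag.
    if tags_trusted and declared_tag:
        return declared_tag
    # Public web pages: ignore the tag and detect only the languages
    # this program actually treats differently.
    return detect_language(text, handled)


page = "Le chat est sur la table et il regarde les oiseaux de la fenêtre."
print(effective_language(page, "en", ["en", "fr", "de"]))
print(effective_language(page, "en", ["en", "fr", "de"], tags_trusted=True))
```

    The first call disregards the (wrong) "en" tag and detects among the
    handled languages only; the second trusts the tag because the caller has
    declared the page set to be correctly tagged.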

    In my experience, for most programs that deeply care about the language of
    web pages (like text-to-speech processing or Braille devices), heuristic
    language detection is a rather small amount of work compared to their main
    processing function. If I were buying such a product, and I expected it to
    work for my language, I'd certainly be disappointed if it didn't do that,
    since it wouldn't work on lots of pages.

    Mark

    On Tue, Apr 14, 2009 at 17:36, Andrew Cunningham <andrewc@vicnet.net.au> wrote:

    > Although WCAG 1.0 and WCAG 2.0 require language tagging.
    >
    > On Wed, April 15, 2009 5:39 am, Mark Davis wrote:
    > > It is a chicken & egg problem. Web page creators will only bother to set
    > > the language (or set it different than the default) if the language
    > > setting makes a difference. Because so much content is badly tagged, all
    > > of the interpreters of the pages end up having to disregard that
    > > information, and compute the language heuristically ("language
    > > detection"). Because of that the language setting doesn't make a
    > > difference, so the creators don't bother setting it.
    >
    >
    > Although the question becomes how many languages can you identify
    > heuristically?
    >
    >
    >
    > --
    > Andrew Cunningham
    > Research and Development Coordinator
    > Vicnet
    > State Library of Victoria
    > Australia
    >
    > andrewc@vicnet.net.au
    >
    >


