Re: BBC.co.uk languages - mostly not UTF-8

From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Sun Apr 12 2009 - 21:31:05 CDT


    Their policy seems to be that they support the "native" Windows code
    page by default (with 8859-1 being a subset of 1252 - interestingly,
    there's no use of 8859-2). Most (possibly all) of the languages in the
    list that use UTF-8 have no native code page.
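
    The parenthetical about 8859-1 and 1252 can be checked with Python's
    built-in `latin-1` and `cp1252` codecs. In the sketch below (the helper
    `decode_or_none` is hypothetical, purely for illustration), every
    printable ISO-8859-1 byte decodes to the same character under
    Windows-1252. The 0x80-0x9F range, which holds C1 control codes in
    8859-1, is where 1252 places extra graphic characters such as the euro
    sign and curly quotes. A few positions there are unassigned in 1252,
    and Python's codec rejects them.

```python
# Sketch: compare ISO-8859-1 ("latin-1") with Windows-1252 ("cp1252")
# using only Python's built-in codecs.

def decode_or_none(byte: int, codec: str):
    """Decode a single byte; return None if the codec leaves it unassigned."""
    try:
        return bytes([byte]).decode(codec)
    except UnicodeDecodeError:
        return None

# Printable ISO-8859-1 positions: 0x20-0x7E and 0xA0-0xFF.
printable_latin1 = list(range(0x20, 0x7F)) + list(range(0xA0, 0x100))

# Every printable 8859-1 byte maps to the same character in 1252 ...
same = all(decode_or_none(b, "latin-1") == decode_or_none(b, "cp1252")
           for b in printable_latin1)

# ... while 0x80-0x9F are C1 controls in 8859-1 but graphic
# characters (or unassigned) in 1252, so all 32 positions differ.
differ = [b for b in range(0x80, 0xA0)
          if decode_or_none(b, "latin-1") != decode_or_none(b, "cp1252")]

print(same)         # True
print(len(differ))  # 32
```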

    The really interesting thing would be to scan the text on the entire
    site to see whether it contains NCRs (numeric character references)
    that go outside the declared character set. If they do so to any
    significant extent, that would mean that the receiving system would
    need to support Unicode at some level, and that they might as well
    have gone to UTF-8.
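
    A scan of the kind suggested above might look roughly like the
    following sketch. It assumes the page's declared charset is already
    known (for example from the HTTP header). It looks only at decimal and
    hexadecimal NCRs, so named entities such as `&eacute;` are ignored. It
    also uses a simple regular expression rather than a real HTML parser.
    The function name `ncrs_outside_charset` is hypothetical.

```python
import re

# Decimal (&#8364;) and hexadecimal (&#x20AC;) numeric character references.
NCR = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")

def ncrs_outside_charset(html: str, charset: str) -> list:
    """Return the characters referenced by NCRs that `charset` cannot encode."""
    outside = []
    for m in NCR.finditer(html):
        hex_digits, dec_digits = m.groups()
        cp = int(hex_digits, 16) if hex_digits else int(dec_digits)
        if cp > 0x10FFFF:
            continue  # not a Unicode code point at all; skip it
        ch = chr(cp)
        try:
            ch.encode(charset)
        except UnicodeEncodeError:
            outside.append(ch)  # NCR needed because the charset lacks it
    return outside

sample = "Caf&#233; &#x20AC;5, Erd&#x0151;s, &#8220;quoted&#8221;"
print(ncrs_outside_charset(sample, "cp1252"))  # ['ő']
```

    Here only the Hungarian o with double acute (U+0151) falls outside
    Windows-1252. A page that needed many such references would
    effectively depend on Unicode support anyway.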

    One would think that, as long as the text repertoire is limited to
    what the early OSs could handle, it wouldn't matter whether you chose
    UTF-8 or the native code page. Support for recognizing and converting
    from UTF-8 came pretty early in browsers, I believe. For example, I
    think Unicode support goes back as far as IE 3.0, but I'm not (or no
    longer) familiar with early versions of the other browsers.
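
    To illustrate the point about repertoire: text that stays within the
    Windows-1252 repertoire decodes to the same characters whether it was
    sent as 1252 or as UTF-8. Only the bytes on the wire differ, since
    non-ASCII characters take two or three bytes in UTF-8 versus one in
    1252. This minimal Python sketch works at the encoding level only. It
    says nothing about which older browsers actually honoured a UTF-8
    label, which is the open question here.

```python
# Text drawn only from the Windows-1252 repertoire:
# c-cedilla, i-diaeresis, euro sign, en dash, curly double quotes.
text = "Fa\u00e7ade, na\u00efve, \u20ac10 \u2013 \u201cquoted\u201d"

as_1252 = text.encode("cp1252")
as_utf8 = text.encode("utf-8")

# Both byte sequences decode back to exactly the same characters ...
assert as_1252.decode("cp1252") == as_utf8.decode("utf-8") == text

# ... only the encoded size differs (6 non-ASCII characters here).
print(len(as_1252), len(as_utf8))  # 29 39
```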

    Is there anyone (still) familiar with older browsers and OSs who can
    tell us which OS/browser combinations could not handle, say, the
    Windows-1252 repertoire encoded as UTF-8? It would be interesting to
    find out whether there's any OS/browser combination that still has a
    reasonable expected level of deployment and that couldn't handle that
    level of UTF-8. As for the OSs by themselves, there are still many out
    there where the native code page is the only one supported, but the
    browsers that run on them don't have that limitation.

    I suspect that this choice by the BBC represents a historical
    development where no one has seen the need to change anything as time
    marched on, except where languages essentially require UTF-8 support
    because the vendors stopped supporting dedicated character sets.

    A./
