Re: BBC.co.uk languages - mostly not UTF-8

From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Sun Apr 12 2009 - 21:31:05 CDT

Next message: Bjoern Hoehrmann: "Flexible and Economical UTF-8 Decoder"

Previous message: Don Osborn: "BBC.co.uk languages - mostly not UTF-8"
In reply to: Don Osborn: "BBC.co.uk languages - mostly not UTF-8"
Next in thread: Don Osborn: "FW: BBC.co.uk languages - mostly not UTF-8"

Their policy seems to be that they support the "native" Windows code
page by default (with the 8859-1 repertoire being a subset of 1252;
interestingly, there's no use of 8859-2). Most (if not all) of the
languages in the list that use UTF-8 have no native code page.
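
To illustrate the subset remark, here is a minimal Python sketch. It assumes Python's built-in "iso-8859-1" and "cp1252" codecs are faithful stand-ins for the two character sets, and the helper name `repertoire` is made up for this example. Note that the subset relation only holds for the graphic characters: bytes 0x80-0x9F are C1 controls in 8859-1 but mostly printable characters in 1252.

```python
def repertoire(codec, byte_values):
    """Collect the characters that `codec` assigns to the given byte values."""
    chars = set()
    for b in byte_values:
        try:
            chars.add(bytes([b]).decode(codec))
        except UnicodeDecodeError:
            pass  # byte is unassigned in this codec
    return chars

# Graphic characters of ISO 8859-1: 0x20-0x7E and 0xA0-0xFF (controls excluded).
latin1_graphic = repertoire(
    "iso-8859-1", list(range(0x20, 0x7F)) + list(range(0xA0, 0x100))
)
# Everything Python's cp1252 codec assigns from 0x20 upward, except DEL (0x7F).
cp1252_all = repertoire("cp1252", [b for b in range(0x20, 0x100) if b != 0x7F])

# True if every graphic 8859-1 character is also available in 1252.
is_subset = latin1_graphic <= cp1252_all
# Characters that 1252 adds on top of 8859-1 (euro sign, curly quotes, ...).
extras = sorted(cp1252_all - latin1_graphic)
```

Under these assumptions `is_subset` should come out true, and `extras` lists what a page declared as 1252 can carry that a strict 8859-1 page cannot.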

The really interesting thing would be to scan the text on the entire
site to see whether it contains NCRs (numeric character references)
that go outside the given character set. If they do to any significant
extent, that would mean that the receiving system would need to support
Unicode at some level, and that they might as well have gone to UTF-8.
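
A rough sketch of such a scan for a single page, in Python. The function name `ncrs_outside_charset` and the sample string are invented for illustration; a real crawl would also need to fetch each page and read its declared charset, which is not shown here. Named entities such as `&euro;` are deliberately ignored, since only numeric references are counted.

```python
import re

# Matches decimal (&#8364;) and hexadecimal (&#x20AC;) numeric character references.
NCR_PATTERN = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")

def ncrs_outside_charset(html_text, charset):
    """Return the set of characters written as NCRs that `charset` cannot encode."""
    outside = set()
    for hex_digits, dec_digits in NCR_PATTERN.findall(html_text):
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
        if code_point > 0x10FFFF:
            continue  # not a valid Unicode code point; skip it
        ch = chr(code_point)
        try:
            ch.encode(charset)
        except UnicodeEncodeError:
            outside.add(ch)  # the page's declared charset cannot represent this
    return outside

# Hypothetical page fragment: e-acute, euro sign, and Cyrillic ZHE as NCRs.
sample = "caf&#233; &#8364;5 &#x0416;"
beyond_1252 = ncrs_outside_charset(sample, "cp1252")
beyond_latin1 = ncrs_outside_charset(sample, "iso-8859-1")
```

A non-empty result for a page's declared charset means the page already depends on the receiving end understanding Unicode code points.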

One would think that, as long as the text repertoire is limited to what
the early OSs could handle, it wouldn't matter whether you choose UTF-8
or the native code page. The support for recognizing and converting from
UTF-8 came pretty early for browsers, I think. For example, I think
Unicode support goes back as far as IE 3.0, but I'm no longer
familiar with early versions of the other browsers.

Is there anyone (still) familiar with older browsers and OSs who can
say which OS/browser combinations could not handle, say, a Windows 1252
repertoire that was UTF-8 encoded? It would be interesting to find out
whether there's any OS/browser combination that still has a reasonable
expected deployment level and that couldn't handle that level of UTF-8.
As for OSs by themselves, there are still many out there where the
native code page is all there is, but the browsers that run on them
don't have that limitation.

I suspect that this choice by the BBC represents a historical
development where no one has seen the need to change anything as time
marched on, except where languages essentially require UTF-8 support
because the vendors stopped supporting dedicated character sets.

A./


This archive was generated by hypermail 2.1.5 : Sun Apr 12 2009 - 21:35:33 CDT