Re: pre-HTML5 and the BOM from Leif Halvard Silli on 2012-07-18 (Unicode Mail List Archive)

From: Leif Halvard Silli <xn--mlform-iua_at_xn--mlform-iua.no>
Date: Wed, 18 Jul 2012 14:19:46 +0200

Steven Atreju, Wed, 18 Jul 2012 13:40:30 +0200:
> Except that the internet is almost unusable without cookies
> and scripting, lynx(1) works very well, too, if the ncursesw
> library is linked against (and the terminal font supports
> Unicode characters). Funny that it writes garbage for
>
> |<html><body><p>ä.ü.ö.</p></body></html>
>
> but uses UTF-8 by default for
>
> |[U+FEFF BOM]<html><body><p>ä.ü.ö.</p></body></html>

Wow, a command line tool that breaks with all you have said about Unix
tools, no? :-)

It would be perfectly in line with HTML5 if Lynx, with or without
linking against ncurses, sniffed the first, BOM-less instance correctly
too. However, so far, Chrome seems like the only browser to do so by
default.
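
To make the idea of "sniffing" concrete, here is a minimal sketch in
Python. It is not what Lynx, Chrome, or the HTML5 algorithm actually
do; it only assumes that a BOM wins, that bytes which decode as strict
UTF-8 are treated as UTF-8, and that everything else falls back to a
legacy default. The name sniff_encoding and the windows-1252 fallback
are my own assumptions for illustration.

```python
import codecs

def sniff_encoding(data: bytes, fallback: str = "windows-1252") -> str:
    """Guess an encoding: BOM first, then strict UTF-8, else a fallback.

    Hypothetical helper for illustration only; real browsers use a
    longer chain (HTTP header, <meta>, user settings, heuristics).
    """
    # A UTF-8 BOM (EF BB BF) is treated as an explicit signal.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    try:
        # Strict decoding raises on any byte sequence invalid in UTF-8.
        data.decode("utf-8")
    except UnicodeDecodeError:
        return fallback
    return "utf-8"

page = "<html><body><p>ä.ü.ö.</p></body></html>"

without_bom = page.encode("utf-8")
with_bom = codecs.BOM_UTF8 + page.encode("utf-8")
legacy = page.encode("latin-1")

print(sniff_encoding(without_bom))
print(sniff_encoding(with_bom))
print(sniff_encoding(legacy))
```

Note that pure ASCII input also passes the strict UTF-8 check, so the
sketch cannot distinguish it from UTF-8, which is harmless here.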

> Hypertext offers a lot of possibilities to declare the charset,
> and until then an agnostic 8-bit parser will do fine except
> for multioctet charsets.

One should perhaps not care about bugs ... But the Lynx version I
checked last (probably not linked against ncurses) did not understand
HTML5's new <meta charset="FOO"> any better than it understood the
BOM. It only understood <meta http-equiv=Content-Type
content="text/html; charset=FOO">. So, since dropping the new <meta>
element is not really an option, always also setting the charset in
the HTTP header on the server is the absolutely safest thing ...
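
As a rough sketch of "declare it in both places", the following Python
builds a page that carries the new <meta charset> element together
with a matching Content-Type header value. The function name
build_response and the page skeleton are assumptions for illustration;
how the header is actually sent depends on the server in use.

```python
from typing import Tuple

def build_response(paragraph: str, charset: str = "utf-8") -> Tuple[str, bytes]:
    """Return (Content-Type header value, encoded body).

    Hypothetical helper: the same charset label is used for the HTTP
    header, the <meta charset> element, and the actual byte encoding,
    so the three cannot disagree.
    """
    html = (
        "<!DOCTYPE html>\n"
        f'<html><head><meta charset="{charset}"><title>test</title></head>'
        f"<body><p>{paragraph}</p></body></html>\n"
    )
    content_type = f"text/html; charset={charset}"
    return content_type, html.encode(charset)

content_type, body = build_response("ä.ü.ö.")
print(content_type)
print(len(body), "bytes")
```

With Python's own http.server, for instance, the first value would be
passed to send_header("Content-Type", ...) before the body is written.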

-- 
Leif H Silli

Received on Wed Jul 18 2012 - 07:22:06 CDT
