Re: HTML5 encodings

From: Christoph Päper
Date: Tue Dec 22 2009 - 02:34:06 CST

    > The question of charsets is really the least complex one to handle
    > in a browser,

    Using one correctly is not the problem; choosing the right one is.

    On the Web there are a number of different places to declare an
    encoding, contradictory defaults, byte-identical standards, supersets
    and subsets, inflexible server software, differing database backends,
    misleading coding software, bad advice, unskilled developers,
    copy-and-paste solutions, workarounds, hacks for minority scripts,
    old unmaintained content, mashups of applications, heuristics close
    to magic and people as stupid as ever.

    Ever wondered why browsers still provide the possibility to select an
    encoding manually?
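
    To make the conflict concrete, here is a minimal sketch of how a
    consumer might collect the competing declarations and pick one. The
    helper names (declared_encodings, pick_encoding), the precedence
    order (BOM, then HTTP header, then meta element) and the
    windows-1252 fallback are assumptions for illustration only, not the
    HTML5 algorithm, which is considerably more involved.

```python
import re

# Byte order marks recognised by this sketch (assumed to win over everything).
BOMS = [
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xff\xfe", "utf-16-le"),
    (b"\xfe\xff", "utf-16-be"),
]

# Simplified patterns; real-world markup needs a far more tolerant scanner.
CHARSET_IN_HEADER = re.compile(r"charset\s*=\s*\"?([\w.:-]+)", re.I)
CHARSET_IN_META = re.compile(rb"<meta[^>]+charset\s*=\s*[\"']?([\w.:-]+)", re.I)


def declared_encodings(body: bytes, content_type: str = "") -> dict:
    """Collect every encoding declaration found (hypothetical helper)."""
    found = {}
    for bom, name in BOMS:
        if body.startswith(bom):
            found["bom"] = name
            break
    match = CHARSET_IN_HEADER.search(content_type)
    if match:
        found["http"] = match.group(1).lower()
    # Only the first 1024 bytes are scanned for a meta declaration here.
    match = CHARSET_IN_META.search(body[:1024])
    if match:
        found["meta"] = match.group(1).decode("ascii").lower()
    return found


def pick_encoding(body: bytes, content_type: str = "",
                  fallback: str = "windows-1252"):
    """Return (chosen encoding, all declarations) using the assumed order."""
    found = declared_encodings(body, content_type)
    for source in ("bom", "http", "meta"):
        if source in found:
            return found[source], found
    return fallback, found


# A server header and the document itself disagreeing:
chosen, seen = pick_encoding(b'<meta charset="iso-8859-1">',
                             "text/html; charset=UTF-8")
print(chosen, seen)
```

    Whenever the collected declarations disagree, as in the usage lines
    above, any fixed order will be wrong for some pages, which is where
    heuristics and the manual override come in.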

    > by violating rules that were validated and tested for XML and past
    > versions, the new prohibition will just create more problems than
    > what it will solve, because it simply violates the intended target
    > which was "compatibility with legacy applications"

    Do you know any content currently on the Web encoded in a way
    prohibited by HTML5?

    If encodings are currently unused or unsupported and provide only
    insignificant advantages but potentially significant disadvantages
    (esp. security-wise), it can be a sound choice not to use them.

    > because C1 controls were in theory forbidden in HTML and XML...
    > except the NEXT LINE control inherited from EBCDIC and mapped at
    > 0x85 in ISO-8859-1 and part of compressible whitespaces and of line
    > separators,

    Where do you get that from?
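
    For reference, what byte 0x85 means depends entirely on which
    encoding label is honoured. A small sketch using Python's built-in
    codecs (the describe helper is a made-up name for illustration):

```python
import unicodedata


def describe(raw: bytes, label: str):
    """Decode a single byte under `label`; return (code point, category)."""
    ch = raw.decode(label)
    return f"U+{ord(ch):04X}", unicodedata.category(ch)


# ISO-8859-1 maps 0x85 to the C1 control NEXT LINE (category Cc).
print(describe(b"\x85", "iso-8859-1"))    # ('U+0085', 'Cc')

# windows-1252 maps the same byte to HORIZONTAL ELLIPSIS (category Po).
print(describe(b"\x85", "windows-1252"))  # ('U+2026', 'Po')
```

    A page labelled ISO-8859-1 but processed as windows-1252 therefore
    never yields a NEXT LINE control from that byte.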

    > Is HTML5 already a dead standard,

    Quite the contrary, XHTML2 is dead. As much as I would love a lean,
    modular, systematic, well-designed text and application markup
    language for the Web (which XHTML1 is not and XHTML2 would not have
    been either) in theory, the pragmatic course taken by WHATWG is
    likely to succeed in practice.

    HTML5 has an "XML serialization" by the way.

    > in fact the battle is not there: it is in the evolution of
    > stylesheets, i.e. CSS3 where we should be more interested to have
    > it support a better typography.

    Typography (i.e. styling the 'inscription' itself) is not the main
    focus of Level 3, though. Most of it is done in but two modules: CSS3
    Text and CSS3 Fonts (Editor's Drafts).

    > What I really hope is that browser will prefer violating the stupid
    > HTML5 rules,

    Do you know who established and who funds WHATWG? (Google is now a
    browser maker, too.)

    > Who suggested these violation rules? All seems to indicate Microsoft,

    They were partially suggested because of MS, but not by MS.

    > as it really looks inspired by existing standard violations found
    > in IE,

    Indeed, in the real world it is often vital to mimic even the
    failures of the top dog to stay alive, and even after surpassing it
    at some point in time there is hardly a path back. This has largely
    already happened in the browser world (including the handling of
    character encodings), but is now publicly documented in HTML5. This
    actually makes it easier for new players to catch up.

    > The more I read the HTML5 proposal, the more I see problems in it.

    It's an open and not a finished spec, you know.

    > The violations adopted on purpose are really a big hint to alert
    > others: don't use it, keep HTML4 or go directly to XHTML.

    Ouch, you really have no idea what you are talking about here.
    Besides, the encoding issue is really not that important, as
    basically everyone can be assumed to be using UTF-8 now or soon.