Re: Filtering and displaying untrusted UTF-8

From: Andrew West (andrewcwest@gmail.com)
Date: Mon Dec 28 2009 - 05:42:13 CST


    2009/12/28 Asmus Freytag <asmusf@ix.netcom.com>:
    >
    >>    For the rest, allow all ***assigned*** code points, filter unassigned.
    >
    > That's a fool's game, because assigned code points are version dependent.
    > Even if one could adopt a "supported version" for one's own code, nothing
    > guarantees that the codes were assigned at the time the originating software
    > was written. If not, they could represent data that wasn't really text in
    > the context it was created in. Further, the minute the next version of
    > Unicode comes along, this will prevent the software from handling perfectly
    > well-defined and standardized characters.
    >
    >> 3) For code points in planes 3 to 13 (unassigned planes) filter the
    >> complete range 0x30000 to 0xDFFFF.

    Asmus's comment also applies here. There is no guarantee that Planes
    3-13 will always remain unassigned, and in fact there is a very strong
    probability that Plane 3 will be assigned before very long.
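
    As a rough illustration of the alternative, here is a minimal Python
    sketch of a filter that rejects only code points whose status cannot
    change between Unicode versions (surrogates and the 66 noncharacters)
    and lets everything else through, including Planes 3-13. The function
    names and the choice of U+FFFD as the replacement are my own
    assumptions, not something proposed earlier in this thread. The last
    function shows the version-dependent test for comparison: its answer
    depends on the tables shipped with the Python build in use
    (unicodedata.unidata_version).

```python
import unicodedata


def is_noncharacter(cp: int) -> bool:
    # The 66 noncharacters are fixed by the Unicode stability policy:
    # U+FDD0..U+FDEF plus the last two code points of every plane.
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def is_surrogate(cp: int) -> bool:
    # Surrogate code points never represent characters on their own.
    return 0xD800 <= cp <= 0xDFFF


def filter_stable(text: str, replacement: str = "\uFFFD") -> str:
    # Hypothetical filter: replace only code points whose status is
    # version-independent; unassigned code points pass through untouched.
    out = []
    for ch in text:
        cp = ord(ch)
        if is_noncharacter(cp) or is_surrogate(cp):
            out.append(replacement)
        else:
            out.append(ch)
    return "".join(out)


def is_unassigned_in_local_tables(ch: str) -> bool:
    # Version-dependent check, shown only for contrast: "Cn" here means
    # "unassigned according to unicodedata.unidata_version", nothing more.
    return unicodedata.category(ch) == "Cn"
```

    With this approach a Plane 3 code point such as U+30000 is passed
    through whether or not the local tables know about it yet, while
    U+FFFE or U+3FFFF is always replaced.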

    Andrew


