Re: Filtering and displaying untrusted UTF-8

From: Andrew West
Date: Tue Dec 29 2009 - 05:17:10 CST


    2009/12/28 Jason Schauberger:
    > I think it is much more consistent to offer an API call to get the
    > current Unicode database version used and an easy way to update it
    > when a new Unicode version is released, especially since AIUI most
    > written and spoken languages are already represented in the current
    > Unicode version. Hence, the possibility of interchanged text becoming
    > illegible due to completely new characters being filtered is rather
    > slim.

    That's an absurd statement. If the sender uses characters that are
    assigned in the sender's version of Unicode but unassigned in the
    receiver's version, then filtering those characters out is extremely
    likely to cause the text to lose its meaning, especially if they
    are characters belonging to a newly
    encoded script (the latest version of Unicode added 15 new scripts,
    and there are many more new scripts in the pipeline).
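
    A minimal Python sketch of this failure mode, for illustration only.
    The names filter_unassigned and old_receiver are hypothetical, and
    old_receiver merely simulates an out-of-date database by treating
    the range U+1F300..U+1F64F as unassigned; it is not how any real
    application is known to behave.

```python
import unicodedata


def is_assigned(ch: str) -> bool:
    """True if this interpreter's Unicode database assigns the code point.

    General category "Cn" means "unassigned" (it also covers noncharacters).
    """
    return unicodedata.category(ch) != "Cn"


def filter_unassigned(text: str, assigned=is_assigned) -> str:
    """Drop every character the given predicate reports as unassigned."""
    return "".join(ch for ch in text if assigned(ch))


def old_receiver(ch: str) -> bool:
    """Hypothetical receiver whose database predates U+1F300..U+1F64F."""
    return is_assigned(ch) and not (0x1F300 <= ord(ch) <= 0x1F64F)


# The Unicode database version bundled with this interpreter.
db_version = unicodedata.unidata_version

# A one-character reply: U+1F44D THUMBS UP SIGN.
reply = "\U0001F44D"

kept = filter_unassigned(reply)                  # up-to-date receiver
lost = filter_unassigned(reply, old_receiver)    # simulated old receiver
```

    With an up-to-date database the reply survives unchanged; with the
    simulated old one it is filtered down to an empty string, so the
    recipient sees nothing at all.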

    Just because you personally think that the character repertoire of the
    "current" version of Unicode is sufficient does not mean that other
    people will not need or want to use new characters. The next version
    of Unicode will include hundreds of emoji and emoticon symbols, which
    many people are likely to want to use. If your application is not
    updated in a timely manner, or if people use an old version of it,
    these characters will be filtered out, and there is a good chance
    that the resulting text will lose some of its meaning.

    I have to agree with Asmus that filtering out unassigned characters is
    a really bad idea.

