Re: Filtering and displaying untrusted UTF-8

From: Andrew West (andrewcwest@gmail.com)
Date: Tue Dec 29 2009 - 05:17:10 CST

Next message: Phillips, Addison: "RE: HTML5 encodings (was: Re: BOCU patent)"

Previous message: Andrew West: "Re: Filtering and displaying untrusted UTF-8"
In reply to: Jason Schauberger: "Re: Filtering and displaying untrusted UTF-8"
Next in thread: Doug Ewell: "Re: Filtering and displaying untrusted UTF-8"

2009/12/28 Jason Schauberger <crossroads0000@googlemail.com>:
>
> I think it is much more consistent to offer an API call to get the
> current Unicode database version used and an easy way to update it
> when a new Unicode version is released, especially since AIUI most
> written and spoken languages are already represented in the current
> Unicode version. Hence, the possibility of interchanged text becoming
> illegible due to completely new characters being filtered is rather
> slim.

That's an absurd statement. There is an extremely high possibility
that filtering out characters which are in the sender's version of
Unicode but are unassigned in the receiver's version of Unicode will
cause the text to lose its meaning if the sender uses any of these
characters, especially if they are characters belonging to a newly
encoded script (the latest version of Unicode added 15 new scripts,
and there are many more new scripts in the pipeline).
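
To make the failure mode concrete, here is a minimal Python sketch. It is
an illustration only, not any particular application's filter: the names
assigned_for_old_receiver and filter_unassigned are hypothetical, and the
"old receiver" is simulated by treating the Javanese block (U+A980..U+A9DF,
one of the scripts added in Unicode 5.2) as unassigned. The only library
call used is unicodedata.category(), which reports "Cn" for code points
that are unassigned in the Unicode database bundled with the running Python.

```python
import unicodedata

# Assumption for illustration: the receiver's Unicode data predates 5.2,
# so the whole Javanese block looks unassigned to it.
JAVANESE_BLOCK = range(0xA980, 0xA9E0)


def assigned_for_old_receiver(ch: str) -> bool:
    """Hypothetical predicate: is `ch` assigned in the receiver's Unicode?"""
    if ord(ch) in JAVANESE_BLOCK:
        return False
    # "Cn" is the general category of unassigned code points.
    return unicodedata.category(ch) != "Cn"


def filter_unassigned(text: str, is_assigned=assigned_for_old_receiver) -> str:
    """Drop every code point the receiver considers unassigned."""
    return "".join(ch for ch in text if is_assigned(ch))


# "Name: " followed by two code points from the Javanese block.
message = "Name: \uA997\uA9AE"
filtered = filter_unassigned(message)

print(repr(message))
print(repr(filtered))  # only the ASCII label survives; the name is gone
```

The sender's text was perfectly valid; the receiver silently discarded
the only part of it that carried any information.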

Just because you personally think that the character repertoire of the
"current" version of Unicode is sufficient does not mean that other
people will not need or want to use new characters. The next version
of Unicode will include hundreds of emoji and emoticon symbols, which
many people are likely to want to use. If your application is not
updated in a timely manner, or if people use an old version of it,
these characters will be filtered out, and there is a good chance that
the resultant text will lose some of its meaning.

I have to agree with Asmus that filtering out unassigned characters is
a really bad idea.

Andrew


This archive was generated by hypermail 2.1.5 : Tue Dec 29 2009 - 05:20:28 CST