RE: UTF-8 signature in web and email

From: David Starner (
Date: Tue May 22 2001 - 17:22:25 EDT

At 11:14 AM 05/22/2001 +0200, you wrote:
>But, also in this case, why should it be a problem to have ZWNBSP in
>whatever position in a file? Why should *this* character be more of a problem
>than SPACE, or TAB, or CARRIAGE RETURN, or COMMA, or name it?

Because SPACE, TAB, CARRIAGE RETURN, and COMMA don't, as a general rule, sit
at the start of a file when they aren't supposed to end up in the middle of
the combined file, and they aren't invisible when they do cause a problem.
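
To make the concatenation case concrete, here is a minimal Python sketch. The
byte strings and the helper name concat are assumptions made up for
illustration. Two UTF-8 texts, each starting with the signature EF BB BF, are
joined the way cat would join them, and the second signature ends up as an
invisible U+FEFF in the middle of the result.

```python
import codecs


def concat(*parts: bytes) -> bytes:
    """Join byte strings the way `cat a b > c` would, with no encoding awareness."""
    return b"".join(parts)


# Two hypothetical files, each saved with a UTF-8 signature (EF BB BF).
file_a = codecs.BOM_UTF8 + "first\n".encode("utf-8")
file_b = codecs.BOM_UTF8 + "second\n".encode("utf-8")

combined = concat(file_a, file_b)

# "utf-8-sig" strips only a leading signature, so the second one survives
# as U+FEFF (ZWNBSP) inside the decoded text.
text = combined.decode("utf-8-sig")
stray_positions = [i for i, ch in enumerate(text) if ch == "\ufeff"]
```

A stray SPACE or COMMA at the same position would at least show up on screen;
the U+FEFF found here normally would not.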

>It only becomes a problem in the presence of one or more of these *bugs*:


>2) It is *forbidden* to have a ZWNBSP in the middle of the file;
>3) ZWNBSP is *displayed* incorrectly (e.g. a black box instead of "nothing
>at all");
>4) ZWNBSP is given an incorrect *semantic* value (e.g., a C compiler does
>not consider it as "white space").
>But, then, why blame ZWNBSP? Fix the bug(s)!

You're asking for every program to treat UTF-8 specially. As of now, UTF-8 is
just one of many charsets in use on Unix. Until such time as it is *the*
charset, it's just another ASCII superset. Even then, this is a lot of work
just to end up with ZWNBSPs sprinkled through files and the ability to guess
that a file may be UTF-8 (or may not; something may have stuck a ZWNBSP where
it shouldn't be). Considering we can already guess a file is UTF-8 because
LC_CTYPE=xx_XX.UTF-8, this isn't such a great win on Unix.
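
As a rough illustration of the two guesses being compared, the Python sketch
below checks for a leading signature and, separately, reads the codeset out of
the locale environment variables. The function names are hypothetical, and
the lookup order LC_ALL, LC_CTYPE, LANG is the usual POSIX precedence rather
than anything stated in this thread.

```python
import codecs
from typing import Mapping, Optional


def has_utf8_signature(data: bytes) -> bool:
    """Guess from the file itself: does it start with EF BB BF?"""
    return data.startswith(codecs.BOM_UTF8)


def codeset_from_env(env: Mapping[str, str]) -> Optional[str]:
    """Guess from the locale: return the codeset part of the first
    non-empty variable among LC_ALL, LC_CTYPE, LANG, or None."""
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:
            # Locale names look like language[_territory][.codeset][@modifier].
            without_modifier = value.partition("@")[0]
            codeset = without_modifier.partition(".")[2]
            return codeset or None
    return None


# Hypothetical inputs: a signature-less UTF-8 file and a UTF-8 locale.
plain_file = "na\u00efve\n".encode("utf-8")
env = {"LC_CTYPE": "en_US.UTF-8"}

guess_from_file = has_utf8_signature(plain_file)
guess_from_locale = codeset_from_env(env)
```

Here the file carries no signature, so the first guess says nothing, while
the locale already names UTF-8. Neither guess is proof of the file's actual
encoding.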

This will probably just end up as another CRLF/LF issue, requiring that plain
text crossing from one system to another be converted.
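
A conversion step of that kind might look like the Python sketch below. The
function name to_unix_plain_text is hypothetical, and the choice to strip
only a leading signature (leaving any U+FEFF further inside the text alone)
is an assumption, not an agreed convention.

```python
import codecs


def to_unix_plain_text(data: bytes) -> bytes:
    """Convert UTF-8 text from a signature-and-CRLF system to Unix form."""
    # Drop a leading UTF-8 signature (EF BB BF) if there is one.
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    # Normalize CRLF line endings to LF, the other half of the same chore.
    return data.replace(b"\r\n", b"\n")


# Hypothetical input as it might arrive from another system.
incoming = codecs.BOM_UTF8 + b"line one\r\nline two\r\n"
converted = to_unix_plain_text(incoming)
```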

David Starner -
