RE: unicode on Linux

From: Francois Yergeau (
Date: Wed Oct 29 2003 - 09:59:46 CST

Philippe Verdy wrote:
> The idea that "if a text (without BOM) looks like valid
> UTF-8, then it is
> UTF-8; else it uses another legacy encoding" does not work in
> practice and also leads to too many false positives.

Can you point to actual data/cases? I don't mean theoretical ones; I can
make up my own.
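
For reference, the heuristic under discussion can be sketched in a few lines. This is only an illustration, not anyone's actual implementation: the function name guess_encoding and the Latin-1 fallback are assumptions made for the example, and Python's strict decoder follows the current, tighter definition of UTF-8 rather than any older, more permissive one.

```python
def guess_encoding(data: bytes, fallback: str = "latin-1") -> str:
    """Hypothetical helper: apply the 'valid UTF-8 means UTF-8' heuristic.

    Returns "utf-8" if the BOM-less byte string decodes under a strict
    UTF-8 decoder; otherwise returns the assumed legacy fallback name.
    """
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return fallback
    return "utf-8"


# Genuine UTF-8 text is accepted.
utf8_sample = "café".encode("utf-8")       # b'caf\xc3\xa9'

# The same word in Latin-1 is not valid UTF-8, so the fallback is chosen.
latin1_sample = "café".encode("latin-1")   # b'caf\xe9'

# A made-up false positive: the Latin-1 text "Ã©" has exactly the bytes
# of UTF-8 "é", so the heuristic labels it UTF-8.
ambiguous_sample = "Ã©".encode("latin-1")  # b'\xc3\xa9'

for sample in (utf8_sample, latin1_sample, ambiguous_sample):
    print(sample, guess_encoding(sample))
```

The last sample is exactly the kind of made-up case mentioned above; whether such byte sequences turn up often in real legacy text is the open question.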

> Some problems do
> exist however, with the relaxed rules for UTF-8 as it was
> defined in the IESG RFC.

Errr, relaxed? Care to elaborate? Are you referring to RFC 2279?

> These old texts (that are valid for this old
> version of the UTF-8 encoding) still exist now

What's particular about these old texts?

