Re: (Informational only: UTF-8 BOM and the real life)

From: Jukka K. Korpela <jkorpela_at_cs.tut.fi>
Date: Thu, 26 Jul 2012 00:45:25 +0300

2012-07-26 0:19, Steven Atreju wrote:

> |
>
> And that was an Unicode BOM that has been converted to UTF-8 and
> then been converted to UTF-8 once again.

Apparently the problem is that the data has been doubly encoded: it was
first encoded as UTF-8, then the resulting bytes were misinterpreted as
if they were windows-1252, and the characters so obtained were UTF-8
encoded once more. This is of course an error, though not an uncommon one.
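
To make the mechanism concrete, here is a minimal sketch in Python. The function name double_encode is made up for illustration, and I am assuming that Python's "cp1252" codec is an adequate stand-in for windows-1252 as the faulty software applied it.

```python
# Hypothetical helper that reproduces the faulty processing chain.
def double_encode(text: str) -> bytes:
    # Step 1: correct UTF-8 encoding.
    utf8_bytes = text.encode("utf-8")
    # Step 2: the bytes are wrongly read as windows-1252.
    misread = utf8_bytes.decode("cp1252")
    # Step 3: the misread characters are UTF-8 encoded again.
    return misread.encode("utf-8")

# U+FEFF is the BOM; the rest is the line quoted in this thread.
original = "\ufeffvielen Dank für Ihre E-Mail."

# What a reader sees when the final bytes are (correctly) shown as UTF-8.
garbled = double_encode(original).decode("utf-8")
print(garbled)
```

The BOM bytes EF BB BF come out as three separate characters, and the two bytes of “ü” (C3 BC) come out as two, while the ASCII characters pass through unchanged.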

> |vielen Dank für Ihre E-Mail.

So the letter “ü” was munged too, and presumably all other non-ASCII
data as well. So this is not an argument against using a BOM in UTF-8.
The BOM was a victim of incorrect processing, like every other character
outside ASCII. One might even argue that the BOM is useful here, too,
since it immediately signals that there is something wrong: “ï»¿” is an
encoding error signature, so to say.
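
As a rough sketch of how such a signature could be used, the Python below checks for the three characters that a misread UTF-8 BOM turns into (ï, », ¿) and then tries to undo the damage. The names MOJIBAKE_BOM, looks_double_encoded and undo_double_encoding are hypothetical, and the reversal assumes the text went through exactly one UTF-8 to windows-1252 mix-up and that every character involved is representable in Python's "cp1252" codec.

```python
# The UTF-8 BOM bytes EF BB BF, misread as windows-1252: "ï»¿".
MOJIBAKE_BOM = "\u00ef\u00bb\u00bf"

def looks_double_encoded(text: str) -> bool:
    # The misread BOM at the very start is the error signature.
    return text.startswith(MOJIBAKE_BOM)

def undo_double_encoding(text: str) -> str:
    # Map the characters back to the bytes they were misread from,
    # then decode those bytes as the UTF-8 they really were.
    return text.encode("cp1252").decode("utf-8")

sample = "\u00ef\u00bb\u00bfvielen Dank f\u00c3\u00bcr Ihre E-Mail."

if looks_double_encoded(sample):
    repaired = undo_double_encoding(sample)
    # Drop the real BOM (U+FEFF) that reappears after the repair.
    print(repaired.lstrip("\ufeff"))
```

If the text contains characters that windows-1252 cannot represent, the encode step raises UnicodeEncodeError, which is a fair hint that the assumption of a single mix-up does not hold.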

Yucca
Received on Wed Jul 25 2012 - 16:48:11 CDT
