Controls and the Like in Text Files (was: UTF-8 BOM and the real life)

From: Richard Wordingham <>
Date: Sun, 29 Jul 2012 10:51:22 +0100

On Sat, 28 Jul 2012 19:34:39 +0300
Eli Zaretskii <> wrote:

> > Almost nobody in the MS world uses the ^Z convention on purpose any
> > more; many don't even know about it.
> They might not use or even know this, but the C library does. And
> since the default open mode for ANSI C functions like fopen and
> Posix-like functions like _open is text, failure to open a binary file
> with O_BINARY resp. "rb" will cause the read operation to stop on the
> first byte whose value is 26.
> IOW, you don't need to know about this to be bitten by it.

And I'd thought the problem was due to using a compiler targeted at DOS.
But no, I'd still get the problem using MinGW on Windows 7 if I
performed the UCA & UCD 4.1.0 collation conformance test naïvely,
assuming that the test input file CollationTest_SHIFTED.txt could be
treated as a text file. At some point in the past seven years this
"feature" has been fixed: the file itself no longer contains U+001A
(decimal 26, ^Z), even though U+001A remains in the test strings it
defines.

Of course, I might have misinterpreted the test. Perhaps on Windows one
only needed to compare the first 345 strings, not the full 123,088
strings :-)

Received on Sun Jul 29 2012 - 04:56:22 CDT