Re: (Informational only: UTF-8 BOM and the real life)

From: John W Kennedy <jwkenne_at_attglobal.net>
Date: Sat, 28 Jul 2012 23:34:44 -0400

On Jul 28, 2012, at 11:52 AM, Doug Ewell <doug_at_ewellic.org> wrote:
> ^Z as an EOF marker for text files was part of the MS-DOS legacy from
> CP/M, where all files were written to a multiple of the disk block size
> (I think 128 for CP/M and 512 for MS-DOS 1.x), and there had to be some
> way to tell where the real text content ended. New stream-based I/O
> calls in MS-DOS 2.0 made this mechanism unnecessary. Unix systems had no
> legacy from CP/M, so they never had this problem.

Worse than that, actually. The MS-DOS APIs themselves could handle the situation from 1.0 on, but the MS-DOS BASIC language and interpreter, with their CP/M roots, assumed the 128-byte sector and therefore demanded the ^Z. It was fixed as early as 1.1, I think, but the malady lingers on.
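
For anyone who has never met the convention, here is a rough sketch in Python of the mechanism under discussion. It is illustrative only: the function names are made up, and filling the whole tail of the last record with ^Z is an assumption (some programs wrote a single ^Z and left whatever was in the buffer after it).

```python
RECORD_SIZE = 128   # CP/M record size, as discussed above
CTRL_Z = b"\x1a"    # ^Z, ASCII SUB

def pad_to_records(text: bytes, record_size: int = RECORD_SIZE) -> bytes:
    """Append ^Z bytes until the length is a multiple of record_size.

    Assumption: the entire tail of the last record is filled with ^Z.
    """
    remainder = len(text) % record_size
    if remainder == 0:
        # Content already ends on a record boundary; nothing to pad.
        return text
    return text + CTRL_Z * (record_size - remainder)

def strip_at_ctrl_z(stored: bytes) -> bytes:
    """Return the bytes before the first ^Z, or everything if there is none."""
    end = stored.find(CTRL_Z)
    return stored if end == -1 else stored[:end]
```

A reader that applies strip_at_ctrl_z to every file is also a reader that truncates any binary file containing a 0x1A byte, which is where the text-versus-binary open modes come in.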

>> I.e., this is why we do have this messy text OR binary file I/O
>> distinction like O_BINARY (for open(2)), "b" (for fopen(3)) or
>> binmode (perl(1)). Because without those a text file will see
>> End-Of-File at the ^Z, not at the real end of the file.
>
> The reason for the text/binary distinction on DOS and Windows is
> conversion between Unix-standard LF and Windows (DOS, CP/M)-standard
> CRLF. It might be true that library calls to read a file in text mode
> will stop at ^Z, but Notepad and Wordpad don't. I know the library
> doesn't automatically write ^Z. Almost nobody in the MS world uses the
> ^Z convention on purpose any more; many don't even know about it.
>
>> (Which raises the immediate question why the Microsoft programmers did
>> not embed the meta information in this section at the end of the file.
>> But i don't really want to know.)
>
> See above. The intent of ^Z was never to distinguish data from metadata,
> as with the Mac data and resource forks.
>
> But of course none of this has anything to do with U+FEFF.
>
>> So do the programmers have to face the same conditions? I don't
>> really think so. They prefer driving plain text readers up the wall.
>> Successfully.
>
> Again, we don't really have this kind of evil intent, though it's often
> fun and convenient for people to imagine we do.
>
> --
> Doug Ewell | Thornton, Colorado, USA
> http://www.ewellic.org | @DougEwell
>

-- 
John W Kennedy
"Give up vows and dogmas, and fixed things, and you may grow like That. ...you may come to think a blow bad, because it hurts, and not because it humiliates.  You may come to think murder wrong, because it is violent, and not because it is unjust."
  -- G. K. Chesterton.  "The Ball and the Cross"
Received on Sat Jul 28 2012 - 22:40:48 CDT
