RE: Unicode and end users

From: Chris Pratley (chrispr@microsoft.com)
Date: Wed Feb 20 2002 - 00:35:42 EST


Even better, use Word2002 and get all that, plus the ability to *edit*
the file and then save it back in any encoding, controlling CRLF/LF/CR
or whatever...

Actually, this problem of remembering encoding is not specific to
Notepad - it happens for any text editor. The issue is that often the
encoding of a file needs to be preserved even when it is unclear at the
time of opening what the encoding of the file actually is.

For example, you may be editing a script file that is required to be
UTF-8 by some process that will later consume it. If it currently
contains only ASCII characters, when you open it, Notepad has to have a
default for the encoding to use. Note that in XP, Notepad does allow you
to Open As "ANSI", or three flavours of Unicode:
UTF-8/UTF-16LE/UTF-16BE. If it defaults to "ANSI", as users would expect
due to the number of legacy files out there, then when you later add
some non-ASCII characters, these will get saved as ANSI (if they are
available in the local codepage) rather than UTF-8, and your script will
not be interpreted correctly.
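
To make that failure concrete, here is a rough Python sketch. It assumes
the local "ANSI" codepage is Windows-1252 (cp1252); the script text and
the consumer_can_read helper are made up for illustration and do not
describe any real tool.

```python
# Assumption: the local "ANSI" codepage is Windows-1252 (cp1252).
ascii_script = 'print("hello")\n'

# With only ASCII content the bytes are identical in both encodings,
# so nothing in the file says which encoding was intended.
assert ascii_script.encode("utf-8") == ascii_script.encode("cp1252")

# The user adds a non-ASCII character (e with acute accent) and the
# editor, having defaulted to ANSI, saves in the local codepage.
edited = 'print("caf\u00e9")\n'
saved_as_ansi = edited.encode("cp1252")


def consumer_can_read(data: bytes) -> bool:
    """Hypothetical consumer that requires its input to be valid UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


ansi_ok = consumer_can_read(saved_as_ansi)           # saved in cp1252
utf8_ok = consumer_can_read(edited.encode("utf-8"))  # saved in UTF-8
```

The single byte 0xE9 that cp1252 uses for the accented letter is not a
valid UTF-8 sequence on its own, so the consumer rejects the ANSI copy
and accepts the UTF-8 one.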

Of course, if you diligently remember to set the open and save encoding
whenever you edit, you will be OK, but it only takes one slip to cause a
very obscure bug. Using the "BOM" as a UTF-8 tag was a way to prevent
this from happening. Once saved as a UTF-8 file, that file would forever
afterwards be UTF-8 unless explicitly changed.
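
As a sketch of how an editor could use the BOM as a tag (this is only an
illustration in Python, not Notepad's actual logic), the following picks
the encoding from a leading BOM on open and writes the same BOM and
encoding back on save. The cp1252 fallback and the function names sniff
and edit_and_save are assumptions for the example.

```python
import codecs

# Assumption: files without a BOM are treated as the local "ANSI"
# codepage, taken here to be cp1252.
ANSI = "cp1252"

# The three Unicode flavours mentioned above, keyed by their BOM bytes.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff(data: bytes) -> tuple[bytes, str]:
    """Return (bom, encoding) based on a leading BOM, else (b"", ANSI)."""
    for bom, encoding in BOMS:
        if data.startswith(bom):
            return bom, encoding
    return b"", ANSI


def edit_and_save(data: bytes, appended_text: str) -> bytes:
    """Decode, append text, and re-encode with the original BOM/encoding."""
    bom, encoding = sniff(data)
    text = data[len(bom):].decode(encoding)
    return bom + (text + appended_text).encode(encoding)


tagged = codecs.BOM_UTF8 + b"x = 1\n"
untagged = b"x = 1\n"
tagged_out = edit_and_save(tagged, "\u00e9")      # stays UTF-8
untagged_out = edit_and_save(untagged, "\u00e9")  # falls back to ANSI
```

The tagged file keeps its BOM and gets the two-byte UTF-8 form of the
new character; the untagged file gets the single cp1252 byte.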

I didn't have anything to do with this decision for Notepad, but it
definitely makes things easier as long as you can assume that the
process consuming the file knows to throw away the BOM. Obviously, that
is the issue. :-)
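
For what "knows to throw away the BOM" can look like on the consuming
side, here is a small Python sketch. The read_script name and the sample
file contents are hypothetical; "utf-8-sig" is Python's standard codec
that drops a leading UTF-8 BOM when one is present.

```python
import codecs


def read_script(data: bytes) -> str:
    """Decode UTF-8 input, discarding a leading BOM if there is one."""
    return data.decode("utf-8-sig")


with_bom = codecs.BOM_UTF8 + b"#!/bin/sh\necho hi\n"
without_bom = b"#!/bin/sh\necho hi\n"

# A consumer that decodes as plain UTF-8 keeps the BOM as U+FEFF at the
# start of the text, which is what breaks things like a "#!" line.
naive = with_bom.decode("utf-8")
naive_keeps_bom = naive.startswith("\ufeff")

# A BOM-aware consumer sees the same text either way.
same_text = read_script(with_bom) == read_script(without_bom)
```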

Chris

Sent with OfficeXP on WindowsXP

-----Original Message-----
From: Michael (michka) Kaplan [mailto:michka@trigeminal.com]
Sent: February 18, 2002 3:54 PM
To: Lars Kristan; 'Asmus Freytag'; unicode@unicode.org
Subject: Re: Unicode and end users

Generally speaking, the best "reader" to do it all in is IE.... you can
open the text file, change the encoding, and then copy/paste it out into
any other file.

Not that "Open as" wouldn't be cool (it would save me some steps!).

MichKa

Michael Kaplan
Trigeminal Software, Inc. -- http://www.trigeminal.com/

----- Original Message -----
From: "Lars Kristan" <lars.kristan@hermes.si>
To: "'Asmus Freytag'" <asmusf@ix.netcom.com>; <unicode@unicode.org>
Sent: Monday, February 18, 2002 3:23 PM
Subject: RE: Unicode and end users

> Asmus Freytag wrote:
> > Ever since MS let the cat out of the bag with notepad, the rush is on
> > for all tools to be upgraded to handle the situation. Fine, this is
> > the real world.
> *Sigh* yes, it is. I understand why notepad needs this. For notepad, a
> file is either UTF-16 or an ANSI file. Since notepad keeps the internal
> data in UTF-16 (just a fair assumption here), it needs to convert. And a
> UTF-8 BOM is what makes it use the UTF-8 conversion rather than an ANSI
> conversion.
>
> Too bad this happened. Maybe someone at Microsoft should look into this
> notepad a little bit more seriously. Was it Windows 4.0 or Windows 2000
> that updated notepad so CTRL-F started working as everywhere else? In
> either case, it took a long time for such a major improvement. I wish
> notepad would handle LF files (as opposed to CRLF) correctly. I wish
> there was "Open as" in the file open dialog, to allow opening OEM encoded
> files, maybe even UTF-8 files without BOM...
>
>
> Lars Kristan
>
>



This archive was generated by hypermail 2.1.2 : Wed Feb 20 2002 - 00:10:44 EST