A few comments on these HTML files and Word97's capabilities:
Word97 supports UCS2 (little-endian) for text files.
Word97 supports UTF-8 for HTML (but not UCS2).
This is why Word opens the true UTF-8 sites such as
http://www.cm.spyglass.com/unicode/iuc10/x-utf8.html
as Web pages, and the UCS2 little-endian pages as plain text.
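To illustrate the distinction, here is a minimal sketch in Python of telling these encodings apart by their byte order mark. It is not a description of how Word97 actually decides; the function name sniff_bom is hypothetical, and a file without a BOM is simply reported as unknown.

```python
import codecs

def sniff_bom(data: bytes) -> str:
    """Guess an encoding from a leading byte order mark, if there is one."""
    if data.startswith(codecs.BOM_UTF8):        # EF BB BF
        return "utf-8"
    if data.startswith(codecs.BOM_UTF16_LE):    # FF FE
        return "ucs-2 little-endian"
    if data.startswith(codecs.BOM_UTF16_BE):    # FE FF
        return "ucs-2 big-endian"
    return "unknown"                            # no BOM, nothing to go on
```

A caller would pass the first few bytes of the downloaded page, e.g. sniff_bom(open(path, "rb").read(4)), where path is whatever local copy is being checked.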
Our assumption was that UTF-8 was the only Web-safe encoding that was
reasonably likely to be adopted by browsers in the near future. Is that
the consensus, or are raw UCS2 encodings being considered actively by
people on this alias?
Word97 will not open big-endian UCS2:
http://194.75.134.50/unicode/iuc10/x-ucs2.html
These are treated as text files by Word97, since it does not support
parsing UCS2 HTML:
http://www.lang.duke.edu/unichtm/unilang.htm
http://194.75.134.50/unicode/iuc10/x-ucs2l.html
Also, it is interesting to note that
http://194.75.134.50/unicode/iuc10/x-ucs2l.html
contains a META tag claiming the file is UTF-8, although of course it is
not. This is one of the dangers of using META tags, or of changing
encodings of existing files without handling META tags, depending on
your viewpoint.
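A mismatch of this kind can be caught mechanically. The following is a rough sketch in Python, assuming the page starts with a BOM and declares its charset in the first 4096 bytes; the helper names are hypothetical and this is not how any of the products discussed here work.

```python
import codecs
import re

# Hypothetical helpers: only the BOM and the first charset= declaration
# are examined, nothing more.
CHARSET_RE = re.compile(rb"charset\s*=\s*[\"']?([A-Za-z0-9_.:-]+)", re.IGNORECASE)

def declared_charset(data: bytes):
    """Return the charset named in the page's META tag, or None."""
    # Dropping NUL bytes lets the ASCII pattern match UCS-2 pages too.
    head = data[:4096].replace(b"\x00", b"")
    match = CHARSET_RE.search(head)
    return match.group(1).decode("ascii").lower() if match else None

def bom_family(data: bytes):
    """Return "utf-8" or "ucs-2" according to the BOM, or None without one."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "ucs-2"
    return None

def meta_conflicts_with_bom(data: bytes) -> bool:
    """True when the META tag names a different family than the BOM shows."""
    declared = declared_charset(data)
    actual = bom_family(data)
    if declared is None or actual is None:
        return False  # nothing to compare
    if declared.startswith(("utf-16", "ucs-2", "iso-10646-ucs-2")):
        declared = "ucs-2"
    elif declared == "utf8":
        declared = "utf-8"
    return declared != actual
```

A page saved as little-endian UCS2 that still carries charset=utf-8 in its META tag would be flagged by meta_conflicts_with_bom; a page with no BOM is left alone, since there is nothing to compare against.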
I'd be interested in the repro steps and environment that led to the
PPT97 paste via clipboard failing. I have no trouble doing this on my
Japanese NT4/Word97 setup.
Chris
-----Original Message-----
From: Lori Brownell
FYI
-----Original Message-----
From: [SMTP:becker.osbu_north@xerox.com]
Sent: 4 February 1997 11:54
To: Lori Brownell; charles.wicksteed@reuters.com;
misha.wolf@reuters.com
Cc: unicode@unicode.org; becker.osbu_north@xerox.com
Subject: Re: Translated IUC10 Web pages: Experimental Results
Thank you all, we're clearly well on the road though not yet arrived.
Here are a few observations with NT 4.0 and Office 97, using the
Bitstream Cyberbit font handed out at IUC9:
Charles> I have added ...
Charles> http://194.75.134.50/unicode/iuc10/x-ucs2l.html
Charles> (UCS-2, least significant byte first, MicrosoFFFE)
Thank you for going to this trouble, my first experiences with this are:
o Netscape 3.0 loads the page, shows the first couple dozen characters
  (as ASCII/garbage); attempting to download it, Netscape similarly
  truncates the file very early
o MS IE 3.0 cannot open the page
o Word 97 opens it (via the procedure below) as correct Unicode
  plaintext HTML source
o Word 97 Save As ... Unicode Text correctly writes this as a
  MicrosoFFFE file that can e.g. be read by NT Notepad
o Clipboard copy/paste to NT Notepad also works
o Clipboard paste to PowerPoint 97 is rejected ("error")
Charles> http://194.75.134.50/unicode/iuc10/x-ucs2.html
Charles> (UCS-2, most significant byte first)
o Word 97 opens the first several lines as correct plaintext HTML
  source, then starts a huge stream of random bytes right in the middle
  of the first <img> tag, namely after "... <img a" (i.e. it goes
  bonkers after the "a" in "alt")
Chris> Select this URL below
Chris> http://www.cm.spyglass.com/unicode/iuc10/x-utf8.html
Chris> Edit/Copy
Chris> File/Open (in Word97)
Chris> Paste into the filename box
Chris> OK
This works beautifully, thank you! Word 97 Save As ... Unicode Text also
correctly writes this as a MicrosoFFFE text file, thus providing perhaps
the simplest path to extract all the text back out of this page.
I also tried these Unicode multilingual sample pages:
http://www.lang.duke.edu/unichtm/unilang8.htm -- presence/absence of BOM
unknown
o Netscape 3.0 (with Registry hack) loads the page fine
o Clipboard copy/paste to NT Notepad treats text as ASCII, i.e.
  high-order characters garbaged
o Word 97 opens the page as ASCII, high-order characters garbaged
http://www.lang.duke.edu/unichtm/unilang.htm -- little-endian UCS-2,
presence/absence of BOM unknown
o Word 97 opens the page correctly
Joe