Re: Translated IUC10 Web pages: Experimental Results

Date: Tue Feb 04 1997 - 14:53:31 EST

Thank you all; we're clearly well on the road, though we have not yet arrived.
Here are a few observations with NT 4.0 and Office 97, using the Bitstream
Cyberbit font handed out at IUC9:

Charles> I have added ...
Charles> (UCS-2, least significant byte first, MicrosoFFFE)

Thank you for going to this trouble. My first experiences with this are:

    o Netscape 3.0 loads the page but shows only the first couple dozen
characters (as ASCII/garbage); when I attempt to download it, Netscape
similarly truncates the file very early

    o MS IE 3.0 cannot open the page

    o Word 97 opens it (via the procedure below) as correct Unicode plaintext
HTML source

        o Word 97 Save As ... Unicode Text correctly writes this as a
MicrosoFFFE file that can e.g. be read by NT Notepad

        o Clipboard copy/paste to NT Notepad also works

        o Clipboard paste to PowerPoint 97 is rejected ("error")

Charles> (UCS-2, most significant byte first)

    o Word 97 opens the first several lines as correct plaintext HTML source,
then starts a huge stream of random bytes right in the middle of the first
<img> tag, namely after "... <img a" (i.e. it goes bonkers after the "a" in

Chris> Select this URL below
Chris> Edit/Copy
Chris> File/Open (in Word97)
Chris> Paste into the filename box
Chris> OK

This works beautifully, thank you! Word 97 Save As ... Unicode Text also
correctly writes this as a MicrosoFFFE text file, thus providing perhaps the
simplest path to extract all the text back out of this page.
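
For anyone who wants to read such a saved file outside Word, here is a minimal sketch in Python of honoring the byte-order mark. The function name decode_ucs2 and the little-endian fallback are my own assumptions for illustration, not anything Word or Notepad is known to do. Python's utf-16 codecs also accept surrogate pairs, which goes slightly beyond plain UCS-2.

```python
import codecs

def decode_ucs2(data: bytes, default: str = "utf-16-le") -> str:
    """Decode 16-bit Unicode text, using a leading byte-order mark if present.

    `default` is a hypothetical fallback for files that carry no BOM.
    """
    if data.startswith(codecs.BOM_UTF16_LE):
        # Bytes FF FE: least significant byte first
        return data[2:].decode("utf-16-le")
    if data.startswith(codecs.BOM_UTF16_BE):
        # Bytes FE FF: most significant byte first
        return data[2:].decode("utf-16-be")
    # No BOM found: fall back to the assumed byte order
    return data.decode(default)
```

A file written by Save As ... Unicode Text could then be read with open(path, "rb").read() and passed to this function; the path is whatever you saved it under.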

I also tried these Unicode multilingual sample pages:

-- presence/absence of BOM

    o Netscape 3.0 (with Registry hack) loads the page fine

        o Clipboard copy/paste to NT Notepad treats text as ASCII, i.e.
high-order characters garbaged

    o Word 97 opens the page as ASCII, high-order characters garbaged

-- little-endian UCS-2, presence/absence of BOM unknown

    o Word 97 opens the page correctly
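
When it is unknown whether a BOM is present, one rough way to guess the byte order of 16-bit text is to see where the zero bytes fall. This only works because HTML source is mostly ASCII. It is a heuristic sketch under that assumption, with a made-up function name, and I do not claim any of the applications above work this way.

```python
def guess_byte_order(data: bytes) -> str:
    """Guess the byte order of BOM-less 16-bit text that is mostly ASCII markup."""
    # In big-endian text the high (zero) byte of each ASCII character comes
    # first, so zeros pile up at even offsets; in little-endian text they
    # pile up at odd offsets.
    even_zeros = data[0::2].count(0)
    odd_zeros = data[1::2].count(0)
    if even_zeros > odd_zeros:
        return "utf-16-be"
    if odd_zeros > even_zeros:
        return "utf-16-le"
    # Equal counts (e.g. empty input or 8-bit text): no basis for a guess
    return "unknown"
```

Text with few ASCII characters (e.g. a page that is almost entirely CJK) gives this heuristic little to work with, which is one more argument for writing the BOM.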


