RE: http://www.reuters.com/unicode/iuc10/x-utf8.html

From: Chris Pratley (chrispr@microsoft.com)
Date: Mon Jul 12 1999 - 22:36:20 EDT


You can use Microsoft Word2000 to edit/create UTF-8 HTML. HTML generated by
Word2000 has no connection to that created by Word95 and Word97, both of
which used an external add-in for HTML support. Word2000's native HTML and
CSS were completely rewritten and are fully standards-based. Plus you can
download the Office HTML filter from the Office Update web site to strip out
everything in Word's HTML except strictly presentation HTML/CSS, which gives
you a smaller file size and very clean HTML at the cost of full round-trip
fidelity in Word. Naturally, you can set the encoding of the HTML to be
UTF-8 or almost any other encoding (even UCS-2 if you like).
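
For anyone who already has HTML saved in a legacy code page and wants the same result outside Word, here is a minimal Python sketch of the re-encoding step. It is not how Word2000 does it; the function name reencode_html and the sample markup are made up for illustration, and it assumes the source really is CP 1252 and that the only charset label is a plain charset=... token.

```python
import re


def reencode_html(data: bytes, source_encoding: str = "cp1252") -> bytes:
    """Decode HTML bytes from source_encoding and return them as UTF-8.

    Hypothetical helper: any charset=... token found in the markup is
    rewritten to say utf-8 so the label matches the new byte encoding.
    """
    text = data.decode(source_encoding)
    # Rewrite an existing charset label, e.g. inside a <meta http-equiv> tag.
    text = re.sub(r"charset=[\w-]+", "charset=utf-8", text, flags=re.IGNORECASE)
    return text.encode("utf-8")


# Sample input: a German greeting stored as CP 1252 bytes.
sample = (
    '<meta http-equiv="Content-Type" '
    'content="text/html; charset=windows-1252">'
    "<p>Gr\u00fc\u00dfe</p>"
).encode("cp1252")

converted = reencode_html(sample)
```

The point of the sketch is only that the bytes and the declared charset have to change together; a file whose label and actual encoding disagree is what usually shows up as garbled umlauts in the browser.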

Of course, if you want to edit the HTML directly without changing it at all
you can open the file as plain UTF-8 text in Word2000 and edit it that way.

FrontPage2000 also supports UTF-8 HTML generation, and one of its primary
features over past versions is that it does not modify the HTML in the file
except in the locations that you change while editing it.

If you find any syntax errors in HTML generated by Word2000 or FrontPage2000
I would very much like to hear about them. (One caveat: there may be a few
instances of odd-looking HTML included by design to work around particularly
egregious bugs in popular browsers.)

Chris Pratley
Program Manager
Microsoft Office

-----Original Message-----
From: Otto Stolz [mailto:Otto.Stolz@uni-konstanz.de]
Sent: Rabie I 29, 1420 4:52 AM
To: Unicode List
Cc: misha.wolf@reuters.com; Gabe Bokor
Subject: Re: http://www.reuters.com/unicode/iuc10/x-utf8.html

Gabe Bokor has written:
> While your page provides the answer to my question in theory,

Add a doctype declaration to the theory,
cf. <http://www.w3.org/TR/REC-html40/struct/global.html#h-7.2>
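
As a rough illustration of the doctype advice, the following Python sketch assembles a small page that starts with the HTML 4.0 Strict document type declaration from the section cited above, labels itself as UTF-8, and is encoded as UTF-8 bytes. The helper name build_page and the sample text are assumptions for the example only.

```python
import html

# HTML 4.0 Strict document type declaration (HTML 4.0 spec, section 7.2).
DOCTYPE = (
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN"\n'
    '    "http://www.w3.org/TR/REC-html40/strict.dtd">'
)


def build_page(title: str, body_html: str) -> bytes:
    """Return a minimal HTML 4.0 page as UTF-8 encoded bytes.

    Hypothetical helper: body_html is inserted as-is, so the caller is
    responsible for supplying valid markup.
    """
    lines = [
        DOCTYPE,
        "<html>",
        "<head>",
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">',
        f"<title>{html.escape(title)}</title>",
        "</head>",
        "<body>",
        body_html,
        "</body>",
        "</html>",
        "",
    ]
    return "\n".join(lines).encode("utf-8")


# Sample page mixing German and Greek characters.
page_bytes = build_page("UTF-8 test", "<p>Gr\u00fc\u00dfe, \u03b1\u03b2\u03b3</p>")
```

Writing page_bytes to a file in binary mode gives something that can be submitted to the validation services listed below.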

You may also wish to read other parts of the HTML 4.0 specification,
and hints for HTML authors:
  <http://www.w3.org/TR/REC-html40/>
  <http://www.w3.org/WAI/GL/#Current_Draft>
and to test your HTML source against pertinent validation services:
  <http://validator.w3.org/>
  <http://www.cast.org/bobby/>

> I still don't know how to generate those Unicode pages in a regular word
> processor.

I have tried the HTML assistants from two Word versions (with German texts
in CP 1252), and both of them have generated absolutely unacceptable
HTML versions. Unacceptable meaning: containing blatant HTML syntax errors
and a plethora of proprietary Microsoft features. Hence, I advise you *not*
to let Word generate the HTML source. I haven't tried any other word
processor, though.

You could edit your HTML source with UniEdit and store it in UTF-8,
cf. <http://www.lang.duke.edu/uniintro.htm>.

You may wish to try Tango Creator, the Unicode-capable HTML editor from
Alis (which I haven't tried yet),
cf. <http://www.alis.com/internet_products/creator/creator.html>.

Best wishes,
   Otto Stolz


