RE: Word97: Preliminary Experimental Results

From: Chris Pratley (chrispr@MICROSOFT.com)
Date: Fri Feb 07 1997 - 19:59:28 EST


Being one of the "valiant people" you mention, I'd like to help by
explaining a bit about how Word97 handles multilingual support.

First of all, you are right that it is not easy to find information on
supporting Asian languages in US Word97. This is an oversight in the help
system. We did not go out of our way to make this feature discoverable
because the vast majority of our individual users never need to do this
sort of thing. It is primarily a
corporate feature for multinationals that need to share documents
worldwide. However, the language packs in the "valupack" on the Office97 CD
do provide all the necessary support for viewing Asian language documents.

You are right that US Word97 has a default font for Asian text set to Times
New Roman. This is done to enable certain internal processes and document
exchange scenarios, and is not usually a problem. Any Asian text typed into
US Word97 (running on an Asian version of Windows) picks up the correct
font, and of course any existing documents have the correct fonts applied.
Naturally the Asian versions of Word have the appropriate default fonts
set, although for example in Japanese Word, unmarked Chinese text will
adopt a Japanese font. For HTML, since there is no way to be sure a
particular font contains a glyph for a particular character, we do not
attempt to apply a particular font based on the Unicode code point. Given
that there is no FONT FACE tag in this page, it is difficult to guess what
writing system a particular range of Unicode text represents, especially for
unified ranges. For example, which "style" of Chinese character, and
therefore which country's font, should be used for a range of CJK text? So
we can't do the right thing here either in the absence of the LANG
attribute. Leveraging a LANG
attribute or determining the correct language heuristically in its absence
and having some sort of language->font mapping is a feature we are
considering. As you noted, the "fix" is to simply apply correct fonts to
the text and the problem is solved.
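
A minimal sketch of the kind of language->font mapping described above,
in Python. It is not how Word97 works; the table, the font names, and the
function name font_for_lang are all assumptions made for illustration. It
looks up the LANG value, retries with shorter prefixes of the tag, and
falls back to the default font when nothing matches.

```python
# Hypothetical language-to-font table; the entries are illustrative only
# and are not taken from Word97.
LANG_TO_FONT = {
    "ja": "MS Mincho",
    "zh-tw": "MingLiU",
    "zh-cn": "SimSun",
    "ko": "Batang",
}
DEFAULT_FONT = "Times New Roman"


def font_for_lang(lang):
    """Return a font for an HTML LANG value, or the default if unknown."""
    if not lang:
        return DEFAULT_FONT
    tag = lang.strip().lower()
    # Try the full tag first, then drop trailing subtags ("ja-jp" -> "ja").
    while tag:
        if tag in LANG_TO_FONT:
            return LANG_TO_FONT[tag]
        tag = tag.rpartition("-")[0]
    return DEFAULT_FONT
```

Note that a bare "zh" falls through to the default in this sketch, which
mirrors the ambiguity above: without a region there is no way to tell
which country's font is wanted.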

Once you have CJK text in a "Far East" font that can display it, we do not
allow you to apply a font such as Arial that we are certain does not
support that range. This is a feature that enables you to select a range of
mixed roman and CJK text and set a "Far East" font which affects the CJK
characters and any "ASCII" range characters since these are supported in
basically all fonts. Then you can set an "ANSI" font which applies to only
the ANSI text to prevent you from seeing boxes. Without this, when changing
fonts for a large section of text you would be forced to select each run of
roman text individually in order to not corrupt the display of the Asian
text. Similarly, any extended "European" characters will never adopt a Far
East font since many of the Far East fonts do not contain the correct
glyphs. Many of these fonts actually claim to support these ranges when
queried, but in fact don't.
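
The behaviour described above can be sketched roughly as follows, in
Python. This is a simplified model, not Word97's actual logic: the
Unicode ranges in is_far_east are a coarse approximation, and the names
is_far_east and apply_font are hypothetical. Each character carries its
own font, and a newly applied font only "takes" on the characters it is
allowed to affect.

```python
def is_far_east(ch):
    """Rough test for CJK characters; the ranges are a simplification."""
    cp = ord(ch)
    return (
        0x3000 <= cp <= 0x30FF      # CJK punctuation, hiragana, katakana
        or 0x4E00 <= cp <= 0x9FFF   # CJK unified ideographs
        or 0xAC00 <= cp <= 0xD7A3   # Hangul syllables
        or 0xFF00 <= cp <= 0xFFEF   # halfwidth and fullwidth forms
    )


def apply_font(text, fonts, new_font, is_far_east_font):
    """Return the per-character font list after applying new_font to text.

    fonts holds the current font of each character in text.
    is_far_east_font says whether new_font is a "Far East" font.
    """
    result = []
    for ch, old_font in zip(text, fonts):
        if is_far_east_font:
            # CJK and plain ASCII characters take the font; extended
            # "European" characters keep the font they already have.
            takes = is_far_east(ch) or ord(ch) < 0x80
        else:
            # A font with no CJK coverage never replaces the font on CJK text.
            takes = not is_far_east(ch)
        result.append(new_font if takes else old_font)
    return result
```

Applying a Far East font and then Arial to the whole of a mixed run
leaves the CJK characters in the Far East font, which is why no
per-run selection is needed.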

The font dropdown displays the font that is currently being used where your
insertion point is. I think you will find that if you click in the middle
of some Asian text, the Font will show the font you expect (i.e. what is
used to display those characters), even if you had just applied Arial to
that range. Perhaps when you tested this you had the insertion point in
some roman text?

A little note about the "JPNSUPP.EXE" and other 8.3 filenames. The ISO-9660
CD-ROM file format standard does not allow long filenames. Given the choice,
we would prefer descriptive names.
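
As a rough illustration of the constraint, here is a small Python check
for whether a name fits the 8.3 pattern. It is a simplification that
assumes the most restrictive ISO-9660 naming level (up to 8 characters,
an optional extension of up to 3, drawn from A-Z, 0-9 and underscore) and
ignores version suffixes; the function name fits_8_3 is hypothetical.

```python
import re

# Simplified 8.3 pattern: 1-8 name characters, optional 1-3 character extension.
_NAME_8_3 = re.compile(r"[A-Z0-9_]{1,8}(\.[A-Z0-9_]{1,3})?")


def fits_8_3(name):
    """Return True if name fits the simplified 8.3 pattern (case ignored)."""
    return _NAME_8_3.fullmatch(name.upper()) is not None
```

Under this check "JPNSUPP.EXE" fits, while a descriptive name such as
"JapaneseSupport.exe" does not.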

I'd also like to comment on why these support packs are not included as an
option in the Setup program. You are probably aware that, ironically, MS is
criticized for including too much software (i.e. value) in MS Office, which
some people view as a bad thing. The press complains that a
Complete install takes up almost 200MB, and they are not willing to
consider the custom setup options or do a Typical install. To improve this
image, we are forced to actually remove features from the setup entirely so
that the magic "maximum install size" number that the press sensationalizes
will come down. If we were to include the Asian font support even as a
custom install option, we would add another 10-15MB (uncompressed size) to
the size of the maximum install, which we considered unacceptable when
compared to the likely usage of this feature. Naturally, I am sad that we
could not include them, but that is the harsh reality.

Finally, regarding your system stability, could you provide me with details
of your environment (Windows version, Service Pack level, etc.)? We have
tested this pretty extensively and haven't had trouble with things locking
up due to multiple large fonts. 32MB of RAM is plenty - no need to increase
that.

Thanks,
Chris

----------
From: unicode@Unicode.ORG
Sent: Thursday, February 06, 1997 6:31:39 PM
To: unicode@Unicode.ORG
Subject: Word97: Preliminary Experimental Results
Auto forwarded by a Rule

Well, I went out and bought Office97, mostly in order to have a
Unicode-based Word.

I've installed all the "Far East" language packs, and have discovered a
couple of odd things. First, if I open everyone's favorite test page,

http://www.cm.spyglass.com/unicode/iuc10/x-utf8.html

I get missing-char boxes where the kanji and hanzi should be. Not too
surprising since the new direct HTML support probably uses some "HTML
default fonts" (one for proportional, one for fixed) defined somewhere,
and those fonts are probably 8859-1.
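
If that guess about 8859-1 fonts is right, the boxes are easy to explain.
A minimal Python sketch, assuming a font that covers exactly the ISO
8859-1 repertoire (the function name missing_in_latin1 is made up for
illustration): any character that cannot be encoded as Latin-1 has no
glyph in such a font.

```python
def missing_in_latin1(text):
    """Return the characters of text that ISO 8859-1 cannot encode."""
    missing = []
    for ch in text:
        try:
            ch.encode("latin-1")
        except UnicodeEncodeError:
            # No Latin-1 code for this character, so a Latin-1-only
            # font would show a missing-char box here.
            missing.append(ch)
    return missing
```

Accented Latin letters pass this test, but every kanji and hanzi on the
page would be reported as missing.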

When I select some CJK text and choose one of the Asian fonts, it
works as I would expect (well, *after* I remind myself again that kana
chars *are* included in the Chinese charsets, and that the JIS charset
does include the traditional 'country' (kuni, guo2, kuk) char!), but if
I change to other Asian fonts to see how they look, my machine locks up
after a couple of changes.

My assumption is that Word97 is trying to hold all of these enormous
fonts simultaneously in memory and choking on them. It locks up tight,
and I'm forced to control-alt-delete the app to get out of it. It
appears as though working with multiple Asian languages means 1) save
your work often, 2) "multilingual" doesn't necessarily mean "at the same
time", and 3) choose your favorite font and stick with it. It may also
mean getting more than the apparently wimpy 32MB of RAM that I have.

Oddly, after changing to a double-byte font, if I change the font to
plain ol' Arial (sometime before my quarter runs out and the game locks
up ;-) ), the CJK chars remain CJK. They don't go back to being
missing-char boxes. They do seem to change shape a bit, though, but the
font drop-down menu above claims they are Arial, which is obviously
wrong, so I don't know what font they really are.

I was hoping to solve some of the mysteries surrounding Office97's new
unicode support, so I asked the talking paperclip (the new help agent)
what he could tell me about unicode support. He hadn't heard of unicode.
I went into the help index manually, but "uninstall" is all it could
suggest in answer to "unicode". Hmmm... Nowhere in the help system, or
the printed manual, that I can find, is unicode mentioned (although, of
course, I may have missed it.)

I tried asking the paperclip about Japanese. He recognized that and
cheerfully took me to a page showing how to install "multilingual
support" which, it said, means I can now display docs written in "any
European language." I eventually found one sentence that says you can
display "Far Eastern" languages if you add support for them, but it
doesn't say what that means or how to add such support. It's not a part
of the standard installation, nor is it included among the huge number
of options when doing a custom install. There doesn't appear to be
anything in any readme doc that I've found telling users how to set up
CJK support. Because I knew it was there somewhere, thanks to Lori and
Murray on this mailing list, I eventually found the files in the "Far
East" directory, in the "ValuePack" directory. No readme in the
directory, though, just executables with cryptic names like chtsupp.exe
(presumably Chinese Traditional Support--MS likes those 8.3 filenames.)
Not a feature anyone is going to accidentally discover or easily figure
out how to use if they do. The installer .exe's want you to reboot your
machine between each language pack, too. (I thought I could ignore it
and reboot once at the end, but I was punished....) It would have been
nicer if they had been listed in English as options during a custom
install.

I'll get in and crawl around with a hex editor and figure things out for
myself, since that's the sort of thing I do for a living, but this is
still an implementation more suited to CS majors than Japanese majors.

I'm extremely pleased to see the direction Microsoft is going with
multilingual support, but this has all the appearances of a valiant
behind-the-scenes effort by a few wonderful people trying to do the
right thing amongst a sea of developers and marketers with "more
important" things on their minds than meeting the needs of the eccentric
multilingual fringe. ;-)

Keep up the fight,
__Glen Perkins__


