RE: Multilingual Documents [was: HTML forms and UTF-8]

From: Chris Pratley (
Date: Thu Dec 02 1999 - 03:10:07 EST

Good points. And actually, don't forget that Office has significant
server-side portions these days, including web forms and database reporting
pages generated from Unicode Access/Excel and so on with back ends tied to
Unicode SQL 7 or equivalent, so we're quite aware of those issues. In fact,
so much cool stuff in this area (Unicode ActiveX and JavaScript) only works
in later (coincidentally Unicode-capable) browsers that the data guys are
even more aggressive about using Unicode for the web solutions. If users are
willing to punt on ver 3 and older browsers (easier to do in a corp intranet
than on the internet), you can do a lot of amazing stuff - and sending the
data around in UTF-8 really simplifies things. Unfortunately,
you do get limited to IE for the "live" pages with ActiveX and Unicode forms
for now, but at least Unicode pages for viewing results can sort of work in
other "4.0" browsers today, and there's a migration path for when other
browsers can handle the Unicode forms too.
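
To make the UTF-8 point concrete, here is a minimal sketch in Python (an
illustration only, not anything taken from Office or IE). It assumes a form
whose fields mix several scripts; the field names and values are made up.
A single UTF-8 encoding round-trips every field, while no one legacy code
page can represent all of them.

```python
from urllib.parse import urlencode, parse_qs

# Hypothetical form fields mixing several scripts in one submission.
form = {"name": "Müller", "city": "東京", "note": "Привет"}

# Percent-encode the form body as UTF-8, as a browser would for a UTF-8 page.
body = urlencode(form, encoding="utf-8")

# The server decodes with the same encoding and recovers every field intact.
decoded = {key: values[0]
           for key, values in parse_qs(body, encoding="utf-8").items()}


def fits_in(text, codec):
    """Return True if `text` can be represented in the given legacy codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# For each legacy code page, check whether it can hold the whole form.
legacy_ok = {codec: all(fits_in(value, codec) for value in form.values())
             for codec in ("latin-1", "shift_jis", "koi8-r")}
```

Here `decoded` equals `form`, and every entry in `legacy_ok` is False, since
each of those code pages lacks at least one of the three scripts.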

I don't mean to gloss over the issues that Glen brings up either, because
certainly getting a solution that works for everyone on the internet is
extremely difficult, but in a closed corporate environment where you can
legislate a certain level of support on the desktop, you can already see the
future and it is pretty cool (and Unicode-capable).

Addison writes:
If, say, MS-Word really had a "multilingual" requirement, it would
change spell-checkers and grammar-checkers when the language of the sentence
changed...... right?
>>>Actually, Word has done this since Word6 (FYI there are Microsoft
spell-checkers for 37 languages you can install to run simultaneously in the
background in Word). If configured for it, Word 2000 will even automatically
detect the language as you type so you don't have to worry about setting the
language property. In all this I hope I haven't given the impression that
Word and Office don't handle multilingual documents - they do, and pretty
well I think. The point was that this support has recently been riding a
globalization wave, rather than getting in the product solely on its own

Chris Pratley
Lead Program Manager
Microsoft Office

-----Original Message-----
From: Earthlink []
Sent: December 1, 1999 10:54 PM
To: Unicode List
Subject: Re: Multilingual Documents [was: HTML forms and UTF-8]

I agree with all of Chris' assertions below... when talking about
applications for single user systems.

For Web-based applications, though, I think I can see additional "business
justifications". For example, data is increasingly multilingual. Internet
driven applications will have data that is collected from or used across a
wide array of locales. If your site interacts with a variety of users (using
one-language-at-a-time HTML), you will still collect a lot of data in a
single repository that must be parsed, reported, analyzed, stored,
normalized, etc. with the original locale in mind [as well as the locale of
the system administrator, customer, site owner, and so forth]. A report on
e-commerce site activity that is generated "in German" can still contain
fields in another script. A few weeks ago I was looking at a shipping
system: orders from one country were placed with a company in a second
country... and fulfilled from warehouses in a third. I think you can see how
multiple scripts might be called for in such an instance.

In other words: you may collect it one script at a time, but you may have to
deliver it in collections.
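
A rough sketch of what that means for the data model, in Python. Everything
here is an assumption for illustration: the `Order` record, the locale tags
and the tiny `SEPARATORS` table (a real system would use proper locale
data). The idea is that each record keeps the locale it was collected in,
and formatting is chosen per report rather than taken from the server's
default.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Order:
    customer: str    # Unicode text, in whatever script it was entered
    locale: str      # locale in which the record was collected
    amount: Decimal


# Hypothetical, deliberately tiny table: (thousands separator, decimal separator).
SEPARATORS = {"en_US": (",", "."), "de_DE": (".", ","), "ja_JP": (",", ".")}


def format_amount(amount, locale):
    """Format `amount` with the separators of `locale`, not of the server."""
    thousands, decimal = SEPARATORS[locale]
    neutral = f"{amount:,.2f}"  # e.g. 1,234.50
    # Swap separators via a placeholder so the two replacements do not collide.
    return (neutral.replace(",", "\0")
                   .replace(".", decimal)
                   .replace("\0", thousands))


def report(orders, report_locale):
    """One report, one reader locale, but names stay in their own script."""
    return [f"{o.customer} [{o.locale}]: {format_amount(o.amount, report_locale)}"
            for o in orders]


orders = [
    Order("Müller", "de_DE", Decimal("1234.5")),
    Order("田中", "ja_JP", Decimal("98765")),
    Order("Smith", "en_US", Decimal("12.3")),
]
german_report = report(orders, "de_DE")
```

The report "in German" formats every amount the German way, yet the second
line still carries a name in another script, which is only workable if the
whole pipeline is Unicode.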

Chris has the "easier" job of creating single-user-at-a-time applications
where, as he points out, most people create mono-lingual documents and where
most polylingual documents are really quoting small patches of another
language within them (so the ability to merely handle a few characters in a
row is enough--the user assumes responsibility for the non-primary-language
content). If, say, MS-Word really had a "multilingual" requirement, it would
change spell-checkers and grammar-checkers when the language of the sentence
changed...... right?
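
To illustrate what switching checkers per sentence would involve, here is a
toy sketch in Python. It is not how any real word processor does it: the
word lists are hypothetical stand-ins for real dictionaries, and the text is
assumed to arrive already split into runs tagged with a language.

```python
# Hypothetical miniature word lists standing in for real spelling dictionaries.
DICTIONARIES = {
    "en": {"the", "report", "is", "ready"},
    "de": {"der", "bericht", "ist", "fertig"},
}


def misspelled(runs):
    """Check each (language, text) run against that language's word list."""
    errors = []
    for lang, text in runs:
        words = DICTIONARIES[lang]
        errors += [(lang, word) for word in text.lower().split()
                   if word not in words]
    return errors


runs = [("en", "The report is redy"), ("de", "Der Bericht ist fertig")]
per_language = misspelled(runs)

# Checking everything against one dictionary flags the whole German run.
english_only = misspelled([("en", text) for _, text in runs])
```

With per-run tags only the genuine typo is reported; with a single English
checker, every German word is flagged as well.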

Almost everyone on this list is (of course?) in agreement that Unicode
support is a good thing and the sooner we get it everywhere the better. But
Internet applications have an abysmal record of internationalization, in
part because "Unicode is supposed to fix everything"... and characters are
only the part of the iceberg that sticks up. How many recent e-mails have we
seen on this list from programmers in which it is assumed that supporting
Unicode alone will fix all of their problems? They all proceed from the
false assumption that character sets are the *ONLY* issue with regard to

... the problem on the web is not so much multilingual documents, but the
ability to process data from different locales in the same server at the
same time. The justifications Chris gives are thus "writ large" for Internet
developers, most of whom appear to be focused on the far too homogeneous US
market and have a serious problem waiting for them just around the corner.
Developers need to be aware that, unlike single user applications,
assumptions about locale are probably fatal to international Internet
products and thus competitiveness of their product/company [cf. InfoWorld
Vol 21, Issue 47, page 12 "Eye on e-business"]... in real time, for all the
world to see.
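
A small Python sketch of the kind of locale assumption at issue. The
`DATE_PATTERNS` table and the `parse_date` helper are hypothetical, and the
locale is assumed to arrive with each request. The same bytes mean different
dates depending on who sent them, so a server-wide default silently corrupts
data for everyone else.

```python
from datetime import datetime

# Hypothetical per-locale short-date patterns; real locale data is far richer.
DATE_PATTERNS = {
    "en_US": "%m/%d/%Y",
    "en_GB": "%d/%m/%Y",
    "de_DE": "%d.%m.%Y",
}


def parse_date(text, request_locale):
    """Parse a short date using the pattern of the locale that submitted it."""
    return datetime.strptime(text, DATE_PATTERNS[request_locale]).date()


# Two users submit the identical string to the same server at the same time.
us_date = parse_date("03/04/1999", "en_US")
gb_date = parse_date("03/04/1999", "en_GB")
```

The two results differ by a month: the US submission is read as 4 March
1999 and the British one as 3 April 1999. Neither reading can be recovered
later unless the locale was captured along with the data.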



----- Original Message -----
From: Chris Pratley <>
To: Unicode List <>
Sent: Wednesday, December 01, 1999 8:07 PM
Subject: RE: Multilingual Documents [was: HTML forms and UTF-8]

I would classify all of Asmus's examples as what I referred to earlier as
"publications", i.e. a small subset of documents that are intended for mass
distribution to unspecific audiences. They don't represent a large number of
unique documents but they are distributed widely so they represent a
somewhat disproportionate share of physical instances of documents - which
is somewhat but not entirely irrelevant to the authoring frequency and
therefore the issue of support within applications. One could argue that
these wide distribution publications are more "valuable" to customers on a
per-document level than memos or letters, so there is some reason to weight
them more despite their relatively low frequency.

That said, I don't want to drone on about multilingual not being important,
since I personally don't feel that way (business is one thing, my interests
are another). I think the point I was trying to make on this thread was:

The need for explicitly multilingual documents is not as great as many of us
would wish. If any of us work in companies that make software that could be
more multilingual, my advice is to not push some imaginary great need for
multilingual support as the main reason that products should be made
multilingual. You risk being dismissed as having a lack of perspective on
the big picture of customer needs (as Andrea describes). I had much more
success within Microsoft for Office97 and 2000 by arguing that:
1. A single code base and executable that handles all customer languages is
easier for a development organization to maintain (Strictly speaking this
does not require multilingual support, but that helps a lot in internal
processes, testing, etc.)
2. The cost of shipping a product is not just that of the English version,
although it may seem that way to the "core" development team. By globalizing
the code, all the downstream (localization) costs are reduced, and quality
increases for non-English products. (~53% of sales by value for Office).
This argument is even better because it shows you are thinking with an even
bigger perspective than your detractors.
3. A single executable that handles all customer languages is much easier
for customers to handle (Large organizations in particular). This reduces
their cost of deployment and administration, and can be translated to real
business value for them.
4. A single executable implies a single patch for fixes rather than
language-specific patches, which further reduces customer hassle and cost.
5. The headache of identifying and handling the various multilingual
customer scenarios that actually do exist goes away if you use Unicode
rather than try to hack your product to support them.
Customers have responded very favourably to these points and the resulting
product, so that helps in support of continuing the work in the next

I find it ironic that the biggest driver for multilingual support, and
therefore Unicode support, and thereby support for minority languages in
mainstream software, has been the needs of large "faceless multinational
corporations" - the same ones that are often vilified for trampling smaller
cultures. Funny how things seem to work out in the end.

Chris Pratley
Lead Program Manager
Microsoft Office

-----Original Message-----
From: Asmus Freytag []
Sent: December 1, 1999 5:46 PM
To: Unicode List
Subject: Re: Multilingual Documents [was: HTML forms and UTF-8]

Chris suggested that we on this list have a bias in favor of multilingual
documents. I think we tend to have a reverse bias, overlooking the
existence of very everyday multilingual 'documents' that surround us. Even
in the solidly monolingual US.

Following suggestions from earlier postings I'll focus on those
multilingual documents that cross one of the technical boundaries that
non-Unicode systems erect.

1) My utility bills in Seattle are printed in these languages

- English, Spanish, Vietnamese, Chinese, Korean, Lao and Thai.
  (on the same bill).

2) Some of the foods (and other goods) I buy come with packages that contain

- English, European languages, even Arabic
  (These are not necessarily 'ethnic' foods, but the packages are
   intended for export. Extreme combinations of languages are more
   common in some markets, those that get the 'Rest of World' package
   as seen from the perspective of the producer)

3) Instruction and safety booklets come with almost any juxtaposition of

4) Most of my (European) newspapers easily cross alphabet boundaries
   (e.g. use of correct Latin-2 accents is common in Latin-1 languages).

I'm not listing the dictionaries, foreign language works, etc. that are all
specialized multilingual documents that I own because I am a member of this
list (or is it the other way around?) but ordinary everyday documents that
I did not particularly seek out for their multilingual nature. (This is
true even for (4).)

Somebody has to produce all of these. For that purpose it would be enough to
have the translators use special purpose software. But just as with the
problems of German e-mail in a mixed 6/7/8 bit e-mail infrastructure, this
scenario runs into problems as one finds oneself reduced to manipulating
pictures of translations in the rest of the production process. And that is
how many of these things are done.

I appreciate Chris' attempt to not overstate the case for multilingual
support, but as we become more dependent on the web and net infrastructure
to handle all our text processing, the remaining bottlenecks do tend to
have a 'reverse synergy' type cost to them. My theory is that these
bottlenecks prevent some users from fully adopting the new technologies.
This 'cost' probably scales more with the percentage of users who have to
worry about identifying and managing workarounds for them, even if
infrequently, rather than merely with the straight percentage of documents.


This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:56 EDT