RE: Multilingual Documents [was: HTML forms and UTF-8]

From: Chris Pratley
Date: Mon Nov 22 1999 - 20:59:13 EST

Just to clarify, please note that I did not mean to limit my definition of
"multilingual document" to side-by-side translations or similar "dual
language" documents. The customer research we did showed that true
"multilingual documents" are far in the minority, whether they are
side-by-side translations or simply include a single word from another
language. The one exception is when the author's name is included, since
people's names are often "foreign" to the language of the document they are
authoring. In fact, support for the author's name in different scripts, in
both the document and the meta properties of the document, was a major
request in early versions of Word (e.g. Greek for EC documents).

However, the point stands that in the real world, documents that mix small
or large amounts of two or more languages are relatively rare compared to
those that don't. This statement makes no comment on the absolute numbers of
both types of documents (they both number somewhere in the hundreds of
thousands to many millions, but the ratio is somewhere on the order of 10:1
to 100:1 in favour of monolingual if you ignore the author's name). And of
course there will always be anecdotal evidence that people do anything -
extrapolating from anecdotes is a dangerous game. I myself use Japanese
occasionally in my English documents but I know from our own research that
my usage is (again relatively speaking) quite rare outside Japan (yes,
schools and students and Japanese people living abroad do it, but these are
relatively small groups).

One relatively common "multilingual" case is mixing English (actually, Latin
script) with non-Latin script text (e.g. Greek, Russian, Japanese, Chinese,
Thai). Multilingual documents are still rare in those markets, but
multi-script documents (i.e. something plus Latin) are very common in areas
where the native language does not use Latin script. Whether names like
"Microsoft" or "Xerox" belong to any particular language is up to
interpretation, but they are definitely Latin script when written this way,
and it is usually words of this type that are written in Latin script.
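
To make the multilingual vs. multi-script distinction concrete, here is a
minimal Python sketch that reports which scripts appear in a piece of text.
It is only an illustration: the function names `scripts_in` and
`is_multiscript` are hypothetical, and taking the first word of each Unicode
character name is a rough stand-in for the real Unicode Script property, not
an exact implementation of it.

```python
import unicodedata


def scripts_in(text):
    """Return a rough set of script names used by the letters in `text`.

    Assumption: the first word of a character's Unicode name
    (e.g. "LATIN", "GREEK", "CJK") approximates its script.
    """
    found = set()
    for ch in text:
        if not ch.isalpha():
            continue  # skip spaces, digits, punctuation
        name = unicodedata.name(ch, "")
        if name:
            found.add(name.split()[0])
    return found


def is_multiscript(text):
    """True when the letters in `text` come from more than one script."""
    return len(scripts_in(text)) > 1


if __name__ == "__main__":
    # A Greek sentence containing a Latin-script company name.
    sample = "Η Microsoft ανακοίνωσε"
    print(sorted(scripts_in(sample)), is_multiscript(sample))
```

Note that the sketch says nothing about language: it flags the sample as
multi-script without deciding whether "Microsoft" is English or Greek.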

Chris Pratley
Lead Program Manager
Microsoft Office

-----Original Message-----
From: Martin Heijdra
Sent: November 22, 1999 12:20 PM
To: Unicode List
Subject: Re: Multilingual Documents [was: HTML forms and UTF-8]

Actually, the fact that most texts are primarily in one language AND the need
for multilingual capability are not necessarily in opposition. The vast
majority of documents I want to put on the Web are *written* in one language
(mainly, English) BUT with small parts (titles etc.) in other scripts,
namely the various parts of CJK. Currently we still constantly have to use
gifs etc. to get the characters and all different diacritics right, because
of the limitation of one encoding per page. Only when a reasonably full Unicode
capability and/or font can safely be assumed to be present (rather than some
having Chinese, some having Japanese, some having pan-European, some having
that) will the need for such awful gif and picture-solution for texts go
away.
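
The "one encoding per page" limitation can be illustrated with a small
Python sketch. It is an illustration only: the helper name `can_encode` and
the sample title are made up here, and the two legacy encodings are just
examples of single-script page encodings.

```python
def can_encode(text, encoding):
    """Return True if every character of `text` fits in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    # Hypothetical title mixing a Latin diacritic with Japanese characters.
    title = "Café 日本語"
    for enc in ("latin-1", "shift_jis", "utf-8"):
        print(enc, can_encode(title, enc))
```

Each legacy encoding covers one half of the title but not the other, which
is the situation that pushes authors toward images for the remaining text.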

Becker, Joseph wrote:
> What Chris says matches the results of market studies we (Xerox) did on
> multilingual systems. It is globalization and connectivity that create the
> value of one-world architecture; multilingual documents are a pleasant
> for those of us who need or enjoy them.
> Joe

Martin Heijdra
Gest Oriental Library
317 Palmer Hall
Princeton, NJ 08544

This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:56 EDT