RE: Language Tagging And Unicode

From: Chris Pratley (chrispr@MICROSOFT.com)
Date: Thu Jan 20 2000 - 17:07:41 EST


I don't believe we have an explicit document on this. There exist statements
in our internal specifications, but I don't know of anything external on
this topic (I have never been asked either).

It's pretty simple. The goals are (and we haven't reached them all yet):
1. Any language version of the software can be installed on any OS flavour
(language and version). Caveat: some OSes make this impossible to implement
fully due to their own limitations (e.g. the Win95, Win98 code-page based
file systems and registry). An alternative implementation is that the
English version can install anywhere, but has a configurable UI/Help
language - this is the Office2000 approach, which is done to include Win9x.

2. Any language that can be generated on the OS can be
input/displayed/edited/printed/sorted by the application. It's OK to go
beyond what the OS supports (like Office and IE do with CJK on Win9x), but
this is strictly "bonus" and usually has some niggly issues due to lack of
testing of that language on an OS that doesn't natively support it.

3. Another condition we use is that we expect new releases of our software
to at least offer a version that can switch the UI/Help language per user.
Office2000 with Multilanguage Pack does this, and future releases will
expand this functionality to more applications.

4. All application functionality should be available on any OS provided the
OS supports it. For example, there is no reason to disable Korean features
on an Arabic OS unless they physically cannot be supported (like Korean true
vertical layout with rotated glyphs on Win9x Arabic). We don't include all
add-ons for all languages in every release (the number of Korean users in
Arabic speaking countries is low enough that we don't burden users who buy
the Arabic version with extra Korean add-ons). Instead we include the most
likely add-ons, and offer all the rest of the language tools in a separate
product (Office2000 Proofing Tools).

5. For things like regional date formats, we try to keep the original format
of creation. Word achieves this by retaining the format and the locale of
the creator in the date field. Other apps may not have this rich storage, so
recalculating a date field there will likely reinterpret the format
according to the viewer's locale. There are also political correctness
concerns, so we do not allow creation of some types of date formats in some
regions, and recalcs under these conditions may cause date displays to
change.
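
The contrast in point 5 can be sketched roughly as follows. This is only an
illustration, not Word's actual storage format: the DateField class, the
PATTERNS table and both render functions are hypothetical names, and the
patterns are simplified stand-ins for real OS locale data.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical, simplified date patterns keyed by locale tag.
# A real application would take these from the OS locale data.
PATTERNS = {
    "fr-FR": "{d:02d}/{m:02d}/{y}",
    "en-US": "{m}/{d}/{y}",
    "zh-TW": "{y}/{m}/{d}",
}


@dataclass
class DateField:
    """A 'rich' date field: the value plus the creator's locale."""
    value: date
    creator_locale: str


def render_rich(field: DateField) -> str:
    # The stored creator locale decides the format, so the display
    # stays the same no matter where the document is opened.
    pattern = PATTERNS[field.creator_locale]
    v = field.value
    return pattern.format(d=v.day, m=v.month, y=v.year)


def render_plain(value: date, viewer_locale: str) -> str:
    # No stored locale: a recalc falls back to the viewer's locale,
    # so the displayed format can change between machines.
    pattern = PATTERNS[viewer_locale]
    return pattern.format(d=value.day, m=value.month, y=value.year)


created_in_france = DateField(date(2000, 1, 20), "fr-FR")
print(render_rich(created_in_france))                 # same everywhere
print(render_plain(created_in_france.value, "zh-TW")) # viewer-dependent
```

In this sketch the rich field keeps its French day/month/year order when
opened anywhere, while the plain value is re-rendered in whatever order the
viewer's locale uses.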

I don't claim these statements are complete - this is just a sample since
the devil is in the details and you have to have designers tackle the
unexpected little details and do the research. You'll note that, as an
application suite, we don't try as a matter of policy to completely support
languages beyond what the OS supports, since some aspects (sorting,
keyboards) are handled by the system. However, we don't lock things out
either if possible. So just because Windows does not have, for example, an
Inuktitut keyboard does not mean you can't type and use Inuktitut in Office
or other Unicode Windows applications, assuming you can generate the
characters somehow (there is such a keyboard - it just doesn't come with
Windows).

Another comment I would make is that we have many, many things to work on,
and everything gets prioritized (by business opportunity if it costs
significant $$$ or development time, but also by employee interest).
Currently my personal goal is to get basic support for as many languages as
possible (input/display/edit/print). At the same time, we make improvements
in our currently supported languages according to their priority and
everything needs to be balanced out.

But regarding the topic du jour (Serbian vs. Russian glyphs in fonts), it is
more likely that we would work on getting a new language (e.g. Khmer or
Yoruba) to function in the apps before going back and adding the nicety of
language-selectable glyphs from a single font. Currently the Serbian-Russian
problem (just like the CJK problem) can be worked around by using a Serbian
font rather than a Russian one, so that lessens the urgency of that problem
relative to being unable to use Khmer at all (in a Unicode way - I know
there are hacks to support Khmer but these have big interop problems).

Chris Pratley
Group Program Manager
Microsoft Word

-----Original Message-----
From: Peck, Jon [mailto:peck@spss.com]
Sent: Thursday, January 20, 2000 5:32 AM
To: Unicode List
Subject: RE: Language Tagging And Unicode

Is there a document that explains the MS I18N model that Office is working
toward? I'm not asking about APIs etc., but about how correct worldwide
behavior is defined. Not a definition of how a date should be
displayed in France, but a statement of principles that addresses such
concerns as how a date created in France should be displayed when the
document is opened in Taiwan, for example.

Of course there won't always be one best solution, but it would be nice to
know what the MS framework is for this.

Regards,
Jon Peck

-----Original Message-----
From: Chris Pratley [mailto:chrispr@microsoft.com]
Sent: Thursday, January 20, 2000 2:11 AM
To: Unicode List
Subject: RE: Language Tagging And Unicode

Peter mentions that Word uses language information only for selecting
proofing tools, but that is not all. Word uses language for many things:
Determining date format
Determining sort order
Controlling line breaking, word breaking (for scripts that need it)
Determining many default properties
Etc.

Once the necessary infrastructure (fonts, UniScribe support) is available in
at least a prototype testable form, and if I am still running things :),
you'll probably see Word start displaying language-appropriate glyphs from
fonts if they exist. From the application perspective all we need is to be
able to add a language parameter to our text rendering, and the OS should
take care of it (UniScribe is part of Windows). These things take time. I
suspect that, given the standard priorities, the languages where we see this
first will be CJK. Once the infrastructure is there, third parties could
create fonts with the necessary glyph tables for languages we didn't do
initially.
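
To make "add a language parameter to our text rendering" concrete, here is a
minimal sketch. It does not use the UniScribe API or any real font format:
the Font class, its per-language override tables, the glyph names and the
shape() function are all hypothetical, chosen only to show the idea of one
font carrying language-specific glyph variants.

```python
from typing import Dict, List, Optional


class Font:
    """Hypothetical font: default glyphs plus per-language overrides."""

    def __init__(
        self,
        default_glyphs: Dict[str, str],
        language_glyphs: Optional[Dict[str, Dict[str, str]]] = None,
    ) -> None:
        self.default_glyphs = default_glyphs
        # Maps a language tag to {character: glyph name} overrides.
        self.language_glyphs = language_glyphs or {}

    def glyph_for(self, char: str, lang: Optional[str] = None) -> str:
        # Prefer a language-specific glyph, then the default glyph,
        # then a placeholder for characters the font does not cover.
        overrides = self.language_glyphs.get(lang, {})
        return overrides.get(char, self.default_glyphs.get(char, ".notdef"))


def shape(text: str, font: Font, lang: Optional[str] = None) -> List[str]:
    # The only change the application makes is passing `lang` through.
    return [font.glyph_for(ch, lang) for ch in text]


# One font with a Serbian variant for a single Cyrillic letter
# (glyph names are made up for the example).
font = Font(
    default_glyphs={"б": "be", "г": "ghe"},
    language_glyphs={"sr": {"б": "be.serbian"}},
)
print(shape("бг", font, "ru"))
print(shape("бг", font, "sr"))
```

With a font built this way, the same text tagged as Serbian picks up the
Serbian variant where one exists and falls back to the default glyph
otherwise, which is why third parties could extend coverage just by adding
glyph tables.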

Chris Pratley
Group Program Manager
Microsoft Word


