Re: Multilingual Documents [was: HTML forms and UTF-8]

From: A. Vine (avine@eng.sun.com)
Date: Thu Dec 02 1999 - 15:13:35 EST


Paul Keinanen wrote:
>
> On Wed, 1 Dec 1999 18:18:27 -0800 (PST), "A. Vine" <avine@eng.sun.com>
> wrote:
>
> >I'm not saying we're not moving in that direction - we certainly are here at Sun
> >and the Alliance, and I know other companies are. The question is, how fast?
> >Unicode is a big help in the area of multilingual support, but it is only a
> >piece of it.
>
> What exactly do you mean by multilingual support and how exactly does
> this differ from generating separate documents in separate languages
> with the same program (assuming internal Unicode or UCS-4 data
> representation)?

The same program can generate separate documents in separate languages, but it
cannot generate a single document in many languages (scripts). That
one-language-per-document case is what I refer to as monolingual i18n.

>
> The only problem with true multilingual documents I can think of in
> text processing is the need for some internal markup about which part of
> the text is in a particular language, so that the correct hyphenation
> rules and spell checker can be applied for each fragment of text. This
> requires some manual input or some unreliable heuristics, but apart
> from this, what is so special about this?

True multilingual documents have all sorts of rendering issues. In addition, if
you're supplying services such as spell-check, grammar check, automatic
formatting, and so on, this adds another level of complexity. For monolingual
i18n, the isolation of one language per document eases this burden, even if the
data are in Unicode.

>
> The days have long since gone when it was acceptable to have
> completely separate tools for each language, with the heavy costs of
> training people for the oddities of each individual tool. At least in
> companies that are doing foreign trade (which is much more common in
> small countries than in large countries with a huge domestic market
> such as the USA), the need for processing documents in various
> languages is quite common, so the tools are selected largely on the
> basis of which languages they support.

The tools may not be separate, but there are elements specific to each language
- rendering, word wrap, hyphenation, etc. These elements still have to be
designed, written, tested, and packaged.

>
> I do not think that the step from supporting documents in multiple
> languages to true multilingual documents is that large, as some seem
> to imply in this forum.

I do. I know the work that is involved. I have to estimate resources for just
that sort of change. It can be done, but it costs money.

>
>
> >My observation is that there are not enough folks willing to pay the price
> >_just_ for multilingual to warrant an all-out effort towards implementing it
> >faster. I think many people would like the capability, and would use it if they
> >had it, but when you tell them what it would cost and what other new features
> >they'll have to give up if they want it right away, they would gladly stick with
> >monolingual i18n.
>
> What is monolingual i18n? English with 8-bit characters?

As explained before, full data processing, one language at a time.

Andrea


