RE: Code Pages in Western Europe

From: Addison Phillips (AddisonP@simultrans.com)
Date: Thu Sep 09 1999 - 11:48:38 EDT


Hmm.. our mailer seems to have munged the end of my last post, which read:

"[...] of your direct competitors, and all of them are struggling to
rearchitect their products because they pursued a similar course to what you
describe."

Thanks,

Addison
        __________________________________________

        Addison Phillips
        Director, Globalization Consulting
        SimulTrans, L.L.C.

        AddisonP@simultrans.com (Internet email)
        http://www.simultrans.com (website)

        "22 languages. One release date."
        __________________________________________

-----Original Message-----
From: Addison Phillips [mailto:AddisonP@simultrans.com]
Sent: Thursday, September 09, 1999 7:47 AM
To: Unicode List
Subject: RE: Code Pages in Western Europe

I have a number of comments on your plans:

First off, if you're dead set on code pages, I would abandon the old-fashioned
DOS code pages and use the ISO-8859-1 character set (aka Latin-1). Microsoft
calls its variation of this "code page 1252"; it differs from ISO-8859-1 only
in the 0x80-0x9F range, where it has printable characters instead of control
codes. Code page 1252 is the default Western European code page in all
versions of Windows and supports Portuguese, as well as most of the common
Western European languages.
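
To make the Latin-1 versus 1252 point concrete, here is a minimal sketch in
Python. It assumes that Python's bundled codecs (latin-1, cp1252, cp850,
cp860, cp437) are faithful stand-ins for the code pages named above. The
sample letter list and the helper names can_encode and differing_bytes are
my own illustration and not part of any product.

```python
# Sketch only: Python's bundled codecs stand in for the code pages discussed.
# The letter list is an illustrative sample, not an official repertoire.
PORTUGUESE_LETTERS = "ÀÁÂÃÇÉÊÍÓÔÕÚàáâãçéêíóôõúü"


def can_encode(text, codec):
    """Return True if every character of text exists in the given code page."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def differing_bytes(codec_a, codec_b):
    """List the byte values that two single-byte codecs decode differently.

    Bytes that a codec leaves undefined decode to U+FFFD here, so an
    undefined byte counts as different from any defined character.
    """
    diffs = []
    for value in range(256):
        raw = bytes([value])
        if raw.decode(codec_a, errors="replace") != raw.decode(codec_b, errors="replace"):
            diffs.append(value)
    return diffs


if __name__ == "__main__":
    # Which code pages can hold the Portuguese sample?
    for codec in ("latin-1", "cp1252", "cp850", "cp860", "cp437"):
        print(codec, can_encode(PORTUGUESE_LETTERS, codec))
    # Where do ISO-8859-1 and code page 1252 disagree?
    print([hex(v) for v in differing_bytes("latin-1", "cp1252")])
```

Under those assumptions the sample fits in Latin-1, 1252, 850 and 860 but
not in 437, which has no a-tilde or o-tilde.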

But why have code pages at all?

While small scale computing environments don't have quite the
globe-encompassing need for multilocale support that one of those mondo
Enterprise web systems does, you still have the potential need to support
multiple locales at the same time. For example, not every Canadian speaks
English (but not everyone in Quebec prefers French). Not every Belgian
speaks French. You may wish to sell a few in Switzerland. And so on.

Plus, if you harness yourself to *just* Latin character sets you are locking
yourself out of the huge markets in Asia, Central Europe, Eastern Europe,
and so on.

If you architect your product with a single code page now then you'll be
faced with a huge internationalization project at a later date.

A better solution is to use Unicode now. Implementing a UTF-16 or UTF-8
based solution now will solve your "code page" problems once and for all.
To add support for an additional character set, you just add a translation
table between that character set and Unicode.
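
A rough sketch of that idea in Python, assuming the server keeps text as
UTF-8 and converts only at the edges. All names here (SUPPORTED_CODE_PAGES,
to_internal, to_storage, from_storage, to_client) are hypothetical, and
Python's bundled codecs play the role of the translation tables.

```python
# Hypothetical names throughout; each entry is one "translation table".
# Supporting another character set means adding one more codec name here.
SUPPORTED_CODE_PAGES = {"cp1252", "cp850", "cp437", "mac_roman"}


def _check(code_page):
    """Reject code pages for which no translation table is registered."""
    if code_page not in SUPPORTED_CODE_PAGES:
        raise ValueError("no translation table for " + code_page)


def to_internal(raw, client_code_page):
    """Decode bytes received from a client into Unicode text."""
    _check(client_code_page)
    return raw.decode(client_code_page)


def to_storage(text):
    """Encode Unicode text as UTF-8 for storage on the server."""
    return text.encode("utf-8")


def from_storage(stored):
    """Decode UTF-8 bytes read back from the server."""
    return stored.decode("utf-8")


def to_client(text, client_code_page):
    """Encode text for one client; characters it lacks become '?'.

    Only the bytes sent to that client are lossy. The stored UTF-8
    copy is never modified.
    """
    _check(client_code_page)
    return text.encode(client_code_page, errors="replace")


if __name__ == "__main__":
    dos_bytes = "Ação".encode("cp850")               # saved by a DOS client
    stored = to_storage(to_internal(dos_bytes, "cp850"))
    print(to_client(from_storage(stored), "cp1252"))  # Windows client
    print(to_client(from_storage(stored), "cp437"))   # lossy view only
    print(from_storage(stored))                       # stored text intact
```

The design choice is that clients never see each other's bytes: every
conversion goes through Unicode, so a client with a poorer code page gets a
degraded view without damaging what is stored.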

Some good reasons for doing this anyway:

1. MacRoman character set != code page 850 != code page 437 != code page
1252. If you're multiprotocol and you plan to support AppleTalk then you
have at least four character sets to support...
2. Using Unicode will allow you to support the client machine's preferred
code page at runtime. Even if you're unable to display the character data,
you can still store and manipulate it without destroying it.
3. There are languages in Western Europe that do not fit into Latin-1.
Unicode supports all of these.
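
Points 1 and 3 can be checked with a short Python sketch. It assumes
Python's bundled codecs (mac_roman, cp850, cp437, cp1252, latin-1) match the
character sets named above; the helper names decode_everywhere and
missing_from and the sample words are my own illustration.

```python
# Sketch only: helper names and sample words are illustrative assumptions.
CODE_PAGES = ("mac_roman", "cp850", "cp437", "cp1252")


def decode_everywhere(value, code_pages=CODE_PAGES):
    """Decode one byte value under each code page; return {code page: char}."""
    raw = bytes([value])
    return {cp: raw.decode(cp) for cp in code_pages}


def missing_from(text, codec):
    """Return the characters of text that the given codec cannot represent."""
    missing = []
    for ch in text:
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


if __name__ == "__main__":
    # Point 1: the same byte is a different character in each code page.
    for cp, ch in decode_everywhere(0xE9).items():
        print(cp, repr(ch))
    # Point 3: Western European words that do not fit into Latin-1.
    for word in ("cœur", "dŵr", "€"):
        print(word, missing_from(word, "latin-1"), missing_from(word, "utf-8"))
```

The words are French "cœur", Welsh "dŵr" and the euro sign. Each contains a
character that Latin-1 lacks, while UTF-8 represents all of them.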

Another problem that I foresee for you is locale support and support for
localization of your product. These problems are not as simple as they seem.

Since you are a startup and only at the architecture stage, this is your
big chance to get it right. SimulTrans works with a number of your direct
competitors, and all of them are struggling to rearchitect their products
because they pursued a similar course to what you describe.

[...]

istrator. We will not support
multiple code page environments. All clients must save files in the same
code page format that has been set on the server.

Are there any issues with this code page implementation?

Issue #2:

For other Western European languages, for example Portuguese, which uses
code pages 850 and 860, will we be able to support the language with code
page 850 only? Will lack of support for 860 present any problems?

Any comments or feedback would be greatly appreciated.

Thank you.



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:51 EDT