My original requests for information have been answered in many ways. In
some respects I'm a lot wiser. In others I'm not at all.
There was some discussion about whether end-users need to be aware of
Unicode and how to use it. I'm afraid the view that they need not is a
regular refrain, especially amongst those based in the US, where the
problems of different code pages may seem remote. If one is using a
word-processing package, for example, then I would agree that the issues
are less pressing. Since moving to Windows, I have been able to create
documents in any package, using any set of Latin characters, simply by
buying the appropriate font, installing it, changing to it within the
package and using it where appropriate. The designers of these
word-processing packages seem to have got their act together, so that
generally a document produced in one package can be read by another
without losing its diacritics.
With database systems this is not the case (at least not under Windows; I
am ignorant about Unix, I'm afraid). I would argue that in these cases
users need to be as aware of Unicode as they now need to be of ASCII codes
to allow the entry of diacritics.
Let me give a realistic example. I am building a database for central
Europe, including the neighbouring countries of Austria, Hungary and
Slovenia. I am told that Oracle, for example, has multilingual Unicode
support. But for every large database built in an expensive, high-end and
skill-intensive product like Oracle, there are thousands built in lower-end
products such as MS-Access, MS-Foxpro and Lotus Approach. Because the
languages involved in this example (German and Italian, Slovenian,
Slovakian and Hungarian) cannot all be reproduced, diacritics included,
from one code page, either this data cannot be stored in a single
database, or a set of alternative characters needs to be used to "code"
the diacritics in each language, or the diacritics need to be ignored
(highly undesirable). I am
building this database in the most advanced (to my knowledge) Microsoft
PC-based database system - Visual Foxpro 5.0. This package claims to
support Unicode. I can translate existing data to Unicode and read Unicode
data, but this does not help me to enter data in Unicode in the first
place. I am using Windows NT 4.0, which I am told allows the entry of
Unicode (though I still haven't been able to find out HOW this is done) -
the NT help file is very coy about Unicode, as are the VFP help files.
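As an illustration of the code-page problem described above, here is a
minimal sketch in present-day Python, not in any of the products named. It
assumes the two Windows code pages cp1250 (Central European) and cp1252
(Western European); the sample place names and the helper fits() are
hypothetical and chosen only for the demonstration. It shows that the same
stored bytes read differently under each code page, and that neither code
page alone can hold all the sample names, whereas a Unicode encoding can.

```python
# The same three byte values, interpreted under two Windows code pages.
raw = bytes([0xF5, 0xE8, 0xFB])
western = raw.decode("cp1252")   # Western European reading of the bytes
central = raw.decode("cp1250")   # Central European reading of the bytes
print(western, central)


def fits(text: str, codec: str) -> bool:
    """Return True if every character of text exists in the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# Hypothetical sample values: Hungarian, Slovenian, Italian, German.
names = ["Győr", "Čatež", "città", "Straße"]

# For each encoding, can one column hold every name without loss?
holds_all = {
    codec: all(fits(name, codec) for name in names)
    for codec in ("cp1250", "cp1252", "utf-8")
}
print(holds_all)
```

Under these assumptions only the Unicode encoding accepts every name, which
is the practical reason a single-code-page table forces one of the three
compromises listed above.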
Even assuming I were able to enter diacritics for each country, I would
remain unable to move this data out of the files intact. To be read by
other packages, data often has to pass through an intermediate format,
for example ASCII delimited or dBase III+. Since these formats are not
Unicode-aware, by definition the Unicode content is lost in translation.
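The loss on export can be sketched in the same way. The following is only
an approximation under stated assumptions: "ASCII delimited" is modelled
as a comma-delimited file written with Python's csv module, the rows are
invented, and export_delimited() and import_delimited() are hypothetical
helpers, not functions of any package mentioned here. dBase III+ is not
modelled at all.

```python
import csv
import io

# Hypothetical address rows containing Central European diacritics.
rows = [["Győr", "Hungary"], ["Čatež", "Slovenia"]]


def export_delimited(rows, encoding, errors="strict"):
    """Write rows as comma-delimited text, then encode it to bytes."""
    buf = io.StringIO(newline="")
    csv.writer(buf).writerows(rows)
    return buf.getvalue().encode(encoding, errors=errors)


def import_delimited(data, encoding):
    """Decode the bytes and parse them back into a list of rows."""
    text = data.decode(encoding)
    return list(csv.reader(io.StringIO(text, newline="")))


# Via plain ASCII: characters outside ASCII are replaced by "?".
ascii_rows = import_delimited(
    export_delimited(rows, "ascii", errors="replace"), "ascii"
)

# Via a Unicode encoding: the round trip returns the original rows.
utf8_rows = import_delimited(export_delimited(rows, "utf-8"), "utf-8")

print(ascii_rows)
print(utf8_rows == rows)
```

The point of the sketch is that the damage happens in the intermediate
file: whatever the source and target packages support, an ASCII-only
exchange format cannot carry the diacritics between them.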
It seems to me that Unicode implementation is being regarded as something
very academic and really not of practical interest to the users. I would
dispute this. The users have the practical need to overcome the problems
with which our multi-lingual world presents us. From everything that I have
read, I have the impression that I would have to wait ten more years before
a simple task like building a low-cost address database for multiple
language areas can be achieved cheaply and easily by an end-user without
losing or damaging data.
Again, I'd love to be corrected about this ...
Author "Building and Maintaining a European Direct Marketing Database"