Re: Whence UniData.txt? (was Re: unidata is big)

From: Theo Veenker (
Date: Wed Apr 24 2002 - 13:14:14 EDT wrote:
> Theo's comment leads me to a question I've pondered recently:
> Assumptions:
> Many apps, from independent sources, need to access the Unicode
> character data,
> A lot of these apps aren't overly concerned with the slight overhead of
> parsing the data as needed from Unicode-supplied data files directly.
> Similarly, such apps benefit from being able to easily upgrade to new
> Unicode releases by simply replacing the data files.
> It isn't very user-friendly for every such app to store their own
> private copy of the character data files when a single shared copy would
> take up less space and be easier to maintain.
> It would seem to me that there is some value in establishing either (1) a
> standard location where programs can expect to find (or install) a local
> copy of the Unicode data files, or (2) a standard way to discover where
> such a local copy of these files exist. My preference would be (2), which
> would make it easy to configure a network of machines to share a single
> copy of the data files. Something as simple as an environment variable
> could work if developers were to agree on its name and semantics.

For applications that eat raw UCD files, this shouldn't be too difficult
to achieve. Any well-designed app will/should have some parameter or env.
variable that you can set (no?). But for apps/libraries that like their UCD
files cooked, it is a different story, because there is no recommended binary
format for representing (compact) Unicode character data. Personally, I
would appreciate seeing such a recommendation, including your point (2).
However, apps/libs that enrich the character data with custom properties
would still need their own copy of the data.
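For the raw-file case, option (2) could look something like the minimal sketch below. It assumes a hypothetical environment variable `UCD_DIR` (no such name was agreed on in this thread), a hypothetical fallback directory, and the data file name `UnicodeData.txt`. The parser keeps only the code point, name and General_Category fields of each semicolon-separated record.

```python
import os
from pathlib import Path

# Hypothetical variable name: no convention for this existed in the thread.
UCD_ENV_VAR = "UCD_DIR"
# Hypothetical fallback location used when the variable is not set.
DEFAULT_UCD_DIR = Path("/usr/share/unicode")


def find_ucd_dir(env=None):
    """Return the shared UCD directory, preferring the environment variable."""
    env = os.environ if env is None else env
    value = env.get(UCD_ENV_VAR)
    return Path(value) if value else DEFAULT_UCD_DIR


def parse_unicode_data(lines):
    """Map code point -> (name, General_Category) from UnicodeData.txt lines.

    Each record has 15 semicolon-separated fields: field 0 is the code point
    in hex, field 1 the character name, field 2 the General_Category.
    Ranges given as <..., First>/<..., Last> pairs are NOT expanded here.
    """
    table = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        fields = line.split(";")
        table[int(fields[0], 16)] = (fields[1], fields[2])
    return table


def load_unicode_data(env=None):
    """Read UnicodeData.txt from the directory chosen by find_ucd_dir()."""
    path = find_ucd_dir(env) / "UnicodeData.txt"
    with open(path, encoding="utf-8") as f:
        return parse_unicode_data(f)
```

With `UCD_DIR=/srv/ucd` set on every machine in a network, `load_unicode_data()` would read one shared `/srv/ucd/UnicodeData.txt`, and upgrading to a new Unicode release means replacing that single file.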

The subject reminds me of the TZ database. Here you have a large text-based
database containing information on time zones and daylight saving time.
You can compile the data into a binary format by running a utility (zic) included
with the tz sources. Well, they don't give any recommendation on where to
store the (text and/or binary) data, but at least there is a 'standard'
format, which allows for sharing data. It would be nice to have something like
this for the UCD.
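To make the tz analogy concrete, here is a sketch of what a "compile once, share the binary" step for the UCD might look like. The file layout is entirely hypothetical and not any existing standard: a 4-byte signature `UCD0`, a record count, then fixed-size records of a little-endian 32-bit code point followed by the 2-letter General_Category. Records are written in code point order so that a reader could binary-search the file.

```python
import struct

# Hypothetical record layout (not a standard): little-endian uint32 code
# point followed by the 2-byte ASCII General_Category code, e.g. b"Lu".
RECORD = struct.Struct("<I2s")
MAGIC = b"UCD0"  # hypothetical file signature


def compile_categories(table):
    """Serialize {code point: category} into a binary blob, sorted by code point."""
    out = bytearray(MAGIC)
    out += struct.pack("<I", len(table))
    for cp in sorted(table):
        out += RECORD.pack(cp, table[cp].encode("ascii"))
    return bytes(out)


def load_categories(blob):
    """Inverse of compile_categories: rebuild {code point: category}."""
    if blob[:4] != MAGIC:
        raise ValueError("not a compiled UCD category file")
    (count,) = struct.unpack_from("<I", blob, 4)
    result = {}
    offset = 8  # 4-byte signature + 4-byte record count
    for _ in range(count):
        cp, cat = RECORD.unpack_from(blob, offset)
        result[cp] = cat.decode("ascii")
        offset += RECORD.size
    return result
```

A fixed record size keeps the format trivial to read from any language. A real recommendation would also need a version field and a way to carry more properties than General_Category.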

> (I understand there may be different mechanisms for different platforms,
> but it would be even better if a standard mechanism were cross platform).
> So, are there any conventions for this evolving? Or would anyone like to
> propose one?

Please, go ahead :o)


This archive was generated by hypermail 2.1.2 : Wed Apr 24 2002 - 14:05:51 EDT