Re: UC data

From: Edward Cherlin
Date: Thu Jul 25 1996 - 19:42:35 EDT

>It was suggested I send the following suggestion to you:
>Another useful addition to the Web site would be pointers to some
>significant texts (in the public domain of course) in various
>languages encoded in Unicode. A bit like Project Gutenberg, on a
>smaller scale, but with many different languages. UTF-8 encoded I

I would personally like to see The Book of a Thousand Tongues on-line in
Unicode. It consists of over 1,000 translations of a short passage from the
Christian scriptures, extracted from the numerous Bible translations
published by the Bible Society (London). I believe that this is still the
world record for translations of a single text. Unfortunately, the book has
long been out of print. This is a document of inherent interest to a lot of
people, and would be one of the best possible demonstrations of Unicode for
the general public. Non-Christians might object (I am one, and I don't), but we
can put other samples on-line. For example, I remember hearing of an
8-language Buddhist dictionary. Even one page from that would be a good
demonstration. We should have no trouble finding secular documents that
have been put into 50 languages--perhaps the UN charter, or some materials
from the current Olympics (140 languages supported in the Olympic village,
I hear).

>I have yet to stumble on a Unicode text that goes beyond a 5-liner
>in English with a smiley in it. Not very exciting. It seems there
>is still far more written about Unicode than in Unicode.

There are Unicode-capable word processors, DBMSs, E-mail programs, and Web
browsers that can import data in a variety of character sets and
save it in Unicode text or document files. It would be easy to convert a
large quantity of text and post it. This is not being done, presumably
because few people have the tools to read it, and those who do can just as
easily do the conversion themselves. I suspect that a considerable amount of data
exists in Unicode, but certainly little of it is publicly available.
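
As a rough illustration of the kind of conversion described above, here is
a minimal Python sketch. It assumes the character set of each source text
is already known. The function name to_utf8 and the sample strings are
hypothetical, made up for this example, and are not taken from any of the
tools mentioned in this message.

```python
def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Decode bytes from a known legacy character set and re-encode as UTF-8."""
    text = raw.decode(source_encoding)  # raises UnicodeDecodeError on bad input
    return text.encode("utf-8")


# Hypothetical sample data: the same idea applies to whole files.
samples = {
    "iso-8859-1": "café".encode("iso-8859-1"),
    "koi8-r": "мир".encode("koi8-r"),
    "shift_jis": "日本語".encode("shift_jis"),
}

# Convert every sample to UTF-8, keyed by its original character set.
converted = {name: to_utf8(raw, name) for name, raw in samples.items()}

for name, utf8_bytes in converted.items():
    print(name, "->", utf8_bytes.decode("utf-8"))
```

The sketch does not try to guess an unknown character set. For real texts
the source encoding would have to be recorded or identified beforehand.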

