Re: Unicode and XML : real publications

From: Misha Wolf (misha.wolf@reuters.com)
Date: Thu Apr 30 1998 - 13:49:43 EDT


The appended mail was forwarded to me by Tim Bray. I have since obtained
Baden's permission to forward it further. Please copy him [1] on your
responses, as I don't think he's on the Unicode list.

Baden, if/when you join the Unicode list, please say so, so people can stop
copying you.

[1] Baden Hughes (bmhughes@ozemail.com.au)

Misha

> =========================
> X-Sender: bmhughes@ozemail.com.au
> Date: Mon, 27 Apr 1998 22:59:00 +1000
> To: Tim Bray <tbray@textuality.com>
> From: Baden Hughes <bmhughes@ozemail.com.au>
> Subject: Unicode and XML : real publications
>
> Hi Tim
>
> You will probably have no idea who I am, but the sig should explain some of
> it for you ...
>
> Although this may seem a little obscure, and in some ways it's more an issue
> of Unicode than of XML or TEI, I have some comments and questions about
> the use of XML and Unicode and TEI for real publications in Asia. I'm not
> sure what these mean in terms of the spec per se, but it is an issue for us
> here as we would very much like to run with XML but the Unicode dependence
> is a killer for us.
>
> As far as I know, XML supports (i.e. accepts) the "Private use" characters
> in Unicode.
>
> For our work this means that we could develop a corporation-wide standard
> in the cases where national standard boards do not allow us to propose
> standards for minority languages or special characters.
>
> In real life, humanities computing is more frequently faced with weird
> characters than the average W3C member, even though the I18N group is
> active (and yes, we participate in this). In part, the difference between
> TEI's approach and the XML approach may reflect the states of the two
> standards. In our world, we find that Unicode is not up to Egyptian
> hieroglyphics, Akkadian, or variant Thai traditions. Thus heavy reliance on
> Unicode, without the support of the private use space, would prevent us from
> utilising technologies like XML to their fullest potential. There have been
> times when our work has prompted change within the industry, TeX being a
> good example once we had presented some technical problems to a conference
> in the late 80s. We are kind of hoping that something might happen in
> regard to XML in a similar fashion...
>
> Let me give you some examples of real world problems with typography and
> encoding from our Asian work ...
>
> - In the case of Devanagari, Unicode includes the basic character set, but
> it is the conjugative letters that give us the problems. In the Bible we
> use more conjugate (ligature) characters than normally found in the
> languages that use this script, as we require them to handle the phonetics
> of the Biblical names which borrow sounds from Hebrew and Greek.
>
> - In many countries hilltribe and minority languages are not represented by
> the national Unicode committees that are appointed by the government.
>
> - There are special needs for phonetics in the Bible names that often
> require new conjugatives that are not commonly used in other forms of
> indigenous literature.
>
> - There are often parallel traditions in the use of ligatures that need to
> be handled. In Thai, for example, government literature excludes the stroke
> under the YAW POO YING when adding an OO vowel while the Buddhist
> typesetting does not. Yet in Unicode there is only a single Yaw Poo Ying.
>
> Here are the glyphs (best viewed using a fixed width font):
>
> XXXX X XXXX X
> X X X X X X
> X X X X X X
> X X X X X X
> X X X X X X
> XX XXX XX XXX
> X
> XXXXXX XX X
> X X
> XX X XXX
> X X
> XXX
>
> - In minority languages of Thailand there are character and ligature
> combinations that are prohibited by the national language use of the
> script. Unicode implementations prohibit the typing of these combinations.
>
> - In multiscript projects we have required special encoding for noting
> special features like capitalization and punctuation in target scripts
> that are not present in the source script. Here is an example from the Mien
> NT (Jhn 11:35):
>
> ye-su zyrug Eyemq.
>
> Yesu ziouc nyiemv.
>
> (again, best viewed with a fixed width font):
> X
> X XX X XX XX X X XX XX XX X X XX
> X X X X X X X X X X X X X X
> X XX X XXXX X X X X X X X
> X X X X X X XXX XXX X X
> XX XXX XXXX X X XX XX X
> X X
> XX X X X
> XX X XXXX
> XX
> X
> XXXXXXXX
>
> X XXXX X XX X XX X
> X X X X X X XX X
> X X X X XX X X X
> X X X X X X XXX X
> XX XX XXX XXXX XX XXX
>
> XXXXXX
>
> - In Urdu we have the interesting feature that the language uses Persian
> script as italics and Nastaliq script as normal. However, Unicode does not
> support a single encoding for both scripts.
>
> I hope these problems give you a feel for the issues we face. I hope that it
> will help you illustrate the challenges and problems we have in using
> Unicode for real publications in Asia. Thus the support of private use
> space is critical to us.
>
> There are times when we have driven change on issues such as these. In
> terms of our input, we have well qualified technical staff who are prepared
> to put in large amounts of effort to work towards a solution, not only for
> our language projects, but for the rare or minority non-Roman script
> languages.
>
> I would be happy to pursue any contact you make that might give us some
> help with the problems we have. I look forward to hearing from you.
>
> Regards
>
> Baden Hughes
> Regional Computer Assisted Publishing Program - Asia Pacific
> + IT Services Technical Strategist
> United Bible Societies
> Email: bmhughes@ozemail.com.au
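
On the Private Use point in the appended mail: the XML 1.0 Char production
includes the range #xE000-#xFFFD, so Private Use characters are legal
document content. A minimal sketch in Python follows, assuming the standard
library's unicodedata and xml.etree.ElementTree modules. The code point
U+E000 and the element name "w" are hypothetical choices for illustration,
not assignments used by UBS or anyone else.

```python
import unicodedata
import xml.etree.ElementTree as ET

# Hypothetical in-house assignment: a conjunct glyph that a private
# agreement places at U+E000. Unicode itself gives it no meaning.
HYPOTHETICAL_CONJUNCT = "\uE000"


def is_private_use(ch: str) -> bool:
    """True if ch has general category Co (Private Use)."""
    return unicodedata.category(ch) == "Co"


def round_trip(text: str) -> str:
    """Serialise text inside an XML element as UTF-8 and parse it back."""
    elem = ET.Element("w")  # "w" is an arbitrary element name
    elem.text = text
    data = ET.tostring(elem, encoding="utf-8")
    return ET.fromstring(data).text


if __name__ == "__main__":
    sample = "a" + HYPOTHETICAL_CONJUNCT + "b"
    print(is_private_use(HYPOTHETICAL_CONJUNCT))
    print(round_trip(sample) == sample)
    # A numeric character reference to the same code point also parses.
    print(ET.fromstring("<w>&#xE000;</w>").text == HYPOTHETICAL_CONJUNCT)
```

A parser accepting the character says nothing about interchange: the
receiver still needs the same private agreement, and a matching font, to
make sense of it.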

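On the Thai example in the appended mail: both typesetting traditions end up
with the same code points in the backing store, so the presence or absence
of the lower stroke has to be carried by the font or by markup. A small
sketch in Python, assuming the "OO vowel" corresponds to SARA UU (U+0E39);
that mapping is a guess, not something the mail states.

```python
import unicodedata

YO_YING = "\u0E0D"  # the single encoded "Yaw Poo Ying"
SARA_UU = "\u0E39"  # assumed to be the "OO vowel" from the mail


def describe(text: str) -> list[str]:
    """List each character as 'U+XXXX NAME' using the Unicode database."""
    return [f"U+{ord(c):04X} {unicodedata.name(c)}" for c in text]


# Government and Buddhist typesetting would both store this sequence;
# nothing in the encoded text records which glyph shape is wanted.
syllable = YO_YING + SARA_UU

if __name__ == "__main__":
    for line in describe(syllable):
        print(line)
```

Anything that distinguishes the two traditions therefore lives outside the
character data, for example in a font choice, in markup, or in a private
use code point agreed between the parties.
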
------------------------------------------------------------------------
Any views expressed in this message are those of the individual sender,
except where the sender specifically states them to be the views of
Reuters Ltd.


