Re: Unicode and XML : real publications

From: Misha Wolf (misha.wolf@reuters.com)
Date: Thu Apr 30 1998 - 13:49:43 EDT

Next message: Michael Everson: "Re: Unicode and XML : real publications"
Previous message: Jim Saunders: "Re: Unicode and Java TextAreas"
Next in thread: Michael Everson: "Re: Unicode and XML : real publications"
Maybe reply: Michael Everson: "Re: Unicode and XML : real publications"

The appended mail was forwarded to me by Tim Bray. I have since obtained
Baden's permission to forward it further. Please copy him [1] on your
responses, as I don't think he's on the Unicode list.

Baden, if/when you join the Unicode list, please say so, so people can stop
copying you.

[1] Baden Hughes (bmhughes@ozemail.com.au)

Misha

> =========================
> X-Sender: bmhughes@ozemail.com.au
> Date: Mon, 27 Apr 1998 22:59:00 +1000
> To: Tim Bray <tbray@textuality.com>
> From: Baden Hughes <bmhughes@ozemail.com.au>
> Subject: Unicode and XML : real publications
>
> Hi Tim
>
> You probably have no idea who I am, but the sig should explain some of
> it for you ...
>
> This may seem a little obscure, and in some ways it's more an issue of
> Unicode than of XML or TEI, but I have some comments and questions about
> the use of XML, Unicode and TEI for real publications in Asia. I'm not
> sure what they mean for the spec per se, but it is an issue for us here:
> we would very much like to run with XML, but the Unicode dependence is a
> killer for us.
>
> As far as I know, XML supports (i.e. accepts) the Private Use characters
> in Unicode.
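For context: the XML 1.0 `Char` production admits the BMP Private Use Area (U+E000 to U+F8FF) and the supplementary private-use planes 15 and 16, so a conforming parser should accept these characters. The following is a minimal sketch, assuming only the Python standard library (`unicodedata`, and `xml.etree.ElementTree`, which parses with expat), that checks code points against that production and round-trips a private-use character through a parse. The element name `w` and the use of U+E000 are illustrative choices, not anything from the message.

```python
import unicodedata
import xml.etree.ElementTree as ET


def is_xml_char(cp: int) -> bool:
    """True if cp matches the XML 1.0 'Char' production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def is_private_use(cp: int) -> bool:
    """True if Unicode assigns cp to the Private Use category (Co)."""
    return unicodedata.category(chr(cp)) == "Co"


# Private-use ranges: the BMP PUA plus planes 15 and 16.
PUA_RANGES = [(0xE000, 0xF8FF), (0xF0000, 0xFFFFD), (0x100000, 0x10FFFD)]

for first, last in PUA_RANGES:
    # The endpoints of each range are private use and also legal XML characters.
    assert all(is_private_use(cp) and is_xml_char(cp) for cp in (first, last))

# A hypothetical project glyph stored at U+E000, written once literally and
# once as a numeric character reference; both parse to the same text.
literal = ET.fromstring("<w>\uE000</w>")
by_ref = ET.fromstring("<w>&#xE000;</w>")
print(literal.text == by_ref.text == "\uE000")  # True
```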
>
> For our work this means that we could develop a corporation-wide standard
> in cases where national standards bodies do not allow us to propose
> standards for minority languages or special characters.
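A corporation-wide private-use convention amounts to a shared table that maps each project character to one agreed code point, so that every team, font and document encodes it the same way. The following is a hypothetical sketch of such a registry: the `PuaRegistry` class and the character names are invented for illustration. It allocates sequentially from the BMP Private Use Area and emits XML numeric character references.

```python
from dataclasses import dataclass, field

PUA_START, PUA_END = 0xE000, 0xF8FF  # BMP Private Use Area


@dataclass
class PuaRegistry:
    """Hypothetical organisation-wide table of private-use assignments."""

    next_cp: int = PUA_START
    by_name: dict[str, int] = field(default_factory=dict)

    def assign(self, name: str) -> int:
        """Return the code point for name, allocating the next free one if new."""
        if name in self.by_name:
            return self.by_name[name]
        if self.next_cp > PUA_END:
            raise OverflowError("BMP Private Use Area exhausted")
        cp = self.next_cp
        self.by_name[name] = cp
        self.next_cp += 1
        return cp

    def char_ref(self, name: str) -> str:
        """XML numeric character reference for name, e.g. '&#xE000;'."""
        return f"&#x{self.assign(name):04X};"


registry = PuaRegistry()
print(registry.char_ref("THAI YO YING WITHOUT LOWER STROKE"))      # &#xE000;
print(registry.char_ref("DEVANAGARI CONJUNCT FOR A HEBREW NAME"))  # &#xE001;
print(registry.char_ref("THAI YO YING WITHOUT LOWER STROKE"))      # &#xE000;
```

Sequential allocation keeps the sketch simple; a real convention would also have to publish the table and keep assignments stable once documents depend on them.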
>
> In real life, humanities computing is faced with weird characters far more
> often than the average W3C member is, even though the I18N group is active
> (and yes, we participate in it). In part, the difference between TEI's
> approach and XML's approach may reflect the states of the two standards.
> In our world, we find that Unicode is not up to Egyptian hieroglyphics,
> Akkadian, or variant Thai traditions. Thus heavy reliance on Unicode,
> without support for the private use space, would prevent us from using
> technologies like XML to their fullest potential. There have been times
> when our work has prompted change within the industry: TeX is a good
> example, after we presented some technical problems at a conference in
> the late 80s. We are kind of hoping that something similar might happen
> with XML...
>
> Let me give you some examples of real world problems with typography and
> encoding from our Asian work ...
>
> - In the case of Devanagari, Unicode includes the basic character set, but
> it is the conjunct letters that give us problems. In the Bible we use more
> conjunct (ligature) characters than are normally found in the languages
> that use this script, as we need them to represent the phonetics of
> Biblical names, which borrow sounds from Hebrew and Greek.
>
> - In many countries, hilltribe and minority languages are not represented
> on the national Unicode committees appointed by the government.
>
> - The phonetics of Biblical names have special needs and often require new
> conjuncts that are not commonly used in other forms of indigenous
> literature.
>
> - There are often parallel traditions in the use of ligatures that need to
> be handled. In Thai, for example, government literature omits the stroke
> under YAW POO YING when an OO vowel is added, while Buddhist typesetting
> keeps it; yet Unicode has only a single YAW POO YING.
>
> Here are the glyphs: [fixed-width ASCII drawings of the two forms,
> unreadable in this copy]
>
> - In the minority languages of Thailand there are character and ligature
> combinations that the national language's use of the script prohibits,
> and Unicode implementations prevent these combinations from being typed.
>
> - In multiscript projects we have needed special encoding to mark features
> such as capitalization and punctuation in target scripts that are not
> present in the source script. Here is an example from the Mien NT
> (Jhn 11:35):
>
> ye-su zyrug Eyemq.
>
> Yesu ziouc nyiemv.
>
> [fixed-width ASCII drawing of the verse, unreadable in this copy]
>
> - In Urdu we have the interesting feature that Persian script is used as
> italics and Nastaliq script as normal text. However, Unicode does not
> support a single encoding for both scripts.
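Background on two of the cases above may help frame the problem. Unicode gives Devanagari conjuncts no code points of their own: a conjunct is spelled as consonant + VIRAMA (U+094D) + consonant, and the font decides whether to draw a ligature. Likewise, both Thai typesetting traditions would store the same sequence for YAW POO YING (THAI CHARACTER YO YING, U+0E0D) followed by a below-base vowel, so the stroke difference lives in the glyphs rather than the text. One way to record such a distinction is in markup. This sketch uses only the Python standard library; it assumes the "OO vowel" is SARA UU (U+0E39) and tags words with a TEI-style `rend` attribute on a `w` element, with invented values, purely as an illustration.

```python
import unicodedata
import xml.etree.ElementTree as ET

VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA


def devanagari_conjunct(*consonants: str) -> str:
    """Spell a conjunct as consonants joined by VIRAMA; any ligature is the font's job."""
    return VIRAMA.join(consonants)


def char_names(text: str) -> list[str]:
    """Unicode character names, one per code point."""
    return [unicodedata.name(ch) for ch in text]


ksha = devanagari_conjunct("\u0915", "\u0937")  # KA + SSA
print(char_names(ksha))
# ['DEVANAGARI LETTER KA', 'DEVANAGARI SIGN VIRAMA', 'DEVANAGARI LETTER SSA']

# Thai YO YING + SARA UU (assumed to be the "OO vowel"). The code points are the
# same whichever tradition typesets the word, and NFC leaves them unchanged.
yo_ying_uu = "\u0E0D\u0E39"
assert unicodedata.normalize("NFC", yo_ying_uu) == yo_ying_uu


def tagged_word(text: str, tradition: str) -> str:
    """Carry the typesetting tradition in markup (hypothetical rend values)."""
    el = ET.Element("w", {"rend": tradition})
    el.text = text
    return ET.tostring(el, encoding="unicode")


print(tagged_word(yo_ying_uu, "government"))
print(tagged_word(yo_ying_uu, "buddhist"))
```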
>
> I hope these problems give you a feel for the issues we face, and that
> they help illustrate the challenges we meet in using Unicode for real
> publications in Asia. This is why support for the private use space is
> critical to us.
>
> There are times when we have driven change on issues such as these. As
> for our input, we have well-qualified technical staff who are prepared to
> put a great deal of effort into working toward a solution, not only for
> our language projects but for rare or minority non-Roman-script languages.
>
> I would be happy to follow up any contact you make that might help us
> with these problems. I look forward to hearing from you.
>
> Regards
>
> Baden Hughes
> Regional Computer Assisted Publishing Program - Asia Pacific
> + IT Services Technical Strategist
> United Bible Societies
> Email: bmhughes@ozemail.com.au

------------------------------------------------------------------------
Any views expressed in this message are those of the individual sender,
except where the sender specifically states them to be the views of
Reuters Ltd.


This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:40 EDT