Re: phonetic superscripts, etc. (was Re: Superscript asterisk)

From: Edward Cherlin (edward.cherlin.sy.67@aya.yale.edu)
Date: Fri Jul 02 1999 - 16:49:57 EDT


At 10:40 -0700 7/2/1999, Peter_Constable@sil.org wrote:
>>>There are superscript International Phonetic Alphabet characters which were
>>>not included to support any particular character sets so far as I know, but
>>>phonetic entities like aspiration (h). We (Finland, Norway, Ireland) are
>>>preparing a proposal for Finno-Ugric Phonetic Alphabet support which
>>>contains rather a lot of superscript, subscript, and small-capital letters,
>>>and whose semantics are completely different from the plain letters, and
>>>must be distinguished from them in plain text (for lexicographical
>>>searching, etc.).
>
>>Why in plain text? This is an obvious application for developing an XML
>>tagging scheme or some other form of markup.
>
>Representing an organisation that makes heavy use of phonetic transcriptions,
>and being in the position of supporting hundreds of linguists that work with
>this stuff, I can assure you that the last thing they want is to have their
>phonetic/phonemic data be XML-tagged. Just as you wouldn't want this
>sentence to be encoded as follows:
>
><uc>j</uc>ust as you probably <contr>would not</contr> want this sentence
>to be encoded as follows:

We have to distinguish between the IME for your script, the file format,
data structures, and presentation formats for screen or printer. (Unicode
clearly separates the concept of (internal) character and (output) glyph,
but is not so clear on input encodings.) Many devices that would be
abhorrent for input or output are perfectly satisfactory for data storage
and transfer.
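To make the storage/presentation split concrete, here is a minimal Python
sketch (my illustration, not anyone's proposed scheme) that keeps Peter's
tagged sentence as the stored form and derives both the displayed sentence
and a tag-free search string from it. The tag names `uc` and `contr` come
from his example; the `render` function and the one-entry `CONTRACTIONS`
table are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical rendering table for the <contr> tag in Peter's example;
# a real scheme would need a much fuller list.
CONTRACTIONS = {"would not": "wouldn't"}

def render(elem: ET.Element) -> str:
    """Rebuild the presentation form from the tagged storage form."""
    # The element's own text, then each rendered child followed by its tail.
    inner = (elem.text or "") + "".join(render(c) + (c.tail or "") for c in elem)
    if elem.tag == "uc":        # upper-case marker
        return inner.upper()
    if elem.tag == "contr":     # contraction marker
        return CONTRACTIONS.get(inner, inner)
    return inner                # wrapper or unknown tags pass content through

stored = "<uc>j</uc>ust as you probably <contr>would not</contr> want this sentence"
root = ET.fromstring(f"<s>{stored}</s>")    # wrap the fragment so it parses

display = render(root)                  # what the reader sees
search_key = "".join(root.itertext())   # tag-free text, e.g. for searching
print(display)
print(search_key)
```

The user types and reads plain text; the tags exist only in the file, and a
search routine can work on the tag-free string.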

>You can do any process on the latter that you could on the original, but the
>algorithm needs to be modified. If you're doing it once, that's fine. But if
>you frequently make up new processes, you'd probably rather just have the
>plain text rather than have to parse XML in addition to dealing with the
>plain text. And this little example is tame in comparison with what would
>be involved with phonetic representations.
>
>
>Peter

Well, I was thinking of a different context, that of including IPA within a
document along with other writing systems. There is nothing preventing you
from defining your own character usages, including treating a single
Unicode character in different formats as different characters within your
domain. We mathematicians do it all the time. One fairly common example is
using plain and bold letters for different but related objects such as
vectors and tensors. Again I say, (IME ~= file) ^ (file ~= output): the
input method is not the file format, and the file format is not the output.
And not only that, but character encoding is not character semantics.
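A rough Python sketch of that idea, assuming a hypothetical `Symbol` type
that pairs a Unicode character with a format label, so that superscript h
and plain h compare as different characters within the domain even though
they share a code point. The format labels are made up for illustration;
the three modifier letters in the table (U+02B0, U+02B2, U+02B7) are real
Unicode characters.

```python
import unicodedata
from dataclasses import dataclass

@dataclass(frozen=True)
class Symbol:
    """One symbol of a hypothetical transcription domain: a Unicode
    character plus a format label that carries meaning of its own."""
    base: str
    fmt: str = "plain"   # hypothetical labels: "plain", "sup", "sub", "smallcap"

# Same code point (U+0068), two different characters within the domain.
plain_h = Symbol("h")
aspiration = Symbol("h", "sup")

# Where Unicode already encodes a superscript form, it can be used on output.
MODIFIER_LETTERS = {
    Symbol("h", "sup"): "\u02B0",  # MODIFIER LETTER SMALL H
    Symbol("j", "sup"): "\u02B2",  # MODIFIER LETTER SMALL J
    Symbol("w", "sup"): "\u02B7",  # MODIFIER LETTER SMALL W
}

def to_plain_text(sym: Symbol) -> str:
    """Return the dedicated Unicode character if there is one, else the base
    character (the format distinction is then lost in plain text)."""
    return MODIFIER_LETTERS.get(sym, sym.base)

print(plain_h == aspiration)
print(unicodedata.name(to_plain_text(aspiration)))
print(to_plain_text(Symbol("u", "smallcap")))
```

This is the same move as plain versus bold letters in mathematics: the
distinction lives in the domain's own data model, not in the character
encoding.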

Anyway, one of the proposed advantages of an XML scheme is that it can be
made as general as the subject matter allows and requires. You could set it
up to handle all of the variations of superscripts, small caps, and much
more, once. Then you could create hundreds, nay, thousands and myriads of
new combinations without further ado, and without having to come back to
the Unicode and ISO committees each time for the protracted process of
registration.
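A minimal sketch of what such a general scheme might look like in Python,
assuming a hypothetical `ph` element whose `fmt` attribute takes any
space-separated combination of labels. Nothing here is a proposed standard;
the element and label names, and the flat (non-nested) structure, are
assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: bare text is plain; a formatted letter is wrapped in
# <ph fmt="...">, where fmt is a space-separated set of format labels.
TRANSCRIPTION = '<tr>p<ph fmt="sup">h</ph>a t<ph fmt="smallcap sup">u</ph></tr>'

def tokens(root: ET.Element):
    """Yield (character, formats) pairs in document order.

    Assumes a flat structure: <ph> elements directly under the root, no nesting.
    """
    def emit(text, fmts):
        for ch in text or "":
            if not ch.isspace():     # spacing is not treated as a symbol here
                yield ch, fmts

    yield from emit(root.text, frozenset())
    for child in root:
        yield from emit(child.text, frozenset((child.get("fmt") or "").split()))
        yield from emit(child.tail, frozenset())

toks = list(tokens(ET.fromstring(TRANSCRIPTION)))

# Search for aspiration (superscript h) as a (character, formats) pair.
aspirated_at = [i for i, tok in enumerate(toks) if tok == ("h", frozenset({"sup"}))]
print(aspirated_at)
print(sorted(toks[-1][1]))
```

Supporting a new combination such as `smallcap sup` required no change to the
code and no registration, which is the point of defining the scheme once.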

--
Edward Cherlin                        President
Coalition Against Unsolicited Commercial E-mail
Help outlaw Spam.       <http://www.cauce.org/>
Talk to us at             <news:comp.org.cauce>



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:48 EDT