Re: phonetic superscripts, etc.

From: Edward Cherlin (edward.cherlin.sy.67@aya.yale.edu)
Date: Sun Jul 04 1999 - 04:36:34 EDT


At 11:01 -0700 7/3/1999, Peter_Constable@sil.org wrote:
[Ed Cherlin wrote:]
>>Well, I was thinking of a different context, that of including IPA within a
>>document along with other writing systems. There is nothing preventing you
>>from defining your own character usages, including treating a single Unicode
>>character in different formats as different characters within your domain.
>>We mathematicians do it all the time. One fairly common example is using
>>plain and bold letters for different but related objects such as vectors
>>and tensors. Again I say, (IME ~= file) ^ (file ~= output), and not only
>>that, but character encoding is not character semantics.
>
>I've been preaching (IM ~= file) ^ (file ~= output) for some time, so I'm all
>with you there. If all that's involved here is entry and display, then it's
>not hard for me to build a font with as many presentation forms as I need,
>PUA allocations as needed, and an appropriate IM. I can even apply multiple
>fonts or other formatting if needed. The point is, that's not all that is
>needed. In the following model of text processing:
>
>
>  -------                  -----------
> | INPUT |                | RENDERING |
>  -------                  -----------
>           \            /
>             ----------
>            | ENCODING |
>             ----------
>           /            \
>  ------------          ----------
> | CONVERSION |        | ANALYSIS |
>  ------------          ----------
>
>you've mentioned the top half. The part that concerns me most is analysis.
>
>
>>Anyway, one of the proposed advantages of an XML scheme is that it can be
>>made as general as the subject matter allows and requires. You could set it
>>up to handle all of the variations of superscripts, small caps, and much
>>more, once. Then you could create hundreds, nay, thousands and myriads of
>>new combinations without further ado, and without having to come back to
>>the Unicode and ISO committees each time for the protracted process of
>>registration.
>
>That's true. However, it's not just what analysis I want to do on the text
>that concerns me. It's the hundreds of other linguists I work with who
>aren't as computer savvy. They want to be able to do all kinds of things on
>their IPA text in a way in which semantics, not entry or appearance, is the
>whole point. They're used to tools that work on character strings. Teaching
>them to parse XML is forcing them to bend to fit limited technology rather
>than to develop technology to meet their needs.

Oh, well, I certainly didn't mean that all users should become computer
geeks just so we can avoid multiplying character entities further. The idea
in defining an XML extension is that the XML is not just a tagging scheme,
but includes a standardized, machine-independent implementation of its
definitions, executable by any conforming XML browser or other interpreter.
You still need a few users to be geeks, but not many.
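
To make that concrete, here is a minimal sketch in Python of what
"executable definitions" could look like. The element names, the tiny
vocabulary, and the mapping table are all invented for illustration; they
are not taken from any existing DTD or tool.

    # Sketch only: a made-up phonetic markup vocabulary and a toy
    # interpreter that turns it into a plain Unicode string.
    import xml.etree.ElementTree as ET

    # Hypothetical table mapping marked-up superscripts to Unicode
    # modifier letters (just two entries for illustration).
    SUPERSCRIPTS = {"h": "\u02B0",   # MODIFIER LETTER SMALL H
                    "w": "\u02B7"}   # MODIFIER LETTER SMALL W

    def render(fragment):
        # Execute the markup's definitions: <seg> passes through,
        # <sup> becomes the corresponding modifier letter.
        out = []
        for elem in ET.fromstring(fragment):
            if elem.tag == "seg":
                out.append(elem.text)
            elif elem.tag == "sup":
                out.append(SUPERSCRIPTS.get(elem.text, elem.text))
        return "".join(out)

    print(render("<ipa><seg>t</seg><sup>h</sup><seg>a</seg></ipa>"))
    # prints the aspirated t followed by a: "t" + U+02B0 + "a"

The user types or generates the markup; rendering and analysis both read
the same published definitions, so only the person who wrote the table
needs to be a geek.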

My question can be put another way. Should IPA characters be mapped
one-to-one into Unicode characters, or should IPA be considered a different
data type? That is, is IPA just a writing system like all other writing
systems, or should the process that goes from a sequence of IPA entities to
a graphic image (on screen, paper, or whatever) include a translation from
IPA to formatted Unicode characters? The second choice decouples the problem
of defining IPA entities from the problem of mapping formatted character
strings to lists of glyph/position pairs for rendering.
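
Here is the second alternative as a rough sketch, again in Python with
made-up names, just to show where the seam would fall:

    # Sketch of the two-stage view.  Stage 1 translates IPA entities
    # (their own data type) into a formatted Unicode string; stage 2
    # is a stand-in for the ordinary character-to-glyph machinery.
    SUPERSCRIPTS = {"h": "\u02B0", "w": "\u02B7"}   # illustrative only

    def to_unicode(entities):
        # entities are (base, is_superscript) pairs
        return "".join(SUPERSCRIPTS.get(base, base) if sup else base
                       for base, sup in entities)

    def to_glyphs(text):
        # knows nothing about IPA: characters to glyph/position pairs
        return [(ch, pos) for pos, ch in enumerate(text)]

    word = [("t", False), ("h", True), ("a", False)]   # aspirated t, then a
    print(to_glyphs(to_unicode(word)))

Nothing downstream of to_unicode() needs to know that IPA was ever
involved; the glyph/position step is the same one every other writing
system already uses.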

>For at least the same reasons that there is interest in extending Unicode to
>meet the needs of mathematicians, linguists would benefit from extending
>Unicode to meet the needs of phonetic/phonemic transcription. After all,
>transcribed language is a form of writing, a form of text, and the whole
>point of Unicode is to provide a single standard for encoding of text.
>
>Peter

As a mathematician myself, and also a computer geek, I don't hold with
multiplying characters in mathematics in the vain hope of keeping up with
mathematical entities. I don't know enough about MathML to comment on it,
and I haven't had a look at the new math character proposal recently added
to the Roadmap, but I would apply my own principles to my own domain.
Semantics cannot be mapped cleanly onto character formatting, nor can
character definitions bear the burden of supporting all the distinctions
between mathematical types. In the most general domains, such as category
theory and model theory, mathematics deals with infinite varieties of
structure, literally more than can be numbered even by the transfinite
cardinals (whose totality is itself too big to be numbered). In the finite
realm of mathematical symbols, many are necessarily overloaded with multiple
meanings in different domains, so extensible algorithmic markup of some
kind is needed to make the necessary distinctions.
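
A toy example of what I mean, in Python, with a deliberately crude domain
tag standing in for whatever richer markup a real scheme would provide:

    # One character, several meanings, told apart by an explicit
    # domain tag rather than by separate character codes.
    MEANINGS = {
        ("\u00D7", "arithmetic"):     "multiplication",
        ("\u00D7", "set theory"):     "Cartesian product",
        ("\u00D7", "vector algebra"): "cross product",
    }

    def interpret(symbol, domain):
        return MEANINGS.get((symbol, domain), "unknown in this domain")

    print(interpret("\u00D7", "set theory"))       # Cartesian product
    print(interpret("\u00D7", "vector algebra"))   # cross product

Markup can carry the domain along with the text; a bare character code
cannot, short of minting one new character per meaning.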

Edward Cherlin

"Well, you may be right, and certainly I cannot go so far
as to say that you are wrong, but still, at the same time..."
James Branch Cabell
Jurgen, A Comedy of Justice


