Script v. Language (was: RE: Arabic - Alef Maqsurah )

From: Reynolds, Gregg (greynolds@datalogics.com)
Date: Fri Jul 16 1999 - 13:03:27 EDT


Hello Peter,

Thanks much for your response. I'm not entirely sure we're talking about
the same thing, so here are a few quick notes, with more detail to follow if
the Gods of Weekend Freetime are generous.

> -----Original Message-----
> From: Peter_Constable@sil.org [mailto:Peter_Constable@sil.org]
> Sent: Thursday, July 15, 1999 4:16 PM
> To: Unicode List
> Subject: Re: Arabic - Alef Maqsurah
>
>
>
>
> >But this begs the question. They don't encounter particular morphologies;
> >they encounter particular encodings. Encodings, natural and artificial,
> >always reflect some theory of language. Change the encoding and you change
> >the problem.
>
> ..
>
> >On the contrary, you cannot *not* build morphological structure into an
> >encoding. Unicode already does: lexemes are built by concatenating text
> >atoms. Works great for English, not so great for e.g. Arabic. How else can
> >one explain the space "character" as a positive element? Even for Arabic
> >Unicode accommodates some level of morphological intelligence: "contextual
> >shaping" encodes morphology (prosodic word boundary). Every "natural"
> >encoding of language into visual form does the same to some extent. It's
> >not a question of whether, but of how much.
>
> I think this fails to recognise, in the general case, the distinctness
> between writing as a representation of language, and language itself.
> While they are related (Richard Sproat is working on a book in which he
> makes specific claims about the relationship between a writing system for
> a language and the phonology of the language), they are clearly different,
> and the writing system can have behaviours that are completely independent
> of the language which is represented.

"writing", "representation of a language", "language itself": these terms
are all packed with a huge amount of information/baggage. What does it
mean, after all, to say that writing is a representation of language? And
how can a writing "system" be "completely independent of" the (written)
language that employs it?

I would argue against what we might call the naive view (which is not
necessarily what you have in mind) of writing qua representation; that is,
writing does not simply record/transcribe/reflect/etc language. It
constructs it. It represents a theory of language, and by learning to write
we learn how to think about language. So Derrida was right: writing
precedes language. (Boy, do I love dropping that name. Never mind that I
have only the vaguest idea of what his writings mean; I just like the
phrase.) Actually, I lied: I wouldn't make the argument, but I would steal
it from others. In particular, "Spaces Before Words", by Paul Saenger, "The
World on Paper" by David Olson, and "Visible Speech: The Diverse Oneness of
Writing Systems" by John DeFrancis. The early chapters of Saenger in
particular provide a terrific summary of the physiology of reading and its
relation to letterforms. (It's a history of the development of silent
reading in the West.) You can find my BibTeX files for these and some
related stuff at http://www.enteract.com/~greyno/bib.

In a word: scripts cannot possibly be independent of written language and
vice-versa. (Unless by script one means a purely graphical notion; but
Unicode explicitly rejects this view).

> Some examples:
>
> - line direction is a purely visual phenomenon with no connection to language

Hmm. I think I would quibble, on grounds that linear (visual)
directionality is related to temporal linear directionality. I think of
writing (metaphorically, not definitionally) as the projection of stuff in a
temporal space into a two-dimensional visual space. But of course it's true
that which direction a system uses is probably arbitrary.

> - non-linearity (e.g. Indic scripts, Pahawh Hmong - these demonstrate that
> it's possible for writing to reflect more than phonemes, but they also
> demonstrate an independence between the spatial sequence of characters and
> the temporal sequence of the phonemes represented)

Hmm. I think I would quibble. I don't think these systems are non-linear.
They happen to employ a few exceptional devices, but I'm not so sure this
makes them any less linear in their strategies. Must ponder.

> - Arabic contextual forms: while the typical behaviour is that the
> contextual shaping reflects prosodic word boundaries, the fact that this is
> not done consistently clearly indicates that the script is independent of
> the language

Hmm. Presumably you mean that word endings do not always employ final-form
shapes, as in the nun in kAna. But it's entirely consistent the other way
around: final-form shapes always delimit prosodic words. True, some
characters, such as dal and waw, employ the same form for final and medial
(note that my usage reflects grammatical analysis, not visual analysis), but
the usage is quite consistent. Either way, I'm not sure what this says
about script being "independent" of language. If by "language" you mean the
primordial unanalyzed yawp, then I would agree; but if you mean that, e.g.,
the Arabic script is independent of the written language Arabic, I would be
pretty skeptical.
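
The point about dal and waw can be checked against Unicode's own character data. What follows is only a minimal sketch: it assumes Python's standard unicodedata module, and it uses the compatibility code points of the Arabic Presentation Forms-B block (U+FE70..U+FEFF) purely as a convenient record of which positional shapes a letter has. The helper name positional_forms is hypothetical, and real shaping is done by fonts and rendering engines from joining data, not from these code points.

```python
import unicodedata

def positional_forms(letter):
    """Return the positional tags (isolated, final, initial, medial) for which
    Arabic Presentation Forms-B has a separate code point for `letter`."""
    forms = set()
    for cp in range(0xFE70, 0xFF00):  # Arabic Presentation Forms-B block
        parts = unicodedata.decomposition(chr(cp)).split()
        # Single-letter entries look like "<final> 0646". Unassigned code
        # points give an empty list; ligatures and spacing marks give more
        # than two fields. Both are skipped by the length check.
        if len(parts) == 2 and int(parts[1], 16) == ord(letter):
            forms.add(parts[0].strip("<>"))
    return forms

for name, letter in [("nun", "\u0646"), ("dal", "\u062F"), ("waw", "\u0648")]:
    print(name, sorted(positional_forms(letter)))
```

Nun comes back with all four positional forms, while dal and waw come back with only isolated and final: neither has a distinct shape for medial position, which is the sense in which they "employ the same form for final and medial".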

> - "word" spacing in Kayah Li: if I understand correctly, the spaces between
> written words *do not* correspond with morphological or phonological words
>

No doubt true, but should they? Prosodic units may correspond to various
units at different levels of linguistic analysis - morpheme, lexical word,
phrase, breath group, etc.

> In the history of computing, text encodings have always been encodings of
> writing.

Of course this formulation begs the question. No fair citing as fact the
very thing that is under discussion! Well, at least it's debatable from my
POV.

[snip]
>
> It should be pointed out that encoding based upon writing is not identical
> with writing based upon visual form, in the sense of presentation forms.
> Encoding can be based upon presentation forms, but they need not be: they
> can be based upon an abstract view of the writing in which there is a
> notion of "character" which is distinct from presentation forms, which is
> precisely the approach of Unicode. Forgive me for stating the obvious, but
> I needed to be sure this point wasn't overlooked.

I dunno, maybe I missed something, but the harder I look at the way Unicode
talks about character, presentation form, abstract semantics, etc., the less
obvious it gets. An abstract view of character, with no presentational
content, and also with no grammatical or metalinguistic content - where does
that leave us? Clearly I need to put together specific examples; will try
this weekend.
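
For reference, here is a minimal sketch of how the character/presentation-form distinction shows up in Unicode's data. It assumes Python's standard unicodedata module and uses the four compatibility presentation forms of nun only as an illustration; the names NUN, NUN_FORMS and abstract_character are hypothetical. It shows that the standard treats the four shapes as variants of one abstract character. It does not settle what that abstract character means.

```python
import unicodedata

NUN = "\u0646"  # ARABIC LETTER NOON, the abstract character
# Compatibility presentation forms: isolated, final, initial, medial.
NUN_FORMS = ["\uFEE5", "\uFEE6", "\uFEE7", "\uFEE8"]

def abstract_character(form):
    """Fold a presentation form to its abstract character using
    compatibility normalization (NFKC)."""
    return unicodedata.normalize("NFKC", form)

for form in NUN_FORMS:
    base = abstract_character(form)
    print(f"U+{ord(form):04X} {unicodedata.name(form)} -> U+{ord(base):04X}")
```

All four forms fold to the same code point, U+0646. The positional information is discarded, and a renderer is expected to recompute it from context.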

....

My previous notes were probably somewhat misleading with respect to
morphological encoding. I don't think I would argue that Unicode should
encode morphology, or at least I don't think I would state it in that way.
On the other hand, if we think of plaintext as essentially flat, there still
are opportunities for encoding some abstract grammatical information that
may be used to encode non-flat structures of information. It's a question
of finding the intersection of flat concatenative structures and other
levels of organization. Or something like that - can you tell I'm still
struggling with this a bit?

Bye for now,

Gregg


