From: Gregg Reynolds (unicode@arabink.com)
Date: Sat Jul 09 2005 - 09:46:16 CDT
Kenneth Whistler wrote:
> Gregg said:
>
>>This gets back
>>to the design principles (and the interests that drive them) of Unicode,
>>which work better for some languages than others.
>
> for some... writing systems than others.
Ok, now this is an interesting point. I should have said "written
languages"; you contrasted "writing systems". What's the diff?
When I first got into this stuff in earnest, in the late 90s, right
about the time I earned my "Ranting Unicode Newbie" wings, I ended up
with the notion that written languages are, well, languages in graphical
form. I recall having found solid theoretical support for this, but
naturally the argument escapes me at the moment.
In any case, I like a "written language" because (to me at least) it
implies that letterforms are part of a larger linguistic reality,
related to the various levels of standard grammar, i.e. phonology,
syntax. That's why I say Unicode works better for some written
languages than others. To put it another way, different (spoken)
language types have different semantic relations to their (written)
graphical forms. From there we can go to different encoding
philosophies. Unicode is one, based on a certain way of looking at
things. Looking from a "written languages" perspective, one comes up
with a different set of design principles. (Can you tell I'm struggling
a bit to articulate just what the heck I have in mind?)
In contrast, "writing system" captures a different set of ideas. It's
the right term to use for Unicode, insofar as it does *not* imply
anything about grammar, semantics, etc. To me Unicode looks like a kind
of surface orthography, concerned only about the (abstract structure of)
marks on the paper, not the linguistic structures represented by those
marks. Hence "writing systems".
In fact one could argue that letterforms are of secondary importance to
written languages, just as digit forms are of secondary importance to
mathematics. One could create a written English text using Arabic (or
Korean etc) letterforms if one were so inclined (by which I *don't* mean
transliteration); it would still be written English. Actually, the
recent posting regarding the relation of Tamil grammar to letterforms
captures the idea nicely.
Which brings up the question of "what is a character, *really*?"
Maybe "written languages use writing systems" is what I mean.
Does that make any sense? (If any of you real linguists can pull the
worms out of my head on this point please do. They're nice worms. If
not, I'd appreciate help articulating this stuff cleanly. I think it
might be helpful to newcomers.)
>
> Furthermore, as much as it would be nice to have Arabic simply
> be implemented consistently right-to-left, in any *practical*
> implementation, you *must* deal with bidirectionality.
Hmmm. I guess it depends on what you mean by "practical", and for whom.
I can easily imagine monodirectional implementations being very
practical for certain user groups. In fact, I use such software all the
time: vim (http://www.vim.org/), which has non-bidi Arabic support, and
(believe it or not) emacs, which has very strange but usable
quasi-monodirectional Arabic support (lines flow left to right, words
right to left. CRAZY.). I also use emacs to write Arabic with Latin-1
characters. Written Arabic *language*, non-Arabic writing system? More
on this later...
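To make the mono- vs. bidirectional distinction a bit more concrete, here is a rough Python sketch. It uses the standard library's unicodedata.bidirectional() to look up each character's bidi class. The helper names (bidi_classes, needs_bidi) are made up for illustration, and the test is a crude heuristic, not the Unicode Bidirectional Algorithm: it only checks whether strong left-to-right and strong right-to-left characters occur in the same string, and it ignores digits entirely.

```python
import unicodedata

def bidi_classes(text):
    """Return (character, bidi class) pairs for each character in text."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]

def needs_bidi(text):
    """Crude heuristic (hypothetical helper): True if the text mixes
    strong left-to-right (L) and strong right-to-left (R, AL) characters."""
    classes = {unicodedata.bidirectional(ch) for ch in text}
    has_ltr = "L" in classes
    has_rtl = bool(classes & {"R", "AL"})
    return has_ltr and has_rtl

arabic_only = "\u0643\u062A\u0627\u0628"   # the Arabic word "kitab" (book)
mixed = arabic_only + " vim"               # same word followed by Latin text

print(bidi_classes(mixed))
print(needs_bidi(arabic_only), needs_bidi(mixed))
```

By this rough test, a text made up only of Arabic letters gets by with a single direction, while the moment a Latin word shows up both directions are in play.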
>
> I realize that you think you may have a better mousetrap in
> approaching the problem of encoding Arabic text than the
> encoding used in the Unicode Standard --- but...
Well, better for some things, maybe; but also based on different design
principles. I expect to post a webpage soon with a whole bunch of CRAZY
ideas for encoding (written) Arabic. No fewer than 19 - count 'em,
nineteen! - key design decisions! Some of them may even fit into the
Unicode model. (Much of the fun of speculative Arabic encoding design
derives from the fact that traditional Arabic grammar finds lots of
semantics in individual characters. In fact, I'll bet a strong argument
could be made that every Arabic letterform in a text has either
graphotactic (empty) semantics or morphemic semantics.)
> However you cut the pie, you are still faced with the
> difficulties that the script presents you in dealing with
> the basic information processing requirements: keyboard
> input, text storage, searching, sorting, editing, layout
> and rendering, and so on. The whole stack of information
> processing has to function -- and has to function in the
> context of existing software systems, data storage technologies,
> databases, fonts, libraries, internet protocols, and on and
> on ... or you haven't got any solution at all. Just ideas
> and a theory.
Running code always wins. I've actually given this quite a bit of
thought. (You'd be amazed what you can do with just emacs in the way of
encoding experimentation.) Unfortunately I'm a rather lazy hacker so
I'll have to depend on the kindness of strangers.
-gregg