Why Arabic shaping?

From: David Starner (dstarner98@aasaa.ofe.org)
Date: Sat Aug 11 2001 - 02:15:34 EDT


I'm moonlighting as an i18n know-it-all on an Arabization mailing list, and
this issue came up; I was wondering if someone here could help.

From: "Nadim Shaikli" <shaikli@yahoo.com>
>On Thu, 9 Aug 2001 07:44:14 +0100
> David Starner wrote:
>> Arabic Presentation Forms A and B shouldn't be used in files; use characters
>> in the 0600-06FF block and the application should take the responsibility
>> for using glyphs from Presentation Forms A & B if necessary.

>Well, it _always_ will be necessary and that's my point (it's not even almost
>always, it's "always" :-). 0600-06FF presents a flavor of the entire Arabic
>alphabet (each letter is represented in _a_ particular form - initial, medial,
>final or isolated), it also includes all the Arabic numbers and punctuation,
>but 0600-06FF is by no means complete since it doesn't include all the
>various character permutations (forms). With that said, let me rephrase what
>you've noted (sorry, if I'm being dense); the idea here is to use 0600-06FF
>and simply plop characters down (irrespective of form), at which point the
>application (or underlying library) would go about transforming the
>characters into their appropriate visual glyphs (based on location, etc),
>right ?
>
>OK, here are a couple more questions :-)
>
>Why do it this way :-D ? Is there some hidden advantage that I'm not
>thinking of (besides saving font space) ?
>
>It would seem more logical to simply store all those visual ("correct")
>glyphs with their appropriate encodings instead of again reverting to the
>visual re-mapping every time this file is opened -- and I'm not talking of
>saving visual hints and/or control characters. Here's a scenario -- let's
>assume I write a really long/large document in Arabic all the while the
>application is doing these conversions as I type (or maybe it post-processes
>on a per paragraph basis or whatever) - I then save my document (currently
>all that visual conversion would be lost, right ?) and it is stored on disk
>using only 0600-06FF encodings. Why not preserve all these conversions so
>that if someone wanted to read my 15MB :-) file they wouldn't have to wait
>for any more conversions to take place (it's a waste of time and processor
>throughput) ? You see what I'm saying ? With that in mind, I was thinking
>that Form-B is an integral part of any Unicode "Arabic" font since it needs
>to be known (and used) by everyone (well, the converter has to get these
>glyphs from somewhere, right ?).
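
The shaping step described in the quoted message (plain 0600-06FF characters
in, position-dependent glyphs out) can be sketched in a few lines of Python.
This is only an illustration, not how any particular application or library
does it. The names build_form_table and shape are made up for this example,
the table is derived from the compatibility decompositions in Python's
unicodedata module, and the sketch assumes no vowel marks, no lam-alef
ligatures and no ZWJ/ZWNJ, all of which a real shaper has to handle.

```python
import unicodedata

FORM_TAGS = ("<isolated>", "<final>", "<initial>", "<medial>")

def build_form_table():
    """Map each base letter to its Presentation Forms-B code points.

    Result looks like {"\u062A": {"isolated": ..., "final": ..., ...}}.
    Only single-letter decompositions are kept; ligatures are skipped.
    """
    table = {}
    for cp in range(0xFE70, 0xFEFD):
        parts = unicodedata.decomposition(chr(cp)).split()
        if len(parts) == 2 and parts[0] in FORM_TAGS:
            base = chr(int(parts[1], 16))
            table.setdefault(base, {})[parts[0].strip("<>")] = chr(cp)
    return table

FORMS = build_form_table()

def shape(text, table=FORMS):
    """Pick a presentation form for each letter from its neighbours.

    Assumption: a letter can join to the following letter only if it has
    an initial form, and to the preceding letter only if it has a final form.
    """
    out = []
    for i, ch in enumerate(text):
        forms = table.get(ch)
        if forms is None:          # not an Arabic letter we know: pass through
            out.append(ch)
            continue
        prev = text[i - 1] if i > 0 else ""
        nxt = text[i + 1] if i + 1 < len(text) else ""
        joins_prev = "final" in forms and "initial" in table.get(prev, {})
        joins_next = "initial" in forms and "final" in table.get(nxt, {})
        if joins_prev and joins_next:
            key = "medial"
        elif joins_prev:
            key = "final"
        elif joins_next:
            key = "initial"
        else:
            key = "isolated"
        out.append(forms.get(key, ch))
    return "".join(out)
```

Given KAF, TEH, ALEF, BEH (U+0643 U+062A U+0627 U+0628), this sketch picks the
initial, medial, final and isolated forms respectively (U+FEDB U+FE98 U+FE8E
U+FE8F), and NFKC normalization maps that result back to the original four
characters.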

>> To fully support Unicode, a font format like OpenType is needed. An OpenType
>> font can take a character, like U+062A, realize it's in the medial form,
>> and display the appropriate glyph, without needing a separate Unicode
>> character for that form.
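
For reference, the four contextual shapes of U+062A also exist as
compatibility code points in Presentation Forms-B; an OpenType font selects
the equivalent glyphs internally and never needs those code points in the
text. A minimal sketch that lists them, assuming Python's unicodedata module
(the helper name presentation_forms is made up for this example):

```python
import unicodedata

TEH = "\u062A"  # ARABIC LETTER TEH, the character named in the quoted text

def presentation_forms(base):
    """Return {character name: code point} for Forms-B entries that fold to base."""
    found = {}
    for cp in range(0xFE70, 0xFEFD):
        ch = chr(cp)
        # NFKC maps a presentation form back to its 0600-06FF base letter.
        if unicodedata.normalize("NFKC", ch) == base:
            found[unicodedata.name(ch)] = f"U+{cp:04X}"
    return found

for name, code in presentation_forms(TEH).items():
    print(code, name)
```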

>I think I understand the concept of how this was supposed to work - but my
>comments/question above still stand.
>
>It just seems odd to go this way - it's certainly cleaner to include all the
>characters and their various permutations and give the user the ability to
>decide what he wants to type and how he wants it to look, ensuring that what
>he typed would be saved in exact-mode (what-you-see-is-what-you-store --
>WYSIWYS :-). Granted, the application would still have to do this conversion
>(or shaping), but it's only done once -- upon creation. Moreover, this
>conversion library would be universal given universal fonts and encodings (no
>optional anything). If this were to happen, it would give any application,
>given the right set of fonts, the ability to display Arabic characters, no ?
>The person would be able to display (or read) a document, but wouldn't be able
>to modify it unless he had Bidi support and shaping.
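
One way to look at the store-what-you-see proposal is to compare the two
representations of a single word directly. The sketch below is an illustration
only; the word (KAF TEH ALEF BEH) and the compare helper are chosen for this
example and do not come from the thread.

```python
import unicodedata

logical = "\u0643\u062A\u0627\u0628"  # KAF TEH ALEF BEH from 0600-06FF
shaped = "\uFEDB\uFE98\uFE8E\uFE8F"   # same word as initial/medial/final/isolated forms

def compare(logical, shaped):
    """Report how the stored-as-shaped copy differs from the logical one."""
    return {
        # Does a plain substring search for the logical spelling find it?
        "plain_search_hits": logical in shaped,
        # Does it match once presentation forms are folded back by NFKC?
        "match_after_nfkc": unicodedata.normalize("NFKC", shaped) == logical,
        "utf8_bytes_logical": len(logical.encode("utf-8")),
        "utf8_bytes_shaped": len(shaped.encode("utf-8")),
    }

print(compare(logical, shaped))
```

In UTF-8 the shaped copy takes 12 bytes against 8 for the logical one, and a
plain substring search for the logical spelling does not find it until the
text is normalized.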

[More system-specific questions cut]

--
David Starner - dstarner98@aasaa.ofe.org
"The pig -- belongs -- to _all_ mankind!" - Invader Zim


