Re: Granularity of Unicode Conformance

From: Mark E. Davis (markdavis@ispchannel.com)
Date: Tue Sep 28 1999 - 02:36:33 EDT

Maybe in reply to: Hart, Edwin F.: "Granularity of Unicode Conformance"

The checklist really only applies in very special cases. Take your example of
Arabic. In a product that does not display Arabic, but only processes it
internally (e.g. a server product), there is no need to even have the
bidirectional algorithm implemented. In a multilingual product that has no
spelling checkers for English, it is hardly a flaw to not have them for Arabic.
&c.
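To illustrate the server-side point: the bidi property of each character is plain data that any product can query, while the reordering algorithm only matters on the display path. The sketch below is a hypothetical Python check (the function name and the class set are my assumptions, and it simplifies when reordering actually becomes visible) that a display layer might use to decide whether it has to run the bidi algorithm at all.

```python
import unicodedata

# Bidi classes that introduce right-to-left reordering at display time.
# Simplification (assumption): Arabic-Indic digits (class AN) are ignored here.
RTL_CLASSES = {"R", "AL", "RLE", "RLO", "RLI"}

def needs_bidi_layout(text: str) -> bool:
    """Return True if displaying `text` would require running the bidi algorithm."""
    return any(unicodedata.bidirectional(ch) in RTL_CLASSES for ch in text)
```

A server that only stores, searches, or converts the text never needs to call anything like this; the check only becomes relevant once the text is rendered.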

A more productive approach is to determine whether the same relative quality of
service is available for Arabic as for Latin characters. If the
product supports correct sorting for English or French, does it also for
Arabic? If it supports quality rendering or editing for English or French, does
it support the same level of quality for Arabic?

Of course, "relative quality of support" is something that is relative to the
script (and language), and may have very different technological demands. Take
rendering, for example:

- minimal-quality English typography doesn't even require proportional fonts;
- minimal-quality Arabic requires bidi plus contextual forms plus minimal
ligatures.

- high-quality English typography requires ligatures, kerning, perhaps even
hanging punctuation;
- high-quality Arabic needs loads of ligatures, specialized contextual forms,
character stretching (e.g. snake-kaf replacement), and perhaps even descending
levels.
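The "contextual forms" requirement in the minimal Arabic case can be sketched as follows. This is a simplified, hypothetical Python illustration of joining-form selection in logical order: the joining-type table is a tiny hand-picked excerpt (a real shaper would read ArabicShaping.txt from the Unicode Character Database), and it ignores the lam-alef ligature, ZWJ, and every other subtlety of real shaping.

```python
import unicodedata

# Hypothetical, deliberately tiny joining-type table (assumption, not the full UCD data).
DUAL = {"\u0628", "\u062A", "\u0633", "\u0644", "\u0645", "\u0646", "\u064A"}  # beh, teh, seen, lam, meem, noon, yeh
RIGHT = {"\u0627", "\u062F", "\u0631", "\u0648"}  # alef, dal, reh, waw

def joining_type(ch: str) -> str:
    if ch in DUAL:
        return "D"  # dual-joining: connects on both sides
    if ch in RIGHT:
        return "R"  # right-joining: connects only to the preceding letter
    if unicodedata.category(ch) == "Mn":
        return "T"  # transparent: harakat do not break joining
    return "U"  # non-joining: spaces, Latin, digits, ...

def contextual_forms(text: str) -> list[str]:
    """Return 'isol', 'init', 'medi' or 'fina' per joining letter, '' for anything else."""
    types = [joining_type(c) for c in text]
    forms = []
    for i, t in enumerate(types):
        if t not in ("D", "R"):
            forms.append("")
            continue
        # Previous letter (skipping transparent marks) joins to us only if it is dual-joining.
        j = i - 1
        while j >= 0 and types[j] == "T":
            j -= 1
        joins_prev = j >= 0 and types[j] == "D"
        # We join to the next letter only if we are dual-joining and it joins on its right.
        k = i + 1
        while k < len(types) and types[k] == "T":
            k += 1
        joins_next = t == "D" and k < len(types) and types[k] in ("D", "R")
        if joins_prev and joins_next:
            forms.append("medi")
        elif joins_prev:
            forms.append("fina")
        elif joins_next:
            forms.append("init")
        else:
            forms.append("isol")
    return forms
```

For example, the three letters of "بسم" come out as initial, medial, final, while "دار" (all right-joining letters) stays isolated throughout.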

Similarly, an English spell-checker doesn't need to know much morphology to do
a good job; for a highly inflected language like German or Russian, a more
sophisticated technology is required to get the same relative quality of
service. On the other hand, reasonable sorting for Arabic is hardly more
complicated than for French.
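To make the sorting comparison concrete, here is a hypothetical two-level sort key in Python: letters are compared first with combining marks (and tatweel) removed, and the full decomposed string only breaks ties. It assumes that code-point order is an acceptable stand-in for alphabetical order. For the basic Arabic letters this largely holds, but Persian and Urdu letters such as U+067E PEH are encoded elsewhere and would sort in the wrong place. It also does not implement the traditional French rule of comparing accents from the end of the word; a real product would use the Unicode Collation Algorithm with a language tailoring.

```python
import unicodedata

TATWEEL = "\u0640"  # kashida: a stretching character that should carry no sort weight

def sort_key(s: str) -> tuple[str, str]:
    """Hypothetical two-level key: (letters without marks, full decomposed string)."""
    decomposed = unicodedata.normalize("NFD", s).replace(TATWEEL, "")
    # Primary level: drop nonspacing marks (Arabic harakat, French accents after NFD).
    base = "".join(c for c in decomposed if unicodedata.category(c) != "Mn")
    # Secondary level: the decomposed string itself, used only to break ties.
    return (base.casefold(), decomposed)
```

With these keys, cote, coté, côte, côté sort in that order (accent differences compared left to right), whereas traditional French dictionary order puts côte before coté. Vocalized and unvocalized spellings of the same Arabic word end up adjacent, which is the main thing "reasonable" Arabic sorting needs.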

As to your item on "Support for Arabic presentation forms", a product can be
perfectly conformant to Unicode, with excellent handling of Arabic, and not
support these at all. However, if it also performed character conversion from a
variety of legacy character sets from different platforms, then it should also
handle conversion of visual-order Arabic encodings.
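For the reverse-mapping step in Ed's item 6b, Python's standard library already covers the basics, because the Arabic letter and ligature presentation forms carry compatibility decompositions to the nominal letters. The sketch below is hypothetical (the function name and the choice to restrict normalization to the two presentation-form blocks are my assumptions): it applies NFKC only inside U+FB50–U+FDFF and U+FE70–U+FEFF, so that unrelated compatibility characters such as the Latin "fi" ligature are left untouched. It does not reorder visual-order text; that conversion is a separate and harder problem.

```python
import unicodedata

def unshape(text: str) -> str:
    """Map Arabic presentation forms back to nominal Arabic letters (sketch)."""
    out = []
    for ch in text:
        cp = ord(ch)
        # Arabic Presentation Forms-A and -B blocks only.
        if 0xFB50 <= cp <= 0xFDFF or 0xFE70 <= cp <= 0xFEFF:
            out.append(unicodedata.normalize("NFKC", ch))
        else:
            out.append(ch)
    return "".join(out)
```

For example, the isolated beh U+FE8F becomes U+0628, and the lam-alef ligature U+FEFB becomes the two letters U+0644 U+0627, which is what a spell checker or search index would want to see.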

Mark

"Hart, Edwin F." wrote:

> From the perspective of a user of Unicode products, conformance to Unicode
> is very important. However, the degree to which a product implements the
> required Unicode functions and supports various scripts is an important
> topic not addressed by the Unicode conformance chapter.
>
> Yes, qualification by script is a better way to start analyzing Unicode
> "support".
>
> Since Arabic tends to be one of the more challenging scripts, let me take a
> quick look at what Unicode "support" of the Arabic script might entail. By
> the way, I am not an expert in Arabic, and I am writing this for your
> feedback and comments. Certainly, other approaches are possible.
>
> 1. Which subset of Arabic characters is included?
> 2. Spell checking
>    a. Arabic combining characters
> 3. Grammar checking
> 4. Arabic sorting, which likely depends on the language (Arabic, Persian,
>    Urdu, etc.)
> 5. Presentation imaging (printing & displaying)
>    a. Arabic shaping (mapping each character to the correct presentation
>       glyph depending on context)
>    b. Bidirectional algorithm for presentation
>    c. Arabic combining characters
>    d. Mirroring character shapes for presentation
>       1) Opening/closing punctuation
>       2) Mathematical symbols
> 6. Support for Arabic presentation forms
>    (While you should not generate them, what do you do if you receive them?)
>    a. Presentation
>    b. Reverse mapping into "pure" Arabic characters for spell checking and
>       other informational processing
>
> Ed Hart
>
> Edwin F. Hart
> edwin.hart@jhuapl.edu
> The Johns Hopkins University Applied Physics Laboratory
> 11100 Johns Hopkins Road
> Laurel, MD 20723-6099
> USA
> +1-443-778-6926 (Baltimore area)
> +1-240-228-6926 (Washington, DC area)
> +1-443-778-1093 (fax)
> +1-240-228-1093 (fax)
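Item 5d of the checklist (mirroring) can be sketched in a few lines. Python's `unicodedata.mirrored()` reports the Bidi_Mirrored property but not which character is the mirror image, so the pair table below is a hypothetical hand-written excerpt standing in for BidiMirroring.txt. The function assumes the caller has already run the bidi algorithm and is passing a run resolved to a right-to-left level; characters that are Bidi_Mirrored but have no counterpart in the table (many mathematical operators) are left unchanged and would need a mirrored glyph from the font.

```python
import unicodedata

# Hypothetical excerpt of BidiMirroring.txt (assumption; a real product loads the full file).
MIRROR_PAIRS = {
    "(": ")", ")": "(", "[": "]", "]": "[", "{": "}", "}": "{",
    "<": ">", ">": "<", "\u2264": "\u2265", "\u2265": "\u2264",
}

def mirror_for_rtl(run: str) -> str:
    """Swap mirrored characters in a run already resolved to a right-to-left level."""
    out = []
    for ch in run:
        if unicodedata.mirrored(ch) and ch in MIRROR_PAIRS:
            out.append(MIRROR_PAIRS[ch])
        else:
            out.append(ch)
    return "".join(out)
```

Keeping the Bidi_Mirrored check separate from the pair lookup makes it easy to log characters that need mirroring but have no paired code point, which is exactly the "mathematical symbols" case in sub-item 2.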

This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:53 EDT