Checklists of Unicode Support (was Granularity of Unicode Conformance)

From: Hart, Edwin F. (Edwin.Hart@jhuapl.edu)
Date: Tue Sep 28 1999 - 09:18:02 EDT


Peter Constable wrote:

There's a lot that we might want to know about all of these apps. The
biggest question, though, is whether any one of us is willing to start
compiling all of this info?
Peter

As a representative of users of products that support Unicode, I have raised
the issue at UTC meetings. Given the complexity of the issues, the UTC
voted not to address this issue itself but said it would be willing to review
any materials submitted to it. Given the nature of volunteer organizations, I
have taken on the task of trying to make some sense of it all.

However, I did not know where to begin. Jonathan Rosenne's note reminded me
of the task. The only strategy that I had was to pick one of the more
difficult scripts, like Arabic, draft a checklist for it, and then use this
as a template to build checklists for other scripts. We will likely find that
several of the items (e.g., keyboard input) on the checklist are common across
scripts but that others (e.g., bidi) are unique to a script or subset of
scripts. However, users need to consider two orthogonal checklists:

1. a list of the Unicode functions and resources (e.g., fonts) needed to
fully support a script, and
2. a list of those functions and components needed by a particular product
to support a script.

As Mark Davis and others have stated, a database engine can correctly
support Arabic without handling keyboard input and presentation, but it had
better support sorting Arabic characters. My thought is that if we can
fully define the first list by script, then generating the second list would
be relatively easy.
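
To illustrate how the second list might fall out of the first, here is a
minimal sketch in Python. Everything in it is an assumption made for
illustration: the area labels ("input", "presentation", "collation",
"proofing"), the sample items, and the names ARABIC_CHECKLIST, PRODUCT_AREAS
and product_checklist are hypothetical, not an agreed classification.

```python
# Hypothetical sketch: list 1 is a per-script checklist in which every item
# is tagged with the functional area it belongs to. The items and area
# labels below are illustrative, not a complete or agreed list.
ARABIC_CHECKLIST = [
    ("keyboard input", "input"),
    ("Arabic shaping", "presentation"),
    ("bidirectional algorithm", "presentation"),
    ("mirroring", "presentation"),
    ("fonts", "presentation"),
    ("sorting", "collation"),
    ("spell checking", "proofing"),
]

# Hypothetical mapping from product type to the areas it has to handle.
PRODUCT_AREAS = {
    "database engine": {"collation"},
    "word processor": {"input", "presentation", "collation", "proofing"},
}


def product_checklist(script_checklist, product):
    """Derive list 2: the items of list 1 that apply to the given product."""
    areas = PRODUCT_AREAS[product]
    return [item for item, area in script_checklist if area in areas]


if __name__ == "__main__":
    for product in PRODUCT_AREAS:
        print(product, "->", product_checklist(ARABIC_CHECKLIST, product))
```

With these sample tags, the database engine ends up with sorting only, while
the word processor inherits every item on the Arabic list.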

Since I'm not a linguist, I'm going to need a lot of help to generate the
checklists for each script. If you are interested in helping, send me a
checklist of functions and resources needed to support a particular script
or set of scripts.

Here is an updated checklist for the Arabic script based on the feedback to
date:

1. Which scripts are supported?
2. Which languages are supported under each script?
   a. Spell checking
      1) Arabic combining characters
   b. Grammar checking
   c. Sorting, which likely depends on the language (Arabic, Persian, Urdu,
      etc.)
3. Which subset of Arabic characters is included?
4. Which keyboards and input methods are supported?
5. What is the quality of the presentation image (printing and displaying)?
   a. What is the quality of the presentation?
      1) Newspaper quality
      2) Book quality
      3) Koran quality
   b. Which functions are supported?
      1) Ligatures
      2) Stretching and compressing glyphs
      3) Arabic shaping (mapping each character to the correct presentation
         glyph depending on context)
      4) Bidirectional algorithm for presentation
      5) Mirroring character shapes for presentation
         a) Opening/closing symbols
         b) Mathematical symbols
   c. Which resources are supported or included?
      1) Arabic combining characters
      2) Which fonts, if any, are included with the product and what is the
         glyph coverage?
6. How are Arabic presentation forms supported?
   (While you should not generate them, what do you do if you receive them?)
   a. Presentation
   b. Reverse mapping into "pure" Arabic characters for spell checking and
      other informational processing
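
As a rough illustration of items 5.b and 6.b, the following Python sketch
uses the standard unicodedata module to look up the bidirectional class and
mirroring property of a character, and to fold presentation forms back into
base Arabic letters with NFKC normalization. The helper names describe and
fold_presentation_forms are hypothetical. NFKC is only one possible approach
and is a blunt one: it also folds unrelated compatibility characters (for
example, full-width forms), so a real product might need a narrower mapping.

```python
import unicodedata


def describe(ch):
    """Report the properties a bidi-aware renderer would consult."""
    return {
        "name": unicodedata.name(ch, "<unnamed>"),
        # Bidirectional class, e.g. "AL" for Arabic letters.
        "bidi_class": unicodedata.bidirectional(ch),
        # True if the glyph should be mirrored in right-to-left runs.
        "mirrored": bool(unicodedata.mirrored(ch)),
    }


def fold_presentation_forms(text):
    """Map received presentation forms back to base letters (via NFKC)."""
    return unicodedata.normalize("NFKC", text)


if __name__ == "__main__":
    print(describe("\u0628"))  # ARABIC LETTER BEH
    print(describe("("))       # an opening symbol that mirrors
    # U+FE8F is ARABIC LETTER BEH ISOLATED FORM; it folds to U+0628.
    folded = fold_presentation_forms("\uFE8F")
    print([f"U+{ord(c):04X}" for c in folded])
```

A spell checker could run such a folding step on incoming text before
dictionary lookup, while the display path leaves the original characters
alone.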

Ed Hart

Edwin F. Hart
edwin.hart@jhuapl.edu
The Johns Hopkins University Applied Physics Laboratory
11100 Johns Hopkins Road
Laurel, MD 20723-6099
USA
+1-443-778-6926 (Baltimore area)
+1-240-228-6926 (Washington, DC area)
+1-443-778-1093 (fax)
+1-240-228-1093 (fax)


