Unicode Conformance

From: David Craig (doc@ElSegundoCA.NCR.COM)
Date: Wed Oct 19 1994 - 12:46:58 EDT


We have the following questions related to Unicode Conformance:

In our environment we have a Unicode processing engine which is a relational
DBMS. It is primarily concerned with the correct comparison and collation of
text elements, but provides no glyph processing capability. Unicode is used
as a canonical representation in the backing store and also as an external
character set to the DBMS.

Now, what exactly does it mean for this processing engine to provide level 1
support of Unicode (i.e., in ISO 10646 terms, precomposed characters only)?
More specifically, what should one do with the following code elements, which
carry level 2 and level 3 semantics or other text rendering directives? The
standard (vol. 1, p. 24) states that the behavior of processes in this
situation is unspecified.

        1) non-spacing marks. We are aware of the following options. Does one:

                a) Reject these code elements by aborting the entire
                request? Are non-spacing marks widely used?
                Are they commonly generated by Windows NT?

                b) Allow the character into the system? Is the code element
                then ignored, as if it did not exist, during comparison,
                searching and collation? For example, would uppercase E
                followed by a non-spacing grave compare equal to uppercase E?

                c) Convert the base element and non-spacing mark to an
                existing precomposed character (if one exists)? Since the
                meaning does not change, is this conformant behavior?
                                
        2) joiners, non-joiners and directional overrides. Does one:

                a) Reject these code elements by aborting the entire
                request?

                b) Allow the character into the system? Is the code element
                then ignored, as if it did not exist, during comparison,
                searching and collation?

        3) Control characters, undefined and reserved regions.

                The same options as in 2 would apply.
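
To make the options above concrete, here is a minimal sketch in Python using
the standard unicodedata module. It is an illustration under stated
assumptions, not a statement of what the standard requires: the function
names and the FORMAT_CHARS set are hypothetical, general category "Mn" is
used as a stand-in for "non-spacing mark", and option 1c is shown with NFC
normalization, which was defined in versions of Unicode later than the one
this message refers to.

```python
import unicodedata

# Assumption: the joiners, non-joiners and directional controls from
# question 2, listed explicitly rather than derived from a property.
FORMAT_CHARS = frozenset({
    "\u200C",  # ZERO WIDTH NON-JOINER
    "\u200D",  # ZERO WIDTH JOINER
    "\u202A",  # LEFT-TO-RIGHT EMBEDDING
    "\u202B",  # RIGHT-TO-LEFT EMBEDDING
    "\u202C",  # POP DIRECTIONAL FORMATTING
    "\u202D",  # LEFT-TO-RIGHT OVERRIDE
    "\u202E",  # RIGHT-TO-LEFT OVERRIDE
})


def has_nonspacing_mark(text: str) -> bool:
    """Option 1a: detect input that a strict system would reject."""
    return any(unicodedata.category(ch) == "Mn" for ch in text)


def comparison_key(text: str) -> str:
    """Options 1b and 2b: drop marks and format characters for comparison.

    Only the key used for comparison is changed; the stored text is not.
    """
    return "".join(
        ch for ch in text
        if ch not in FORMAT_CHARS and unicodedata.category(ch) != "Mn"
    )


def compose(text: str) -> str:
    """Option 1c: replace base + mark with a precomposed character
    where one exists (sequences without one are left decomposed)."""
    return unicodedata.normalize("NFC", text)


def is_control_or_unassigned(ch: str) -> bool:
    """Question 3: general category Cc (control) or Cn (unassigned)."""
    return unicodedata.category(ch) in ("Cc", "Cn")


if __name__ == "__main__":
    decomposed = "E\u0300"  # uppercase E followed by a non-spacing grave
    print(has_nonspacing_mark(decomposed))     # True
    print(comparison_key(decomposed) == "E")   # True
    print(compose(decomposed) == "\u00C8")     # True
```

Note that under option 1b the key for "E" plus grave and the key for plain
"E" are identical, so the two compare equal, whereas under option 1c the
composed result is the single precomposed character U+00C8 and stays
distinct from plain "E".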

We would appreciate your insights and experiences related to these conformance
issues.

+-------------+------------------------------------+-------------------------+
| AT&T        | David O. Craig                     | Phone: (310) 524-7769   |
| Global      | Internationalization Group         | Fax: (310) 524-5517     |
| Information | Teradata Decision Enabling Systems | Office: 17-144          |
| Solutions   | 100 N. Sepulveda Blvd.             | doc@elsegundoca.ncr.com |
|             | El Segundo, Ca. 90245              |                         |
+-------------+------------------------------------+-------------------------+


