Since I seem to be in a put up or shut up situation regarding
my concerns about Mark's draft regarding conformance, I thought
I should start putting up.

The following is the beginning of a draft for what *I* think
should be developed as a UTR on the Unicode Conformance Model,
*before* we rush off creating a bunch of conformance tests
(which might end up in compliance tests and procurement specs).

I'm not submitting this yet as a PDUTR, since it is still very
"drafty" and preliminary, and the last 2/3 is just outlined as
yet. But I would appreciate any feedback regarding the general
approach and direction I'm taking here. Would this kind of document
be useful? Are there things you'd like to see specifically addressed
in it?

--Ken

======================= draft draft ===========================

A Unicode Conformance Model

I. Introduction

The Unicode Standard is a very large and complex standard.
Because of this, and because of the nature of the material
in the standard, it is often rather difficult to determine,
in any particular case, just exactly what conformance to
the Unicode Standard means. People have raised issues regarding
this difficulty, both from a theoretical point of view, and
from the practical standpoint of determining what products
"support" Unicode, and what such claims of support actually
mean.

In an effort to fill this gap, this Unicode Conformance
Model has been developed. It aims at explaining what conformance
means for the Unicode Standard. It defines terminology regarding
the topic of conformance, specifies different areas and levels
of conformance, and describes what it means to make a claim
of conformance or "support" of the standard.

This model is not, in itself, a framework for compliance testing,
although it could be used to develop such a framework, should that
prove desireable.

II. Terminology

This section gives a basic introduction to the terminology that
will be discussed in more detail in sections below.

Conformance

In the context of formal standard, conformance refers to a set
of rules or criteria whereby a relevant entity (element of
information interchange, device, application, piece of
hardware, etc., etc.) can be determined to either be meeting
or not meeting the specification in the standard.

In general, a formal standard will have a conformance clause
or clauses, which will be stated in terms of conditionals
("X is in conformance with Y specification of this standard if Z")
or modals ("An X that conforms with Y specification of this
standard SHALL Z"). The modal verbs that standards language
generally associates with such statements may themselves be carefully
defined, and typically involve specialized usage of "SHALL" and
"MUST", to avoid any ambiguities of interpretation.

If a standard is complex, the conformance clause or clauses
themselves may also be complex. But on occasion, a conformance
clause may simply be stated along the lines of "X is in conformance
with this standard if it follows the specification in section W",
where section W may consist of hundreds of pages and constitute
most of the rest of the standard.

Normative/Informative

Formal standards often distinguish between normative and informative
content. This distinction may be highly conventionalized, or even be
subject to rules specified in other standards, as for ISO standards,
or the distinction may be much less formally maintained.

Normative content of a standard is that which is required for
all of the conformance requirements to be meaningful. Typically
a standard will have normative definitions for terms used in
the rest of the specification, will have normative references to
other standards or sources whose content is referred to
indirectly, and will have normative clauses, specifications,
or sections, which actually define the content of the standard
itself -- that which the conformance clauses apply to.

Informative content of a standard is that material which has been
added for clarification, but which, in the judgement of the standard's
maintainers, could in principle be omitted without materially
affecting the specification which the conformance clauses
refer to.

If a standard is changed over time, the status of some particular
content could change from informative to normative, or vice
versa, depending on whether it was newly required for conformance
or became unrequired for conformance.

Compliance

The term compliance is often used synonymously with the term
conformance. However, it is possible to draw a meaningful
distinction.

In the context of the Unicode Conformance Model, compliance is
used to mean an external determination that a particular relevant
entity actually does meet one or more conditions of the conformance
clauses of the standard. Thus while conformance is merely a logical
statement of requirements, compliance is a state met when entity
X is actually determined, under some specified set of circumstances,
to meet the logical statement of requirements. As such, conformance
clauses exist in the standard on their own, but compliance determination
implies the existence of compliance tests, applied to entities to
make such determinations.

A conformance claim can simply be stated. It is an assertion that
entity X meets a requirement of the standard.

A compliance claim, on the other hand, is the result of the specific
application of a test designed to determine the validity of a
conformance claim. Such tests are called compliance tests.

Conformance Tests and Compliance Tests

A standard may include tests or "benchmarks" as part of the text of 
the standard, or as external documents associated with the standard.
Once again, while there is some overlap in general usage of the
terms "conformance test" and "compliance test", in the Unicode
Conformance Model a systematic distinction is drawn between the
two.

A conformance test for the Unicode Standard is a list of data
certified by the UTC to be "correct" in regard to some particular
requirement for conformance to the standard. In some instances,
as for example, the implementation of the bidirectional algorithm,
producing a definitive list of correct results is difficult or
impossible, and in such cases, a conformance test may itself consist
of an implemented algorithm certified by the UTC to produce correct
results for any pertinent input data. Conformance tests for the
Unicode Standard are essentially benchmarks that someone can use
to determine if their algorithm, API, etc., claiming to conform
to some requirement of the standard, does in fact match the data
that the UTC claims defines such conformance.

A compliance test for the Unicode Standard, on the other hand,
is a test, usually designed and implemented by a third party
not associated with the Unicode Standard or the UTC, intended to
test a product which claims conformance to one or more aspects
of the Unicode Standard, for actual compliance to the standard.
Thus a compliance test is a test *of a product*. A compliance test,
may, of course, make use of one or more of the Unicode
conformance tests in order to determine the results of its test
of compliance.

Support

The term support, in the context of the Unicode Conformance Model,
refers to a more generalized claim of intent to conform to one
or another requirement of the standard. A claim of Unicode support
may in fact be difficult to verify, since it can be and often is
vague in detail. But in principle, at least, it indicates that the
developer or user of an entity intends conformance.

More specifically, support often refers to a claim of particular
repertoire coverage. For example, an application may claim support
for Unicode Greek. That should be interpreted as meaning that
Unicode Greek characters will be handled conformantly with the
standard, and furthermore that all other relevant aspects of processing
of those characters which that particular application is concerned with,
will also be done in such a way as not to violate conformance clauses
of the standard. 

Stability and Invariance

Some formal standards are developed once and then are essentially
frozen and stable forever. For such standards, stability of content
and the corresponding stability of conformance claims is not an
issue.

For a large, complex standard aimed at the universal encoding of
characters, such as the Unicode Standard, such stability is not
possible. The standard is necessarily evolving and expanding over
time, to extend its coverage of all the writing systems of the world.
And as experience in its implementation accumulates, further aspects
of character processing also accrue to the formal content of the
standard. This fundamentally dynamic quality of the Unicode Standard
complicates issues of conformance, since the content to which conformance
requirements pertain continually expands, both horizontally to more 
characters and scripts, and vertically to more aspects of character 
processing.

Invariance refers to those aspects of the content of the Unicode
Standard that have been determined to be unchangeable, even as
the standard continues its dynamic development. A fairly trivial
example can be seen in the guarantee of the stability of the
formal Unicode character names. While in principle such names *could*
be changed, and in very early versions of the standard were changed
(between Version 1.0 and Version 1.1, for example), the UTC has
determined that such changes are too disruptive and have too little
benefit to be tolerated. Accordingly, the stability of character names
has been promoted to the status of an invariant in the standard.

Conformance claims need to be distinguished in terms of their
relationship to invariants and non-invariants in the standard,
because of their different risk levels for stability.

Versions

The Unicode Standard is regularly versioned, as new characters
are added. A formal system of versioning is in place, involving
major, minor, and update versions, all with carefully controlled
rules for the type of documentation required, handling of the
associated data files, and allowable types of change between
versions. For more information about the details of Unicode
versioning see [link].

Conformance claims clearly must be specific to versions of the
Unicode Standard, but the level of specificity needed for a claim
may vary according to the nature of the particular conformance
claim being made.

[ The following content is just sketched out in outline form. ]

III. Structure of Unicode Conformance

This section will serve as a guide to unravelling the particular
way that the Unicode Standard expresses conformance requirements,
both in terms of where they are located and how they are expressed.

It also explores the peculiar aspects of conformance related to
the synchronized status of the Unicode Standard and the independent
but closely aligned International Standard ISO/IEC 10646, which
has its own conformance clauses expressed using ISO conventions.

Definitions
Conformance Clauses
Unicode Standard Annexes
Identification of Normative Content
Relation to 10646 Conformance

IV. Areas of Conformance

[Borrowing from Asmus' suggestions:]

1) representation

Representation would cover being able to express and transmit Unicode data, 
it would be a requirement applicable to certain protocols (e.g. XML), but might 
apply to the storage aspects of databases as well.

This would also apply to correct use of encoding forms and encoding schemes.

2) transcoding

Transcoding between Unicode and legacy (all other) character encodings.

3) string processing

String processing would 
generically cover all operations on Unicode texts that can be carried out 
without considering layout and specifically not considering fonts.

4) text layout, including display and selection

Layout would comprise all operations that go from backing store to displayed text 
(and the reverse, for selection). These operations are dependent on font 
data.

5) fonts

Primarily refers to CMAP's for fonts, and to claims of "coverage" of Unicode
repertoire by fonts.

6) input

Issues of coverage of Unicode repertoire, conversion of input to Unicode
character values for storage, and consistency with the text models
required for particular scripts and text layout. The entities here are
mostly IME's and keyboards (drivers).

V. Levels of Conformance

This section will provide both a typology for levels of conformance
(i.e., an alternative to the notion that all aspects of Unicode
conformance are either/or issues), and specific lists of levels of
conformance and support where they can be pulled out of the standard.
For example, the standard explicitly talks about levels of surrogate
support -- that should be abstracted, along with others, to
provide the basis for determining how to make various claims of
conformance.

Repertoire coverage
Full conformance (in an area)
Partial conformance (in an area)
  - levels of support defined
Best practices

VI. Interoperability

Matching areas and levels of conformance between implementations
and components.

Repertoire matching.

Downrev and uprev compatibility issues.