RE: Unicode Conformance (was: Re: News of AFII...)

From: F. Avery Bishop
Date: Wed Dec 23 1998 - 17:57:51 EST

I found this conformance clause lacking when I was a more active participant
in the consortium, because an application or system can subset to any
degree, and say it "supports Unicode". Ignoring the fact that the empty set
is also a subset, what if an app claims Unicode support because it processes
"A" (U+0041) correctly, but ignores every other code point? Worse still, an
application that does nothing to any Unicode character, including display,
could claim Unicode support because it handles one of the BiDi controls,
e.g., LRM (U+200E) correctly, and nothing else! Since LRM has no visible
form (glyph), and the app doesn't recognize any other Unicode character,
doing nothing to any Unicode character means it can claim to "support
Unicode" subsetted to LRM.

Pointing out a problem is of little help unless it is accompanied by a
proposed solution. Alas, I have none. However, over the years I've become
convinced that this is less of a problem than I thought in my excessively
zealous youth. For one thing, the examples above are pathological and
unrealistic. If a software product were to actually claim Unicode support under
such circumstances, the market would judge its marketers to be incompetent at
best, and dishonest at worst. Even claiming Unicode support by handling a
more realistic subset, say only the Latin-1 subset of Unicode by null
extension, would bring derision on a real product, and users would be less
likely to use it than if no claims were made.
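
To make "Latin-1 by null extension" concrete, here is a minimal Python sketch.
It is only an illustration: the function names null_extend_latin1 and
in_latin1_subset are made up for this example and do not come from any product
or from the standard. Each 8-bit Latin-1 value is widened to the code point
with the same number, so such an app covers U+0000..U+00FF and nothing else.

```python
def null_extend_latin1(data: bytes) -> str:
    """Widen each Latin-1 byte to the code point with the same value."""
    # Hypothetical helper: byte 0xE9 becomes U+00E9, byte 0x41 becomes U+0041.
    return "".join(chr(b) for b in data)


def in_latin1_subset(text: str) -> bool:
    """True if every character lies in U+0000..U+00FF."""
    return all(ord(ch) <= 0xFF for ch in text)


sample = bytes([0x41, 0xE9, 0xFF])
text = null_extend_latin1(sample)

# Null extension is exactly what Python's built-in latin-1 codec does.
assert text == sample.decode("latin-1")
print([f"U+{ord(ch):04X}" for ch in text])  # ['U+0041', 'U+00E9', 'U+00FF']

# Anything outside the subset, such as LRM (U+200E), is simply out of reach.
print(in_latin1_subset("A\u00e9"))  # True
print(in_latin1_subset("\u200e"))   # False
```

Under these assumptions the subset is easy to state, which is the point: the
claim "supports Unicode" hides how small the supported range is.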

F. Avery Bishop
Program Manager, Multilingual Developer Communications

> -----Original Message-----
> From: John Cowan
> Sent: Wednesday, December 23, 1998 6:33 AM
> To: Unicode List
> Subject: Re: Unicode Conformance (was: Re: News of AFII...)
>
> Kenneth Whistler wrote:
> > It is Chapter 3 which contains the normative definition of conformance
> > to the standard.
>
> Here's John's Own Version Of Unicode Compliance:
> 1) Unicode characters are 16 bits long; deal with it.
> 2) Byte order is only an issue in files.
> 3) If you don't have a clue, assume big-endian.
> 4) Loose surrogates don't mean jack.
> 5) Neither do U+FFFE and U+FFFF (a.k.a. the zigamorph).
> 6) Leave the unassigned codepoints alone.
> 7) It's OK to be ignorant about a character, but not plain wrong.
> 8) Subsets are strictly up to you.
> 9) Canonical equivalence matters.
> 10) Don't garble what you don't understand.
> This is presented in the hope that it may be useful, but all
> warranties (including implicit warranties of merchantability or
> fitness for a particular purpose) are void. Freely reusable,
> except that John Cowan asserts the moral right to be known as author.
> --
> John Cowan
> You tollerday donsk? N. You tolkatiff scowegian? Nn.
> You spigotty anglease? Nnn. You phonio saxo? Nnnn.
> Clear all so! 'Tis a Jute.... (Finnegans Wake 16.5)
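
Rules 2, 3 and 9 in the quoted list can be sketched in a few lines of Python.
This is a rough illustration under stated assumptions, not a conformance test:
decode_utf16 and canonically_equal are hypothetical names, Python's UTF-16
codecs stand in for the 16-bit encoding discussed above, and comparing NFD
forms is one common way to check canonical equivalence.

```python
import unicodedata

BOM_BE = b"\xfe\xff"  # U+FEFF serialized big-endian
BOM_LE = b"\xff\xfe"  # U+FEFF serialized little-endian


def decode_utf16(data: bytes) -> str:
    """Decode 16-bit text from a file, honoring a leading byte order mark."""
    if data[:2] == BOM_BE:
        return data[2:].decode("utf-16-be")
    if data[:2] == BOM_LE:
        return data[2:].decode("utf-16-le")
    # Rule 3: no byte order mark, so assume big-endian.
    return data.decode("utf-16-be")


def canonically_equal(a: str, b: str) -> bool:
    """Rule 9: compare strings after canonical decomposition (NFD)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


print(decode_utf16(b"\xfe\xff\x00A"))  # A
print(decode_utf16(b"\xff\xfeA\x00"))  # A
print(decode_utf16(b"\x00A"))          # A

# Precomposed U+00E9 versus "e" followed by U+0301 COMBINING ACUTE ACCENT.
print(canonically_equal("\u00e9", "e\u0301"))  # True
print(canonically_equal("\u00e9", "e"))        # False
```

The byte order check is done by hand because Python's plain "utf-16" codec
falls back to the platform's native order when no mark is present, which is
not what rule 3 asks for.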
