Unicode FAQ addendum

From: John Cowan (jcowan@reutershealth.com)
Date: Wed Jul 19 2000 - 12:03:19 EDT


The new Unicode FAQ (like the old) supplies the panting world with
John's Own Version of Unicode Conformance:

1) Unicode code units are 16 bits long; deal with it.
2) Byte order is only an issue in files.
3) If you don't have a clue, assume big-endian.
4) Loose surrogates don't mean jack.
5) Neither do U+FFFE and U+FFFF.
6) Leave the unassigned codepoints alone.
7) It's OK to be ignorant about a character, but not plain wrong.
8) Subsets are strictly up to you.
9) Canonical equivalence matters.
10) Don't garble what you don't understand.
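For the curious, rules 2, 3, and 9 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `decode_utf16` and `canonically_equal` are made up for this example, and NFC comparison is just one convenient way to test canonical equivalence.

```python
import unicodedata


def decode_utf16(data: bytes) -> str:
    """Decode UTF-16 bytes read from a file (hypothetical helper).

    Honor a leading byte order mark if there is one (rule 2);
    with no mark at all, assume big-endian (rule 3).
    """
    if data[:2] == b"\xfe\xff":          # BOM in big-endian order
        return data[2:].decode("utf-16-be")
    if data[:2] == b"\xff\xfe":          # BOM in little-endian order
        return data[2:].decode("utf-16-le")
    return data.decode("utf-16-be")      # no clue: assume big-endian


def canonically_equal(a: str, b: str) -> bool:
    """Compare two strings up to canonical equivalence (rule 9).

    Normalizing both sides to NFC is an assumption of this sketch;
    NFD would serve equally well for the comparison.
    """
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)
```

With these, `decode_utf16(b"\x00A")` is read as big-endian, and `canonically_equal("e\u0301", "\u00e9")` treats the decomposed and precomposed forms of e-acute as the same text.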

But for 3.0 I will add:

11) Process UTF-* by the book.
12) Treat bogus encodings as junk.
13) Right-to-left scripts have to go by bidi rules.

These conformance sentences match up one-for-one with the conformance
clauses in Chapter 3 of The Unicode Standard, Version 3.0 (TUS3.0, pp. 37-39).
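As a rough illustration of rules 11 and 12, here is a minimal Python sketch that leans on the language's built-in strict UTF-8 decoder and treats anything it rejects as junk. The helper name `is_well_formed_utf8` is hypothetical, and relying on Python's decoder rather than hand-checking byte sequences is an assumption of this example.

```python
def is_well_formed_utf8(data: bytes) -> bool:
    """Return True only if data is well-formed UTF-8 (hypothetical helper).

    Python's strict decoder rejects overlong forms and encoded
    surrogates, so such input is reported as junk instead of
    being quietly interpreted.
    """
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True
```

For instance, the overlong two-byte form `b"\xc0\x80"` and the encoded surrogate `b"\xed\xa0\x80"` are both rejected, while `b"\xe2\x82\xac"` (U+20AC) passes.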

-- 

Schlingt dreifach einen Kreis um dies! || John Cowan <jcowan@reutershealth.com>
Schliesst euer Aug vor heiliger Schau, || http://www.reutershealth.com
Denn er genoss vom Honig-Tau,          || http://www.ccil.org/~cowan
Und trank die Milch vom Paradies.      -- Coleridge (tr. Politzer)


