John's Own Version of Unicode Conformance, Version 4.0

From: John Cowan (cowan@mercury.ccil.org)
Date: Fri May 16 2003 - 21:48:46 EDT

    John's Own Version of Unicode Conformance, Version 4.0 (draft 1)

    1 through 3 are history.

    4. Loose surrogates don't mean jack.

    5. Neither do non-characters.

    6. Leave the unassigned codepoints alone.

    7. It's okay to be ignorant about a character, but not wrong.

    8. Subsets are strictly up to you.

    9. Canonically equivalent sequences always mean the same thing.

    10. Don't garble what you don't understand.

    11. Interpret UTFs by the UTF rules.

    12. Generate them that way, too.

    12a. Garbled UTF is garbage.

    12b. Believe in the BOM.

    13. Do right-to-left characters by bidi rules.

    14. When you say you are producing normalized text, make it so.

    15. When you check for it, do that by the book too.

    16. Text normalizers have to pass the tests.

    17. When you talk about Unicode formally, do it the way we say.

    18. Provisional properties are just for us.

    19. Don't do bogus algorithm implementations.

    20. If you say you do casing, do it right.
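
    A few of these rules can be checked mechanically. What follows is a
    minimal sketch, not part of the original draft, of one reading of
    rules 4, 9, 11, 12a, 14 and 15, using only Python's standard library.
    The helper names (canonically_equal, is_nfc, is_well_formed_utf8) are
    made up for illustration, and comparing NFD forms is just one way to
    test canonical equivalence.

```python
import unicodedata

# Two spellings of the same text (sample data chosen for illustration).
composed = "\u00e9"       # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"    # "e" followed by COMBINING ACUTE ACCENT


def canonically_equal(a: str, b: str) -> bool:
    """Rule 9: canonically equivalent sequences mean the same thing.

    Hypothetical helper: two strings are treated as equivalent when their
    NFD forms are identical.
    """
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


def is_nfc(text: str) -> bool:
    """Rules 14 and 15: claim NFC only if normalizing changes nothing."""
    return unicodedata.normalize("NFC", text) == text


def is_well_formed_utf8(data: bytes) -> bool:
    """Rules 4, 11 and 12a: decode strictly and reject anything ill-formed.

    Python's strict UTF-8 decoder refuses overlong forms and encoded
    surrogates instead of guessing at a meaning for them.
    """
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    print(canonically_equal(composed, decomposed))
    print(is_nfc(composed), is_nfc(decomposed))
    print(is_well_formed_utf8(b"\xc0\xaf"))      # overlong encoding of "/"
    print(is_well_formed_utf8(b"\xed\xa0\x80"))  # encoded lone surrogate U+D800
```

    Rejecting the bad byte sequences outright, rather than repairing them,
    is a choice made for this sketch; a real application might prefer to
    substitute U+FFFD or report the error to the caller.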

    Comments?

    -- 
    John Cowan  jcowan@reutershealth.com  www.reutershealth.com  www.ccil.org/~cowan
    Consider the matter of Analytic Philosophy.  Dennett and Bennett are well-known.
    Dennett rarely or never cites Bennett, so Bennett rarely or never cites Dennett.
    There is also one Dummett.  By their works shall ye know them.  However, just as
    no trinities have fourth persons (Zeppo Marx notwithstanding), Bummett is hardly
    known by his works.  Indeed, Bummett does not exist.  It is part of the function
    of this and other e-mail messages, therefore, to do what they can to create him.
    

