Comments on UTR #18, rev 3

From: John Cowan (cowan@locke.ccil.org)
Date: Tue May 25 1999 - 14:45:25 EDT


Overall a good first cut (certainly well above what anyone
else is publicly doing).

The paragraph on blocks in 1.2 is unmotivated: either remove it
(my preference) or develop it further, with examples.

I think that regex matching on blocks is a bad idea anyway: it
encourages people to believe that {CYRILLIC} represents a Cyrillic
alphabet (it represents the union of all known Cyrillic alphabets), or
that there is some meaningful difference between Latin Extended-A
(the repertoires of ISO 8859-2, -3, -4, and -9) and Latin Extended-B.

I think the distinction between Level 1 and Level 2 is really
unnecessary, given that people can implement whichever features
they like.

Grapheme support is useless if there is no standard way to assign
graphemes to equivalence classes, and the draft
promulgates no rules for doing so.

Surrogates are mentioned in the Introduction but omitted from Level 2.

The whole of Level 3 should be rethought: it is a tangled mixture
of pure regex issues and collation issues. I recommend it be
withdrawn except for some motherhood warnings, as it is too vague
to serve as the basis for implementation.

-- 
John Cowan	http://www.ccil.org/~cowan		cowan@ccil.org
	You tollerday donsk?  N.  You tolkatiff scowegian?  Nn.
	You spigotty anglease?  Nnn.  You phonio saxo?  Nnnn.
		Clear all so!  'Tis a Jute.... (Finnegans Wake 16.5)
