Re: Comments on UTR #18, rev 3

From: Mark Davis (
Date: Wed May 26 1999 - 11:46:25 EDT

Thanks for the comments.

----- Original Message -----
From: John Cowan <>
To: <>; <>
Sent: Tuesday, May 25, 1999 11:45 AM
Subject: Comments on UTR #18, rev 3

> Overall a good first cut (certainly well above what anyone
> else is publicly doing).
> The paragraph on blocks in 1.2 is unmotivated: either remove it
> (my preference) or develop it further, with examples.
> I think that regex matching on blocks is a bad idea anyway: it
> encourages people to believe that {CYRILLIC} represents a Cyrillic
> alphabet (it represents the union of all known Cyrillic alphabets), or
> that there is some meaningful difference between Latin-Extended-A
> (ISO 8859-2, -3, -4, -9) and Latin-Extended-B.
> I think the distinction between Level 1 and Level 2 is really
> unnecessary, given that people can implement whichever features
> they like.

It is trying to give a sense of what a minimal, consistent level of support
would be, and then to step up from there.

> Grapheme support is useless if there is no standard way to assign
> graphemes to equivalence classes, and the draft
> promulgates no rules for doing so.
> Surrogates are mentioned in the Introduction but omitted from level 2.

Good point.

> The whole of Level 3 should be rethought: it is a tangled mixture
> of pure regex issues and collation issues. I recommend it be
> withdrawn except for some motherhood warnings, as it is too vague
> to serve as the basis for implementation.

I'll review it. If it can't be fixed, it can be withdrawn.

> --
> John Cowan
> You tollerday donsk? N. You tolkatiff scowegian? Nn.
> You spigotty anglease? Nnn. You phonio saxo? Nnnn.
> Clear all so! 'Tis a Jute.... (Finnegans Wake 16.5)
