Re: Non-ascii string processing?

From: jon@spin.ie
Date: Wed Oct 08 2003 - 08:23:23 CST


John Cowan <cowan@mercury.ccil.org> wrote :

> jon@spin.ie scripsit:
>
> > (The XML1.1 spec removes a few of those characters, I would have
> > removed more, but that's another issue).
>
> You have no idea what fearful drubbings I had to administer to get
> even the few removed that I did.

Well, I have a general tendency towards being liberal in these matters (as I've said before, allowing nonsense is *sometimes* a good way to ensure you allow edge cases), so I can see where the objectors would be coming from.

> > [D]oes ISO 10646 allow those characters even though Unicode has them
> > undefined?
>
> No, it doesn't. There was a strong feeling in the W3C Core WG that
> it be possible to handle the Astral Planes uniformly; every character
> off the BMP, therefore, is a valid Char as well as a valid NameStartChar.

Hmm. To my mind that isn't uniform at all. Someone familiar with Unicode would already have disallowed, say, U+4FFFE as a noncharacter before they got as far as the production (making it effectively excluded), whereas someone relying on the XML spec for information about character properties would allow it.
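
To make the mismatch concrete, here is a rough Python sketch. The function names are my own invention, purely for illustration. The noncharacter test follows the Unicode definition (U+FDD0..U+FDEF plus the last two code points of every plane), and the other test follows the Char production as I read it in XML 1.1.

```python
def is_unicode_noncharacter(cp: int) -> bool:
    """True for the 66 code points Unicode designates as noncharacters."""
    if not 0 <= cp <= 0x10FFFF:
        return False
    # U+FDD0..U+FDEF, plus U+nFFFE and U+nFFFF in each of the 17 planes.
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def is_xml11_char(cp: int) -> bool:
    """XML 1.1 Char: [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]."""
    return (0x1 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


cp = 0x4FFFE
# Unicode calls it a noncharacter, yet the XML production admits it.
print(is_unicode_noncharacter(cp), is_xml11_char(cp))
```

Note that U+FFFE itself fails the second test while U+4FFFE passes it, which is exactly the non-uniformity I mean.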

Maybe CharMod will save us all...


