Re: Is there Unicode mail out there?

From: Shigemichi Yazawa (yazawa@globalsight.com)
Date: Mon Jul 16 2001 - 15:12:39 EDT


At Sat, 14 Jul 2001 09:49:30 -0700,
Mark Davis <mark@macchiato.com> wrote:
>
> No, but it is for the vast majority.
>
> Some have to be written specially, e.g. &lt;

I looked at the XML 1.0 spec, and section 2.4, Character Data and
Markup, says:

"If they are needed elsewhere, they must be escaped using either
numeric character references or the strings "&amp;" and "&lt;"
respectively."

I also looked at the HTML 4.01 spec, and section 5.3.2, Character
entity references, does not say that &#60; cannot be used to
represent "<".
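
As a quick illustration of the point above, here is a minimal Python
sketch (not taken from either spec) that decodes both forms with the
standard library. The variable names are only for illustration.

```python
import html
import xml.etree.ElementTree as ET

# In XML, the numeric reference &#60; and the predefined entity &lt;
# both decode to the same "<" character.
elem = ET.fromstring("<a>&#60; and &lt;</a>")
xml_text = elem.text

# html.unescape decodes HTML character references in the same way.
html_text = html.unescape("&#60; and &lt;")

print(repr(xml_text), repr(html_text))
```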

> Some cannot be written at all, e.g. U+0007 (but U+0087 can be!)

This is true for XML, but I couldn't find any statement in the HTML
4.01 spec that restricts the use of U+0007 in an HTML document.
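
The XML half of this can be checked against a real parser. The sketch
below assumes Python's xml.etree.ElementTree (which uses expat) as a
stand-in for "an XML 1.0 parser"; the helper name parses() is made up
for illustration, and it says nothing about how HTML user agents treat
U+0007.

```python
import xml.etree.ElementTree as ET

def parses(doc):
    """Return True if doc is accepted as well-formed XML, else False."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False

# U+0007 is rejected both as a literal character and as a reference.
bel_literal = parses("<a>\x07</a>")
bel_reference = parses("<a>&#7;</a>")

# U+0087 (a C1 control) is accepted when written as a reference.
c1_reference = parses("<a>&#x87;</a>")

print(bel_literal, bel_reference, c1_reference)
```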

By the way, I have been wondering why, in XML, all the C1 control
characters are legal but most of the C0 control characters are
not. Section 2.2, Characters, says that "Legal characters are tab,
carriage return, line feed, and the legal characters of Unicode and
ISO/IEC 10646." and the production for Char is:

[2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF]
           | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
             /* any Unicode character, excluding the surrogate blocks,
                FFFE, and FFFF. */

Does this mean C0 controls are not legal Unicode characters?
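
To make the asymmetry concrete, the Char production quoted above can
be transcribed directly into a small predicate. This is only a sketch
in Python; the name is_xml_char is hypothetical and the ranges are
copied from production [2], nothing more.

```python
def is_xml_char(cp):
    """True if code point cp matches production [2] Char of XML 1.0."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )

# C0 controls are U+0000..U+001F; C1 controls are U+0080..U+009F.
c0_allowed = [cp for cp in range(0x00, 0x20) if is_xml_char(cp)]
c1_allowed = [cp for cp in range(0x80, 0xA0) if is_xml_char(cp)]

print([hex(cp) for cp in c0_allowed])  # only tab, LF and CR survive
print(len(c1_allowed))                 # every C1 control is allowed
```

So the production admits only three of the 32 C0 controls while
admitting all 32 C1 controls, which is the gap the question is about.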

-------------------
Shigemichi Yazawa
yazawa@globalsight.com


