L2/04-331

Source: Mark Davis
Subject: Alignment of newline between XML, TUS, Regex?
Date: Wed, 4 Aug 2004 21:19:51 -0700

We received a report that the definitions of newline in XML 1.1, the Unicode Standard (TUS), and UTS Regex are all slightly different. Here is a comparison:

All contain LF, CR, CRLF, NEL, LS

XML: adds CRNEL (that is, the sequence CR+NEL)
TUS: adds FF, VT, PS
Reg: adds FF, PS

Question: should we add CRNEL to TUS, and VT, CRNEL to Regex, so that the latter two are aligned, and are supersets of XML?

===================
Background:

XML: http://www.w3.org/TR/2004/REC-xml11-20040204/#sec-line-ends

0. (single #xA character)
1. the two-character sequence #xD #xA
2. the two-character sequence #xD #x85
3. the single character #x85
4. the single character #x2028
5. any #xD character that is not immediately followed by #xA or #x85.

TUS: http://www.unicode.org/versions/Unicode4.0.0/ch05.pdf#G10213

CR    carriage return                000D
LF    line feed                      000A
CRLF  carriage return and line feed  000D,000A
NEL   next line                      0085
VT    vertical tab                   000B
FF    form feed                      000C
LS    line separator                 2028
PS    paragraph separator            2029

Regex: http://www.unicode.org/reports/tr18/#Line_Boundaries

\u000A | \u000C | \u000D | \u000D\u000A | \u0085 | \u2028 | \u2029
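
The comparison and the proposed alignment can be checked as plain set operations. The following is a minimal Python sketch, not taken from any of the three specifications; the names COMMON, XML_11, TUS_40, UTS_18, TUS_PROPOSED, REGEX_PROPOSED, newline_pattern and split_lines are hypothetical and introduced only for illustration. The sets are transcribed from the comparison above.

```python
import re

# Sequences that all three definitions share: LF, CR, CRLF, NEL, LS.
COMMON = {"\u000A", "\u000D", "\u000D\u000A", "\u0085", "\u2028"}

# Current definitions, as listed in the comparison above.
XML_11 = COMMON | {"\u000D\u0085"}                # adds CRNEL
TUS_40 = COMMON | {"\u000C", "\u000B", "\u2029"}  # adds FF, VT, PS
UTS_18 = COMMON | {"\u000C", "\u2029"}            # adds FF, PS

# The change asked about: CRNEL into TUS; VT and CRNEL into Regex.
TUS_PROPOSED = TUS_40 | {"\u000D\u0085"}
REGEX_PROPOSED = UTS_18 | {"\u000B", "\u000D\u0085"}


def newline_pattern(sequences):
    """Compile an alternation that matches any of the given newline sequences.

    Two-character sequences are listed first so that CRLF and CRNEL are
    matched as a unit rather than as a bare CR followed by another newline.
    """
    ordered = sorted(sequences, key=lambda s: (-len(s), s))
    return re.compile("|".join(re.escape(s) for s in ordered))


def split_lines(text, sequences):
    """Split text on every occurrence of a newline sequence from the set."""
    return newline_pattern(sequences).split(text)
```

Under these assumptions, TUS_PROPOSED and REGEX_PROPOSED come out equal and both contain XML_11, whereas neither TUS_40 nor UTS_18 does. The practical effect shows up on input containing CR+NEL: split_lines("a\r\u0085b", XML_11) treats the pair as one line break, while the same call with UTS_18 sees two breaks with an empty line between them.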