Re: Question about the Sentence_Break property

From: Konstantin Ritt <>
Date: Sat, 21 Feb 2015 06:50:53 +0400

When UAX9 mentions a paragraph level, it says:

> Paragraphs are divided by the Paragraph Separator or appropriate Newline
> Function (for guidelines on the handling of CR, LF, and CRLF, see *Section
> 4.4, Directionality*, and *Section 5.8, Newline Guidelines* of [Unicode]).
> Paragraphs may also be determined by higher-level protocols: for example,
> the text in two different cells of a table will be in different paragraphs.
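
A minimal sketch of the paragraph division quoted above, in Python. It
assumes that "Paragraph Separator or appropriate Newline Function" can be
approximated by the characters whose Bidi_Class is B, as reported by the
standard unicodedata module; that mapping is an assumption made here, not
something the quote states. The function name split_paragraphs is
hypothetical, and paragraphs set by higher-level protocols (table cells
and the like) are out of scope.

```python
import unicodedata


def split_paragraphs(text):
    """Split text at characters with Bidi_Class B (assumed separators).

    A CR LF pair is treated as a single separator.
    """
    paragraphs = []
    current = []
    i = 0
    while i < len(text):
        ch = text[i]
        if unicodedata.bidirectional(ch) == "B":
            # Consume the LF of a CR LF pair so it does not open
            # an extra, empty paragraph.
            if ch == "\r" and i + 1 < len(text) and text[i + 1] == "\n":
                i += 1
            paragraphs.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    if current:
        paragraphs.append("".join(current))
    return paragraphs
```

Under this assumption U+2028 LINE SEPARATOR does not end a paragraph,
because its Bidi_Class is WS rather than B.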


2015-02-21 3:56 GMT+04:00 Philippe Verdy <>:

> 2015-02-20 6:14 GMT+01:00 Richard Wordingham <>:
>> TUS has a whole section on the issue, namely TUS 7.0.0 Section 5.8.
>> One thing that is missing is mention of the convention that a single
>> newline character (or CRLF pair) is a line break whereas a doubled
>> newline character denotes a paragraph break.
> In that case CR or LF characters alone are not "paragraph separators" by
> themselves unless they are grouped together. Like NEL, they should just be
> considered line separators, and the terminology used in UAX 29 rule SB4
> is effectively incorrect if what matters here is just the linebreak
> property. Also in that case, rule SB4 should effectively include
> NEL (from the C1 subset).
> But as SB4 is only related to sentence breaking, it would be a problem,
> because simple line breaks are used extremely frequently in the middle of
> sentences.
> What the sentence break algorithm should say is that there should first be
> a preprocessing step separating line breaks and paragraph breaks (creating
> custom entities, similar to collation elements but encoded internally with
> a code point outside the standard space) that rule SB4 would use
> instead of "Sep | CR | LF". That custom entity should be "Sep" but without
> the rule defining it, as there are various ways to represent paragraph
> breaks.
> _______________________________________________
> Unicode mailing list
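
The preprocessing step proposed in the quoted message could be sketched
roughly as follows, in Python. This is an illustration under stated
assumptions, not the UAX 29 algorithm: it adopts the convention mentioned
in the thread (a single newline or CRLF pair is a line break, a doubled one
is a paragraph break), it treats U+2029 as always starting a new paragraph,
and it uses sentinel objects in a token list instead of the internal
out-of-range code point the proposal describes, since a Python string
cannot hold one. The names LINE_BREAK, PARA_BREAK and preprocess are
hypothetical.

```python
import re

# Sentinel tokens standing in for the proposed "custom entities".
LINE_BREAK = object()
PARA_BREAK = object()

# A run of consecutive break characters; CR LF is matched as one unit.
_BREAK_RUN = re.compile(r"(?:\r\n|[\r\n\x85\u2028\u2029])+")
# A single newline function: CR LF, CR, LF, or NEL.
_NEWLINE = re.compile(r"\r\n|[\r\n\x85]")


def preprocess(text):
    """Return a list of text chunks and LINE_BREAK / PARA_BREAK tokens."""
    tokens = []
    pos = 0
    for match in _BREAK_RUN.finditer(text):
        if match.start() > pos:
            tokens.append(text[pos:match.start()])
        run = match.group()
        newline_count = len(_NEWLINE.findall(run))
        # Two or more newline functions, or an explicit U+2029,
        # are taken to mean a paragraph break (assumed convention).
        if "\u2029" in run or newline_count >= 2:
            tokens.append(PARA_BREAK)
        else:
            tokens.append(LINE_BREAK)
        pos = match.end()
    if pos < len(text):
        tokens.append(text[pos:])
    return tokens
```

A rule in the spirit of SB4 would then test for PARA_BREAK rather than for
"Sep | CR | LF", so a lone line break in the middle of a sentence would not
end it.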

Received on Fri Feb 20 2015 - 20:52:57 CST