Line Break Chart L2/07-336

Unicode Version: 5.0.1

Date: 2007-04-26, 22:46:26 GMT

This page illustrates the application of the boundary specifications. The first chart shows where breaks would appear between different sample characters or strings: ÷ marks a position where a break is allowed and × a position where it is not, with the row giving the sample before the position and the column the sample after it. The sample characters are chosen mechanically to represent the different properties used by the specification. Where properties used in the rules 'overlap', the samples are given 'composed' names. For example, SentenceBreak uses GCLF_Sep: Sep is the SentenceBreak property, but it overlaps with the GraphemeClusterBreak property LF.

	AL	B2	BA	BB	BK	CB	CL	CM	CR	EX	GL	H2	H3	HY	ID	IN	IS	JL	JT	JV	LF	NL	NS	NU	OP	PO	PR	QU	SP	SY	WJ	ZW	AI_AL	XX_AL	SA_AL
AL	×	÷	×	÷	×	÷	×	×	×	×	×	÷	÷	×	÷	×	×	÷	÷	÷	×	×	×	×	×	÷	÷	×	×	×	×	×	×	×	×
B2	÷	×	×	÷	×	÷	×	×	×	×	×	÷	÷	×	÷	÷	×	÷	÷	÷	×	×	×	÷	÷	÷	÷	×	×	×	×	×	÷	÷	÷
BA	÷	÷	×	÷	×	÷	×	×	×	×	×	÷	÷	×	÷	÷	×	÷	÷	÷	×	×	×	÷	÷	÷	÷	×	×	×	×	×	÷	÷	÷
BB	×	×	×	×	×	÷	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×
BK	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷
CB	÷	÷	÷	÷	×	÷	×	×	×	×	×	÷	÷	÷	÷	÷	×	÷	÷	÷	×	×	÷	÷	÷	÷	÷	×	×	×	×	×	÷	÷	÷
CL	×	÷	×	÷	×	÷	÷	×	×	×	×	÷	÷	×	÷	÷	÷	÷	÷	÷	×	×	×	×	÷	÷	÷	×	×	÷	×	×	×	×	×
CM	×	÷	×	÷	×	÷	÷	×	×	×	×	÷	÷	×	÷	×	÷	÷	÷	÷	×	×	×	×	×	÷	÷	×	×	÷	×	×	×	×	×
CR	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	×	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷
EX	÷	÷	×	÷	×	÷	×	×	×	×	×	÷	÷	×	÷	÷	×	÷	÷	÷	×	×	×	÷	÷	÷	÷	×	×	×	×	×	÷	÷	÷
GL	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×
H2	÷	÷	×	÷	×	÷	×	×	×	×	×	÷	÷	×	÷	×	×	÷	×	×	×	×	×	÷	÷	×	÷	×	×	×	×	×	÷	÷	÷
H3	÷	÷	×	÷	×	÷	×	×	×	×	×	÷	÷	×	÷	×	×	÷	×	÷	×	×	×	÷	÷	×	÷	×	×	×	×	×	÷	÷	÷
HY	÷	÷	×	÷	×	÷	×	×	×	×	×	÷	÷	×	÷	÷	×	÷	÷	÷	×	×	×	×	÷	÷	÷	×	×	×	×	×	÷	÷	÷
ID	÷	÷	×	÷	×	÷	×	×	×	×	×	÷	÷	×	÷	×	×	÷	÷	÷	×	×	×	÷	÷	×	÷	×	×	×	×	×	÷	÷	÷
IN	÷	÷	×	÷	×	÷	×	×	×	×	×	÷	÷	×	÷	×	×	÷	÷	÷	×	×	×	÷	÷	÷	÷	×	×	×	×	×	÷	÷	÷
IS	×	÷	×	÷	×	÷	×	×	×	×	×	÷	÷	×	÷	÷	×	÷	÷	÷	×	×	×	÷	÷	÷	÷	×	×	×	×	×	×	×	×
JL	÷	÷	×	÷	×	÷	×	×	×	×	×	×	×	×	÷	×	×	×	÷	×	×	×	×	÷	÷	×	÷	×	×	×	×	×	÷	÷	÷
JT	÷	÷	×	÷	×	÷	×	×	×	×	×	÷	÷	×	÷	×	×	÷	×	÷	×	×	×	÷	÷	×	÷	×	×	×	×	×	÷	÷	÷
JV	÷	÷	×	÷	×	÷	×	×	×	×	×	÷	÷	×	÷	×	×	÷	×	×	×	×	×	÷	÷	×	÷	×	×	×	×	×	÷	÷	÷
LF	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷
NL	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷
NS	÷	÷	×	÷	×	÷	×	×	×	×	×	÷	÷	×	÷	÷	×	÷	÷	÷	×	×	×	÷	÷	÷	÷	×	×	×	×	×	÷	÷	÷
NU	×	÷	×	÷	×	÷	×	×	×	×	×	÷	÷	×	÷	×	×	÷	÷	÷	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×
OP	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×
PO	×	÷	×	÷	×	÷	×	×	×	×	×	÷	÷	×	÷	÷	×	÷	÷	÷	×	×	×	×	÷	÷	÷	×	×	×	×	×	×	×	×
PR	×	÷	×	÷	×	÷	×	×	×	×	×	×	×	×	×	÷	×	×	×	×	×	×	×	×	÷	÷	÷	×	×	×	×	×	×	×	×
QU	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×
SP	÷	÷	÷	÷	×	÷	×	÷	×	×	÷	÷	÷	÷	÷	÷	×	÷	÷	÷	×	×	÷	÷	÷	÷	÷	÷	×	×	×	×	÷	÷	÷
SY	÷	÷	×	÷	×	÷	×	×	×	×	×	÷	÷	×	÷	÷	×	÷	÷	÷	×	×	×	÷	÷	÷	÷	×	×	×	×	×	÷	÷	÷
WJ	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×	×
ZW	÷	÷	÷	÷	×	÷	÷	÷	×	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	÷	×	×	÷	÷	÷	÷	÷	÷	×	÷	÷	×	÷	÷	÷
AI_AL	×	÷	×	÷	×	÷	×	×	×	×	×	÷	÷	×	÷	×	×	÷	÷	÷	×	×	×	×	×	÷	÷	×	×	×	×	×	×	×	×
XX_AL	×	÷	×	÷	×	÷	×	×	×	×	×	÷	÷	×	÷	×	×	÷	÷	÷	×	×	×	×	×	÷	÷	×	×	×	×	×	×	×	×
SA_AL	×	÷	×	÷	×	÷	×	×	×	×	×	÷	÷	×	÷	×	×	÷	÷	÷	×	×	×	×	×	÷	÷	×	×	×	×	×	×	×	×
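
As an illustration of how the chart can be consulted programmatically, the following Python sketch parses a small excerpt (the top-left five-by-five corner, retyped with spaces instead of tabs) and looks up one pair. The names parse_chart and may_break are hypothetical, and reading rows as the class before the position and columns as the class after it is an assumption, although it agrees with the CR row, whose only × falls in the LF column.

```python
# Top-left corner of the chart above, retyped with spaces instead of tabs.
CHART_EXCERPT = """\
   AL B2 BA BB BK
AL ×  ÷  ×  ÷  ×
B2 ÷  ×  ×  ÷  ×
BA ÷  ÷  ×  ÷  ×
BB ×  ×  ×  ×  ×
BK ÷  ÷  ÷  ÷  ÷
"""


def parse_chart(text):
    """Return a dict mapping (before, after) class pairs to '×' or '÷'."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    columns = rows[0]  # header row: classes that follow the position
    chart = {}
    for row in rows[1:]:
        before, cells = row[0], row[1:]
        if len(cells) != len(columns):
            raise ValueError(f"row {before} has {len(cells)} cells, "
                             f"expected {len(columns)}")
        for after, cell in zip(columns, cells):
            chart[(before, after)] = cell
    return chart


def may_break(chart, before, after):
    """True if the chart allows a break between the two classes."""
    return chart[(before, after)] == "÷"


chart = parse_chart(CHART_EXCERPT)
print(may_break(chart, "AL", "B2"))  # True: the AL row shows ÷ under B2
print(may_break(chart, "BB", "AL"))  # False: the BB row shows × under AL
```

The same parser should work on the full chart if its tabs are kept, since str.split() treats tabs and spaces alike.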

Rules

Due to the way they have been mechanically processed for generation, the following rules do not match the UAX rules precisely. In particular:

The rules are cast into a more regex-like style.
The rules "sot ÷", "÷ eot", and "÷ Any" are added mechanically, and have artificial numbers.
The rules are given decimal numbers, so rules such as 11a are given a number using tenths, such as 11.1.
Where a rule has multiple parts (lines), each one is numbered using hundredths, such as 21.01) × BA, 21.02) × HY,...
Any 'treat as' or 'ignore' rules are handled as discussed in Unicode Standard Annex #29, and are thus reflected in a transformation of the rules that is not visible here.
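
The numbering conventions above can be sketched in Python as follows. The function name chart_rule_number is hypothetical, and mapping letters beyond 'a' to further tenths (b to .2, and so on) is an assumption extrapolated from the single 11a example. How a lettered rule with several parts would be numbered is not stated here, so the sketch rejects that case rather than guess.

```python
import re


def chart_rule_number(label, part=None):
    """Map a UAX-style rule label such as '11a' to the decimal form used here.

    label: rule number with an optional trailing letter, e.g. '4' or '11a'.
    part:  1-based index of the line within a multi-part rule, or None.
    """
    match = re.fullmatch(r"(\d+)([a-z]?)", label)
    if match is None:
        raise ValueError(f"unrecognized rule label: {label!r}")
    base, letter = match.groups()
    if letter and part is not None:
        # Not covered by the examples in the text; refuse rather than guess.
        raise ValueError("numbering of lettered multi-part rules is not specified")
    if part is not None:
        if not 1 <= part <= 99:
            raise ValueError("part must be between 1 and 99")
        return f"{base}.{part:02d}"  # hundredths: 21.01, 21.02, ...
    if letter:
        tenths = ord(letter) - ord("a") + 1  # assumption: a -> 1, b -> 2, ...
        if tenths > 9:
            raise ValueError("letter does not fit in a single tenths digit")
        return f"{base}.{tenths}"
    return f"{base}.0"


print(chart_rule_number("11a"))      # 11.1
print(chart_rule_number("21", 2))    # 21.02
print(chart_rule_number("4"))        # 4.0
```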

For the original rules, see the UAX.

0.2) sot ÷
0.3) ÷ eot
4.0) BK ÷
5.01) CR × LF
5.02) CR ÷
5.03) LF ÷
5.04) NL ÷
6.0) × ( BK | CR | LF | NL )
7.01) × SP
7.02) × ZW
8.0) ZW ÷
9.0) [^SP BK CR LF NL ZW] × CM
11.01) × WJ
11.02) WJ ×
12.01) [^SP] × GL
12.02) GL ×
13.01) [^NU] × CL
13.02) × EX
13.03) [^NU] × IS
13.04) [^NU] × SY
14.0) OP SP* ×
15.0) QU SP* × OP
16.0) CL SP* × NS
17.0) B2 SP* × B2
18.0) SP ÷
19.01) × QU
19.02) QU ×
20.01) ÷ CB
20.02) CB ÷
21.01) × BA
21.02) × HY
21.03) × NS
21.04) BB ×
22.01) AL × IN
22.02) ID × IN
22.03) IN × IN
22.04) NU × IN
23.01) ID × PO
23.02) AL × NU
23.03) NU × AL
24.01) PR × ID
24.02) PR × AL
24.03) PO × AL
25.01) (PR | PO) × ( OP | HY )? NU
25.02) ( OP | HY ) × NU
25.03) NU × (NU | SY | IS)
25.04) NU (NU | SY | IS)* × (NU | SY | IS | CL)
25.05) NU (NU | SY | IS)* CL? × (PO | PR)
26.01) JL × (JL | JV | H2 | H3)
26.02) (JV | H2) × (JV | JT)
26.03) (JT | H3) × JT
27.01) (JL | JV | JT | H2 | H3) × IN
27.02) (JL | JV | JT | H2 | H3) × PO
27.03) PR × (JL | JV | JT | H2 | H3)
28.0) AL × AL
29.0) IS × AL
30.01) (AL | NU) × OP
30.02) CL × (AL | NU)
999.0) ÷ Any
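
To show how such an ordered list can be evaluated, here is a minimal Python sketch covering only rules 4.0 through 8.0 plus the catch-all 999.0. It assumes, as in UAX #14, that the rules are tried in numeric order and that the first matching rule decides the position. The tuple encoding and the name decide are hypothetical. Because rules 9.0 through 30.02 are omitted, the result is only meaningful for pairs that one of the included rules decides; a pair such as AL followed by AL would wrongly fall through to 999.0.

```python
HARD_BREAKS = {"BK", "CR", "LF", "NL"}

# (rule number, classes before, classes after, status); None means "any class".
RULES = [
    ("4.0",   {"BK"}, None,        "÷"),
    ("5.01",  {"CR"}, {"LF"},      "×"),
    ("5.02",  {"CR"}, None,        "÷"),
    ("5.03",  {"LF"}, None,        "÷"),
    ("5.04",  {"NL"}, None,        "÷"),
    ("6.0",   None,   HARD_BREAKS, "×"),
    ("7.01",  None,   {"SP"},      "×"),
    ("7.02",  None,   {"ZW"},      "×"),
    ("8.0",   {"ZW"}, None,        "÷"),
    # Rules 9.0 through 30.02 are deliberately omitted from this sketch.
    ("999.0", None,   None,        "÷"),
]


def decide(before, after):
    """Return (rule number, status) for the first rule matching the pair."""
    for number, left, right, status in RULES:
        left_ok = left is None or before in left
        right_ok = right is None or after in right
        if left_ok and right_ok:
            return number, status
    raise AssertionError("unreachable: rule 999.0 matches every pair")


print(decide("CR", "LF"))  # ('5.01', '×')
print(decide("CR", "AL"))  # ('5.02', '÷')
print(decide("ZW", "SP"))  # ('7.01', '×')
print(decide("ZW", "AL"))  # ('8.0', '÷')
```

These four results agree with the corresponding cells of the chart (CR/LF, CR/AL, ZW/SP, ZW/AL). Rules with SP* context, such as 14.0 through 17.0, look further back than one class and would need more than this pairwise form.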

Sample Strings

The following samples illustrate the application of the rules. The blue lines indicate possible break points. If your browser supports titles, positioning the mouse over a character shows its name, while positioning it between characters shows the number of the rule responsible for the break status.

c a n ' t
c a n ’ t
a b □ b y
- 3
e . g .
一 . 一 .
a b
a □ b
a ◌̈ b
1 ◌̈ b ( a ) - ( b )