L2/07-383

Date: Mon, 15 Oct 2007 13:18:58 -0700
From: Mark Davis
Subject: Small change to UAX#29


======

I was looking over #29, and realized that there is an issue in SB11. With the latest change, we pulled out CR and LF explicitly, so we have:

 

SB11. ( STerm | ATerm ) Close* Sp* ( Sep | CR | LF )? ÷  

At first, I thought this was a problem, because we do have CRLF, and so should include that, giving the modified:

 
SB11. ( STerm | ATerm ) Close* Sp* ( Sep | CR | LF | CR LF )? ÷  

Then I realized that we don't need the final clause at all. We already have:

 
SB3. CR × LF
SB4. Sep | CR | LF ÷  

So we will handle CRLF correctly anyway.

Therefore, the original SB11 (at the top here) actually works. However, it is conceptually muddied by the last clause: we could replace SB11 by the simpler:

 
SB11. ( STerm | ATerm ) Close* Sp* ÷  

A small thing, but clearer, I think, so I'd recommend making the change.
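
The equivalence argued above can be checked mechanically. What follows is only a rough sketch in Python, not an implementation of UAX #29: it models just SB3, SB4, SB9, SB10, SB11 and the default "no break" rule, omits the intermediate rules entirely, and takes the wording of SB9 and SB10 as an assumption, since they are not quoted in this message. The one-letter class codes and the names breaks and variants_agree are hypothetical.

```python
import re
from itertools import product

# Hypothetical one-letter codes for the Sentence_Break classes involved.
CODES = {"STerm": "S", "ATerm": "A", "Close": "C", "Sp": "P",
         "Sep": "E", "CR": "R", "LF": "L", "Other": "O"}

ORIGINAL_TAIL = "[ERL]?"   # SB11 with the trailing ( Sep | CR | LF )?
SIMPLIFIED_TAIL = ""       # SB11 without it


def breaks(classes, sb11_tail):
    """Return the interior offsets at which this reduced rule set breaks."""
    s = "".join(CODES[c] for c in classes)
    found = []
    for i in range(1, len(s)):
        before, nxt = s[:i], s[i]
        if before[-1] == "R" and nxt == "L":        # SB3: CR x LF
            continue
        if before[-1] in "ERL":                     # SB4: Sep | CR | LF, then break
            found.append(i)
            continue
        # SB9 (assumed wording): no break before Close, Sp, Sep, CR, LF
        if re.search(r"[SA]C*$", before) and nxt in "CPERL":
            continue
        # SB10 (assumed wording): no break before Sp, Sep, CR, LF
        if re.search(r"[SA]C*P*$", before) and nxt in "PERL":
            continue
        # SB11, in whichever variant was requested
        if re.search(r"[SA]C*P*" + sb11_tail + r"$", before):
            found.append(i)
        # Otherwise the default rule applies: no break.
    return found


def variants_agree(max_len):
    """Compare both SB11 variants on every class sequence up to max_len."""
    for n in range(1, max_len + 1):
        for seq in product(CODES, repeat=n):
            if breaks(seq, ORIGINAL_TAIL) != breaks(seq, SIMPLIFIED_TAIL):
                return False
    return True
```

For the sequence STerm Sp CR LF Other, both variants give a single break after the LF, which comes from SB4 rather than SB11. Because SB4 is tested first, the optional final clause of SB11 never gets a chance to decide anything in this model.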

We should also add an informative note that some implementations may have mechanisms that allow them to forbid breaking within a sequence of characters.
In such a case, an implementation could boil rules SB9, SB10, and SB11 down to a single rule: don't break within:
 
( STerm | ATerm ) Close* Sp* ( Sep | CR | LF )? ÷
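
As a rough illustration of such a mechanism, here is a minimal Python sketch that marks every offset strictly inside a match of that sequence as protected from breaking. The one-letter class codes and the name protected_offsets are hypothetical, and the sketch says nothing about how a real implementation would combine this with the other rules.

```python
import re

# Hypothetical one-letter codes for the Sentence_Break classes involved.
CODES = {"STerm": "S", "ATerm": "A", "Close": "C", "Sp": "P",
         "Sep": "E", "CR": "R", "LF": "L", "Other": "O"}

# ( STerm | ATerm ) Close* Sp* ( Sep | CR | LF )?
NO_BREAK_WITHIN = re.compile(r"[SA]C*P*[ERL]?")


def protected_offsets(classes):
    """Return the offsets that fall strictly inside a matched sequence."""
    s = "".join(CODES[c] for c in classes)
    protected = set()
    for m in NO_BREAK_WITHIN.finditer(s):
        # Offsets m.start()+1 .. m.end()-1 lie between two matched characters.
        protected.update(range(m.start() + 1, m.end()))
    return protected
```

The offset at the end of each match is where the break would then fall, subject to the earlier rules: for STerm Sp CR LF the match ends after the CR, and SB3 still keeps the CR and LF together.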
 


--
Mark