Date: Mon, 15 Oct 2007 13:18:58 -0700
From: Mark Davis
Subject: Small change to UAX#29


I was looking over #29, and realized that there is an issue in SB11. With the latest change, we pulled out CR and LF explicitly, so we have:


SB11. ( STerm | ATerm ) Close* Sp* ( Sep | CR | LF )?  

At first, I thought this was a problem, because we do have CRLF, and so should include that, giving the modified rule:

SB11. ( STerm | ATerm ) Close* Sp* ( Sep | CR | LF | CR LF )?  

Then I realized that we don't need the final clause at all. We already have:

SB4. Sep | CR | LF  

So we will handle CRLF correctly anyway.

Therefore, the original SB11 (at the top here) actually works. However, it is conceptually muddied by the last clause: we could replace SB11 by the simpler:

SB11. ( STerm | ATerm ) Close* Sp*  

A small thing, but clearer, I think, so I'd recommend making the change.
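
To make the redundancy argument concrete, here is a minimal Python sketch, not a conformant implementation. It assumes each character has already been mapped to a one-letter class code (S = STerm, A = ATerm, C = Close, P = Sp, N = Sep, R = CR, L = LF, X = anything else), and it models only SB3, SB4, SB9, SB10, SB11, and SB12 as I understand them; the remaining rules, including the start- and end-of-text rules, are left out. The letter codes and the function names are made up for illustration.

```python
import re
from itertools import product

# Hypothetical one-letter class codes (an assumption of this sketch):
# S=STerm, A=ATerm, C=Close, P=Sp, N=Sep, R=CR, L=LF, X=anything else
CLASSES = "SACPNRLX"


def breaks(classes: str, simplified: bool) -> list[int]:
    """Return offsets i where a break falls between classes[i-1] and classes[i].

    Rules are tried in order and the first one that matches decides.
    """
    out = []
    for i in range(1, len(classes)):
        left, right = classes[:i], classes[i]
        # SB3: never break between CR and LF
        if left[-1] == "R" and right == "L":
            continue
        # SB4: always break after Sep, CR, or LF
        if left[-1] in "NRL":
            out.append(i)
            continue
        # SB9: no break before Close, Sp, Sep, CR, LF after a terminator
        if re.search(r"[SA]C*$", left) and right in "CPNRL":
            continue
        # SB10: no break before Sp, Sep, CR, LF after terminator + spaces
        if re.search(r"[SA]C*P*$", left) and right in "PNRL":
            continue
        # SB11: the simplified form, or the form with the trailing clause
        sb11 = r"[SA]C*P*$" if simplified else r"[SA]C*P*[NRL]?$"
        if re.search(sb11, left):
            out.append(i)
        # SB12: otherwise, do not break
    return out


def variants_agree(max_len: int = 4) -> bool:
    """Compare both SB11 forms on every class string up to max_len."""
    for n in range(1, max_len + 1):
        for combo in product(CLASSES, repeat=n):
            s = "".join(combo)
            if breaks(s, True) != breaks(s, False):
                return False
    return True
```

In this model the optional ( Sep | CR | LF ) clause can never decide anything: whenever the left context ends in Sep, CR, or LF, either SB3 or SB4 has already fired before SB11 is reached. Calling variants_agree() checks that exhaustively for short inputs, within the limits of the subset of rules modeled here.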

We should also add an informative note that some implementations may have mechanisms that allow them to forbid breaking within a sequence of characters.
In such a case, an implementation could boil rules 9, 10, and 11 down to a single rule: don't break within:
( STerm | ATerm ) Close* Sp* ( Sep | CR | LF )?
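
As a rough illustration of that single-rule form, the sketch below finds the spans inside which an implementation would refuse to break. It is only a sketch under stated assumptions: characters are taken to be pre-mapped to hypothetical one-letter class codes (S = STerm, A = ATerm, C = Close, P = Sp, N = Sep, R = CR, L = LF, X = anything else), and the function name is invented for the example.

```python
import re

# Hypothetical one-letter class codes (an assumption of this sketch):
# S=STerm, A=ATerm, C=Close, P=Sp, N=Sep, R=CR, L=LF, X=anything else
# Pattern mirrors: ( STerm | ATerm ) Close* Sp* ( Sep | CR | LF )?
NO_BREAK = re.compile(r"[SA]C*P*[NRL]?")


def no_break_spans(classes: str) -> list[tuple[int, int]]:
    """Return (start, end) offsets of each run matched by the pattern.

    No break would be allowed strictly inside a returned span.
    A CR LF pair is not covered as a unit here; keeping it together
    is left to the separate CR x LF rule.
    """
    return [m.span() for m in NO_BREAK.finditer(classes)]
```

For example, no_break_spans("XSCPNX") yields a single span covering the terminator, the closing punctuation, the space, and the separator.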