Re: TR29 Word Break awkwardness

From: Peter Kirk
Date: Tue Sep 14 2004 - 03:52:53 CDT

    On 13/09/2004 23:39, Andy Heninger wrote:

    > In looking at how the proposed changes to the TR 29 word boundary
    > rules would be implemented in the ICU library, I came across an odd
    > situation in the rules.
    > ...
    > While thinking about what to do about this, it struck me that it would
    > probably be more consistent all the way around to remove the Grapheme
    > Extend characters from the ALetter set. The only effect of this
    > change would be on the breaking behavior of combining characters with
    > no base character.
    > Any thoughts?

    Would the effect of this be to allow (in some cases) a word break
    immediately after a combining character with no base letter?

    I have in mind certain situations found in Hebrew (Ketiv/Qere blended
    forms) in which anomalous (but quite frequently found) word forms begin
    with a spacing combining character. The currently specified way of
    supporting this situation is to use SPACE or NBSP followed by the
    combining character (as these combining characters do not have
    spacing clones). It would be highly undesirable to make a change
    here which would allow word breaks, line breaks etc after the combining
    character but before the rest of the word.
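
    To make the concern concrete, here is a minimal sketch that locates
    combining marks carried by SPACE or NBSP, which are the positions where
    a break after the mark would split the word. It is not a TR29
    implementation: the function name and the sample string are
    hypothetical, and "combining mark" is approximated as any character
    whose General Category starts with "M".

```python
import unicodedata

# SPACE and NO-BREAK SPACE, the two carriers mentioned above.
SPACE_CARRIERS = {"\u0020", "\u00A0"}


def isolated_mark_positions(text):
    """Return indices of combining marks that directly follow SPACE or NBSP.

    Approximation (an assumption of this sketch): a combining mark is any
    character whose General Category begins with "M" (Mn, Mc, Me).
    """
    positions = []
    for i in range(1, len(text)):
        is_mark = unicodedata.category(text[i]).startswith("M")
        if is_mark and text[i - 1] in SPACE_CARRIERS:
            positions.append(i)
    return positions


# Hypothetical sample: two letters, then SPACE + HEBREW POINT HIRIQ (U+05B4)
# followed by two more letters.
sample = "\u05D0\u05D1 \u05B4\u05D2\u05D3"
print(isolated_mark_positions(sample))  # [3]
```

    In the sample, a break directly after index 3 would separate the
    combining character from the rest of its word, which is the outcome
    argued against above.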

    Public Review Issue #41 proposes that a new INVISIBLE LETTER be used
    instead of SPACE or NBSP to carry the combining character in such
    situations. Presumably, if this is accepted, the problem will go away
    once this new letter is in use, as it has letter-like properties. But the
    existing usage with SPACE will continue to be found in documents that
    already exist.

    Peter Kirk

