From: Peter Kirk (email@example.com)
Date: Tue Sep 14 2004 - 03:52:53 CDT
On 13/09/2004 23:39, Andy Heninger wrote:
> In looking at how the proposed changes to the TR 29 word boundary
> rules would be implemented in the ICU library, I came across an odd
> situation in the rules.
> While thinking about what to do about this, it struck me that it would
> probably be more consistent all the way around to remove the Grapheme
> Extend characters from the ALetter set. The only effect of this
> change would be on the breaking behavior of combining characters with
> no base character.
> Any thoughts?
Would the effect of this be to allow (in some cases) a word break
immediately after a combining character with no base letter?
I have in mind certain situations found in Hebrew (Ketiv/Qere blended
forms) in which anomalous (but quite frequently found) word forms begin
with a spacing combining character. The currently specified way of
supporting this situation is to use SPACE or NBSP followed by the
combining character (as these combining characters do not have
non-spacing clones). It would be highly undesirable to make a change
here which would allow word breaks, line breaks etc after the combining
character but before the rest of the word.
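
To make the question concrete, here is a rough sketch in Python. It is a toy model, not the TR 29 rules and not ICU's implementation: the two-way classification, the names toy_class and toy_breaks, and the sample string (NBSP, HIRIQ, then two Hebrew letters, chosen only for illustration and not an actual Ketiv/Qere form) are all assumptions of mine. It only shows how moving combining marks out of the letter class could shift the break to the position after the mark.

```python
import unicodedata

def toy_class(ch, marks_are_letters):
    """Classify one character for this toy model (not the real TR 29 tables)."""
    if unicodedata.category(ch) in ("Mn", "Me"):
        # Combining marks: either counted as letters or treated as extenders.
        return "ALetter" if marks_are_letters else "Extend"
    return "ALetter" if ch.isalpha() else "Other"

def toy_breaks(text, marks_are_letters):
    """Return the interior indices at which this toy model allows a break."""
    breaks = []
    prev = None  # class of the last character that was not skipped as an extender
    for i, ch in enumerate(text):
        cls = toy_class(ch, marks_are_letters)
        if cls == "Extend" and prev is not None:
            # An extender attaches to whatever precedes it: no break, class unchanged.
            continue
        if prev is not None and not (prev == "ALetter" and cls == "ALetter"):
            breaks.append(i)
        prev = cls
    return breaks

# Hypothetical sample: NBSP + HIRIQ (combining) + ALEF + BET
sample = "\u00A0\u05B4\u05D0\u05D1"
print(toy_breaks(sample, marks_are_letters=True))
print(toy_breaks(sample, marks_are_letters=False))
```

In this model, counting the mark as a letter keeps it together with the letters that follow, while treating it as an extender attaches it to the NBSP and puts the break between the mark and the rest of the word, which is the outcome I would want to avoid.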
Public Review Issue #41 proposes that a new INVISIBLE LETTER be used
instead of SPACE or NBSP to carry the combining character in such
situations. Presumably, if this is accepted, the problem will go away
once this new letter is in use, as it has letter-like properties. But the
existing usage with SPACE will continue to be found in documents already
--
Peter Kirk
firstname.lastname@example.org (personal)
email@example.com (work)
http://www.qaya.org/