Re: Making sure I read UAX29 correctly

From: Mark Davis ⌛
Date: Thu Aug 27 2009 - 19:44:11 CDT

    The key is the statement "Ignore Format and Extend characters, except when
    they appear at the beginning of a region of text. (See Section 6.2, Replacing
    Ignore Rules.)"

    If you do follow the link to 6.2, it explicitly says:

        Replace the “Ignore” rule by the following, to disallow breaks within
        sequences *(except after CRLF and related characters)*: [My bolding]

    So the Ignore rule is not meant to apply to the sequence CR Extend.

    The phrase "beginning of a region of text" should be clarified, because
    it is not intended to span a ÷ boundary introduced by previous rules (in
    this case, S1-S3). I'll run this by the editorial committee to see whether
    the text should be changed.
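
The reading described above can be sketched as a deliberately partial sentence segmenter. This is a minimal, hypothetical illustration, not the UAX #29 reference algorithm or any library's API: it takes a list of Sentence_Break property values as strings, implements only SB1-SB5 and SB9-SB12 (SB6-SB8a are omitted), and the rule forms quoted in its comments are paraphrased, so treat them as assumptions. All function names are invented for this sketch. The key piece is `attaches`: an Extend or Format is folded into the preceding character only when that character exists and is not Sep, CR, or LF, so SB5 never reaches back across the ÷ that SB4 places after a separator.

```python
# Hypothetical sketch: Sentence_Break values are passed in as plain strings.
SEP_LIKE = {"Sep", "CR", "LF"}
IGNORABLE = {"Extend", "Format"}
TERMS = {"STerm", "ATerm"}
SB9_RIGHT = {"Close", "Sp", "Sep", "CR", "LF"}
SB10_RIGHT = {"Sp", "Sep", "CR", "LF"}


def attaches(props, j):
    """True if props[j] is Extend/Format that SB5 folds into the preceding character.

    Under the reading assumed here, this never happens at the start of text
    or directly after Sep, CR, or LF.
    """
    return props[j] in IGNORABLE and j > 0 and props[j - 1] not in SEP_LIKE


def term_close_sp(ctx, k, allow_sp):
    """True if ctx[:k] ends with (STerm|ATerm) Close*, followed by Sp* if allow_sp."""
    if allow_sp:
        while k > 0 and ctx[k - 1] == "Sp":
            k -= 1
    while k > 0 and ctx[k - 1] == "Close":
        k -= 1
    return k > 0 and ctx[k - 1] in TERMS


def is_break(props, i):
    """Decide whether to break between props[i - 1] and props[i], for 0 < i < len(props)."""
    left, right = props[i - 1], props[i]
    if left == "CR" and right == "LF":  # SB3: CR × LF
        return False
    if left in SEP_LIKE:  # SB4: (Sep | CR | LF) ÷
        return True
    if right in IGNORABLE:  # SB5: X (Extend | Format)* -> X (left is not Sep/CR/LF here)
        return False
    # Left context with the Extend/Format characters absorbed by SB5 removed.
    ctx = [p for j, p in enumerate(props[:i]) if not attaches(props, j)]
    k = len(ctx)
    # SB9: (STerm | ATerm) Close* × (Close | Sp | Sep | CR | LF)
    if right in SB9_RIGHT and term_close_sp(ctx, k, allow_sp=False):
        return False
    # SB10: (STerm | ATerm) Close* Sp* × (Sp | Sep | CR | LF)
    if right in SB10_RIGHT and term_close_sp(ctx, k, allow_sp=True):
        return False
    # SB11: (STerm | ATerm) Close* Sp* (Sep | CR | LF)? ÷
    # The optional separator is kept for fidelity; SB4 has already handled it.
    if k > 0 and ctx[k - 1] in SEP_LIKE:
        k -= 1
    if term_close_sp(ctx, k, allow_sp=True):
        return True
    return False  # SB12: Any × Any


def sentence_boundaries(props):
    """Return boundary offsets, including the start (SB1) and end (SB2) of text."""
    bounds = [0]
    for i in range(1, len(props)):
        if is_break(props, i):
            bounds.append(i)
    if props:
        bounds.append(len(props))
    return bounds
```

On Eric's example sequence quoted below (lower lower aterm close lf ext ? upper lower lower), `sentence_boundaries` returns `[0, 5, 9]`: the LF ends the first sentence, the Extend starts the second, and there is no break at the examined position. Likewise `["CR", "Extend", "Lower"]` gives `[0, 1, 3]`, a break between CR and Extend.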


    (I think Eric M might have been using an earlier version, because his
    numbering was off. I'm looking at

    On Thu, Aug 27, 2009 at 17:09, Eric Muller wrote:

    > Mark Davis ⌛ wrote:
    >> Because SB5 doesn't match at the start, you keep on going through the
    >> rules, and finally end up not breaking.
    > You are saying that SB11 does not apply to <lower lower aterm close lf ext
    > ? upper lower lower>, to use your example (with ? being the position
    > examined).
    > I just don't see how that can be inferred from the text. The rules in 6.2
    > that describe the insertion of "(Extend|Format)*" do not say "insert only in
    > some cases and not in others".
    > On the other hand, looking at the rules in the CLDR (for root), I see that
    > the variables Sep, CR and LF are not extended with $FE* like all the others.
    > Hence my sudden doubt.
    > I agree that the wording is not as clear as it should be.
    > Especially if it doesn't say what you want it to say ;-)
    > Thanks,
    > Eric.
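
The CLDR detail Eric points out can be checked mechanically. The sketch below, again hypothetical, encodes each Sentence_Break value as a single letter (an encoding invented for this illustration) and compiles the left-hand side of SB11, assumed here to be (STerm | ATerm) Close* Sp* (Sep | CR | LF)?, in two ways: CLDR-style, with (Extend | Format)* appended to every variable except Sep, CR, and LF, and with it appended to every variable, which is the literal reading of 6.2 that Eric describes.

```python
import re

# Single-letter codes for Sentence_Break values (an encoding made up for this sketch).
CODE = {"Lower": "l", "Upper": "u", "ATerm": "a", "STerm": "s", "Close": "c",
        "Sp": "_", "Sep": "P", "CR": "r", "LF": "n", "Extend": "e", "Format": "f"}
FE = "[ef]*"  # (Extend | Format)*


def var(name, extend_separators):
    """Regex for one rule variable after appending (Extend | Format)*."""
    code = CODE[name]
    if name in ("Sep", "CR", "LF") and not extend_separators:
        return code  # CLDR root style: Sep, CR, LF are left unextended
    return code + FE


def sb11_left(extend_separators):
    """Compile the left side of SB11: (STerm | ATerm) Close* Sp* (Sep | CR | LF)?"""
    x = extend_separators
    term = f"(?:{var('STerm', x)}|{var('ATerm', x)})"
    close = f"(?:{var('Close', x)})*"
    sp = f"(?:{var('Sp', x)})*"
    sep = f"(?:{var('Sep', x)}|{var('CR', x)}|{var('LF', x)})?"
    # \Z anchors the match at the end of the string, i.e. at the examined position.
    return re.compile(term + close + sp + sep + r"\Z")


def encode(props):
    """Turn a list of Sentence_Break values into the letter encoding above."""
    return "".join(CODE[p] for p in props)
```

For the text before the examined position in Eric's example, `encode(["Lower", "Lower", "ATerm", "Close", "LF", "Extend"])`, the CLDR-style pattern (`sb11_left(False)`) finds no match, so SB11 does not fire there, consistent with Mark's reply. With every variable extended (`sb11_left(True)`), the pattern matches `acne` and SB11 would break at that position.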
