RE: Public Review Issues update: UAX #31

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Tue Sep 25 2007 - 16:26:17 CDT


    In fact after rereading this document carefully, I see why you wanted to
    change something here for sentence breaks; the initial document did not
    contain anything saying explicitly that the CRLF sequence was followed by a
    break.

    But the proposed update does not change anything here. There is
    currently no need to exclude CR and LF from the "Sep" class, and none of
    the changes to the rules from SB4 onward, where Sep occurs, alters the
    result: the rule that forbids a break between CR and LF is SB3, and the
    earlier rules SB1 and SB2 (start and end of text) have no impact on CR, LF
    and "Sep".

    Really I can't see any rationale for this change, except adding to the
    confusion of users with different versions of this document. That's why I
    suspect that something was forgotten, but I really wonder what this can be!
    For me the rules in SB3 and SB4 are enough and do not require any further
    change in the following rules where "Sep" occurs, or any exclusion of CR and
    LF from the "Sep" class.
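
    The argument above can be checked with a toy model. The sketch below is
    only an illustration under stated assumptions: it implements SB3, SB4 and
    a "no break otherwise" default, ignores SB1, SB2 and every rule after SB4,
    and assumes that the other members of "Sep" are NEL, LS and PS. The
    function names are hypothetical. It compares the old scheme (CR and LF
    inside "Sep") with the proposed one (CR and LF as separate classes, rules
    written as "Sep | CR | LF").

```python
CR, LF = "\r", "\n"
# Assumption: the remaining members of the Sep class (NEL, LS, PS).
OTHER_SEPS = {"\u0085", "\u2028", "\u2029"}


def classify_old(ch):
    """Old scheme: CR and LF are ordinary members of Sep."""
    return "Sep" if ch in ({CR, LF} | OTHER_SEPS) else "Other"


def classify_new(ch):
    """Proposed scheme: CR and LF are split out of Sep."""
    if ch == CR:
        return "CR"
    if ch == LF:
        return "LF"
    return "Sep" if ch in OTHER_SEPS else "Other"


def breaks_old(text):
    """Break offsets using 'CR x LF' (SB3) and 'Sep, then break' (SB4)."""
    out = []
    for i in range(1, len(text)):
        prev, cur = text[i - 1], text[i]
        if prev == CR and cur == LF:      # SB3: never break inside CRLF
            continue
        if classify_old(prev) == "Sep":   # SB4: break after Sep
            out.append(i)
    return out


def breaks_new(text):
    """Same rules, with SB4 rewritten as '(Sep | CR | LF), then break'."""
    out = []
    for i in range(1, len(text)):
        prev, cur = classify_new(text[i - 1]), classify_new(text[i])
        if prev == "CR" and cur == "LF":          # SB3, unchanged
            continue
        if prev in {"Sep", "CR", "LF"}:           # SB4, rewritten
            out.append(i)
    return out


sample = "a\r\nb\rc\nd\u2029e\r\n\r\nf"
print(breaks_old(sample))  # [3, 5, 7, 9, 12, 14]
print(breaks_new(sample))  # [3, 5, 7, 9, 12, 14]
```

    Within this restricted model the two schemes give the same break
    positions, because SB3 is applied before SB4 in both. It says nothing
    about the later rules, which is exactly where I would expect a difference
    if something had been forgotten.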

    > -----Original Message-----
    > From: Philippe Verdy [mailto:verdy_p@wanadoo.fr]
    > Sent: Tuesday, 25 September 2007 23:10
    > To: 'Rick McGowan'; 'unicode@unicode.org'
    > Subject: RE: Public Review Issues update: UAX #31
    >
    > Rick McGowan wrote:
    > > There is also a draft 2 version of the proposed update of UAX #29: Text
    > > Boundaries. This update adds CR, LF, Extend, and Control as needed,
    > > clarifies use of "Any", updates MidLetter to include U+2018, and adds a
    > > new kind of grapheme cluster: extended combining character sequences.
    > > See http://www.unicode.org/reports/tr29/tr29-12.html
    >
    > This update removes CR and LF as part of the "Sep" class (in Sentence
    > boundaries, i.e. Table 4 for Sentence_Break Property values), but when I
    > look at the document, I don't see any place where "Sep" is not accepted
    > along with CR and LF, so we now see "Sep | CR | LF" in many "SB*" rules.
    >
    > What is the point of this exclusion? It does not seem to change
    > anything in the intended result (and it does not change the existing rules
    > regarding the non-breakable sequence CR followed by LF).
    >
    > Did you forget something in the document where an accepted instance of
    > "Sep" in some rule would not match either CR or LF? Or is it made for
    > clarity (I'm not sure that this change really clarifies anything)?


