RE: Questionable lines on LineBreakTest.txt

From: Laurentiu Iancu (liancu@microsoft.com)
Date: Tue Jun 08 2010 - 12:39:26 CDT


    The incorrect presence of break opportunities at SOT (start of text) in LineBreakTest.txt is a known issue, documented in the erratum dated 2008-April-28 at http://www.unicode.org/errata/. The correct result at SOT is a no-break, in accordance with rule LB2.
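
    As an illustration only (not part of the erratum), here is a minimal Python sketch of how a test harness might read one data line and force the start-of-text position to no-break, as LB2 requires. It assumes the usual layout of the test file: break markers (÷ for a break opportunity, × for no break) alternating with hexadecimal code points, followed by a # comment. The function names and the sample line are hypothetical.

```python
BREAK = "\u00f7"     # ÷ marks a break opportunity
NO_BREAK = "\u00d7"  # × marks a prohibited break


def parse_test_line(line):
    """Split one data line into (code_points, break_flags).

    break_flags[i] is True when a break is allowed before code_points[i];
    the last flag describes the end-of-text position.
    Returns None for blank or comment-only lines.
    """
    data = line.split("#", 1)[0].strip()
    if not data:
        return None
    tokens = data.split()
    # Markers sit at even positions, code points at odd positions.
    flags = [tok == BREAK for tok in tokens[0::2]]
    code_points = [int(tok, 16) for tok in tokens[1::2]]
    return code_points, flags


def apply_lb2(flags):
    """Return a copy of flags with the start-of-text position set to no-break."""
    return [False] + flags[1:]


# Hypothetical line in the pre-erratum style, with a break marked at SOT.
sample = "\u00f7 0023 \u00d7 0023 \u00f7\t# hypothetical sample line"
code_points, flags = parse_test_line(sample)
print(code_points, flags, apply_lb2(flags))
```

    A harness comparing its own output against an affected test file could apply apply_lb2 to the expected flags before the comparison, so that the known discrepancy at SOT is not reported as a failure.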

    Regards,
    L.

    From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org] On Behalf Of Konstantin Ritt
    Sent: Tuesday, June 08, 2010 8:10 AM
    To: Mark Davis ☕
    Cc: Asmus Freytag; Masaaki Shibata; unicode@unicode.org
    Subject: Re: Questionable lines on LineBreakTest.txt

    2010/6/8 Mark Davis ☕ <mark@macchiato.com>

    > If the test files are "known to be in error", then those "known" cases need to be actually communicated back to the UTC; sitting on them doesn't do anyone any good.
    >
    > I have not had a chance to investigate, but this particular case may be covered by the description in http://unicode.org/Public/6.0.0/ucd/auxiliary/LineBreakTest-6.0.0d4.html:
    >
    > The Line Break tests use tailoring of numbers described in Example 7 of Section 8.2 Examples of Customization.


    indeed.
    LB24 says: The default line breaking algorithm approximates this with the following (LB25) rule. Note that some cases have already been handled, such as ‘9,’, ‘[9’. For a tailoring that supports the regular expression directly, as well as a key to the notation see Section 8.2, Examples of Customization.

    and there is a note in the LineBreakTest*.txt file: Note: The Line Break tests use tailoring of numbers described in Example 7 of Section 8.2 Examples of Customization. They also differ from the results produced by a pair table implementation in sequences like: ZW SP CL.


    but I have yet another question: why does every test in LineBreakTest.txt assume a break opportunity at the start of text, while LB2 says "Never break at the start of text"? If these tests are meant for "out of context" usage, where can I read a note saying so?

    Konstantin


