Problem in Line breaking

From: Satoshi Nakagawa (snakagawa@infoteria.co.jp)
Date: Sat Feb 23 2008 - 13:48:18 CST

    Hi,

    I found a problem in the Unicode line breaking algorithm.

    In Japanese writing, [こたえは、answer] should be breakable into
    lines like:

         こたえは、
         answer

    This is because [、] (U+3001) and [。] (U+3002) are used in Japanese
    just like the comma and period in English, and in English a line can
    be broken after a comma or period.

    But the current Unicode line breaking algorithm doesn't allow this
    behavior for (U+3001) and (U+3002).

    I think it's a problem in the Unicode line breaking algorithm.
    See http://www.unicode.org/reports/tr14/ .

    > CL: Closing Punctuation (XB)
    >
    > 3001..3002 IDEOGRAPHIC COMMA..IDEOGRAPHIC FULL STOP

    (U+3001) and (U+3002) are specified as CL.

    > LB30
    > Do not break between letters, numbers, or ordinary symbols and
    > opening or closing punctuation.
    >
    > CL × (AL | NU)

    It says that a break is not allowed between CL and a following
    alphabetic or numeric character. As a result, there is no break
    opportunity anywhere in [は、answer].

    IMHO, (U+3001) and (U+3002) should not be treated as CL, because
    LB30 should not apply to them. They should be split off into a
    separate class.
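
The effect described above can be reproduced with a small Python sketch. It is a toy model, not an implementation of UAX #14: it knows only the characters used in this message, treats hiragana as ID, and applies just three pair rules (no break before closing punctuation as in LB13, LB30 as quoted above, and a simplified rule that keeps Latin letters and digits together). The class name "JCL" and all function names are hypothetical and exist only for illustration. The sketch also assumes the separate class would still forbid a break before the punctuation mark.

```python
def line_break_class(ch, stop_class="CL"):
    """Toy classifier covering only the characters used in this message."""
    if ch in "\u3001\u3002":            # IDEOGRAPHIC COMMA / IDEOGRAPHIC FULL STOP
        return stop_class
    if "\u3041" <= ch <= "\u3096":      # hiragana, treated here as ID (small kana ignored)
        return "ID"
    if ch.isascii() and ch.isalpha():
        return "AL"
    if ch.isascii() and ch.isdigit():
        return "NU"
    raise ValueError(f"character outside this sketch: {ch!r}")


def break_allowed(before, after):
    """Pair rules, heavily simplified from the rules quoted in this message."""
    # "JCL" is a hypothetical class name, not part of UAX #14.
    if after in ("CL", "JCL"):
        return False                    # no break before closing punctuation (LB13)
    if before == "CL" and after in ("AL", "NU"):
        return False                    # LB30 as quoted: CL x (AL | NU)
    if before in ("AL", "NU") and after in ("AL", "NU"):
        return False                    # keep Latin words and numbers together
    return True                         # otherwise a break is allowed


def break_opportunities(text, stop_class="CL"):
    """Return each index i where a break is allowed between text[i-1] and text[i]."""
    classes = [line_break_class(ch, stop_class) for ch in text]
    return [i for i in range(1, len(text))
            if break_allowed(classes[i - 1], classes[i])]


print(break_opportunities("こたえは、answer"))          # [1, 2, 3]
print(break_opportunities("こたえは、answer", "JCL"))   # [1, 2, 3, 5]
```

With U+3001 classified as CL, the only break opportunities fall between the hiragana, and none exists in [は、answer]. With the hypothetical separate class, index 5 (right after 、) becomes a break opportunity, which is the behavior requested here.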

    What do you think?

    --
    Satoshi Nakagawa
    

