RE: Japanese line breaks (was: interleaved ordering)

From: Han-Yi Shaw (hanyis@microsoft.com)
Date: Mon May 10 2004 - 22:20:00 CDT


    Microsoft Office (Win and Mac) applications ensure that the line breaking is correct for East Asian Text. For example, in Microsoft Word, under Options | Asian Typography | First and Last Characters, you will find the following options for Japanese:

     

    Cannot Start Line with: !%),.:;?]}¢°’”‰′″℃、。々〉》」』】〕゛゜ゝゞ・ヽヾ！％），．：；？］｝｡｣､･ﾞﾟ￠

    Cannot End Line with: $([\{£¥‘“〈《「『【〔＄（［｛｢￡￥

     

    There are slight variations among the lists for Traditional Chinese, Simplified Chinese, Japanese, and Korean, and Word respects those as well.
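
    As a rough illustration of how such "cannot start" / "cannot end" lists can drive line breaking, here is a minimal Python sketch. It is not Word's implementation: the names NO_START, NO_END, can_break, and wrap are hypothetical, the sets are only a subset of the characters listed above, and the greedy fixed-width strategy is an assumption made for the example.

```python
# Subset of the "Cannot Start Line with" characters quoted above.
NO_START = set("!%),.:;?]}¢°’”‰′″℃、。々〉》」』】〕゛゜ゝゞ・ヽヾ")
# Subset of the "Cannot End Line with" characters quoted above.
NO_END = set("$([\\{£¥‘“〈《「『【〔")


def can_break(text, i):
    """Return True if a line break is allowed between text[i-1] and text[i]."""
    if i <= 0 or i >= len(text):
        return False
    # The next line may not start with a NO_START character,
    # and the current line may not end with a NO_END character.
    return text[i] not in NO_START and text[i - 1] not in NO_END


def wrap(text, width):
    """Greedy wrap (illustrative only): fill up to `width` characters per
    line, then back up to the nearest position where a break is allowed."""
    lines = []
    start = 0
    while len(text) - start > width:
        end = start + width
        while end > start + 1 and not can_break(text, end):
            end -= 1
        if not can_break(text, end):
            # No legal break on this line: force one at the full width.
            end = start + width
        lines.append(text[start:end])
        start = end
    lines.append(text[start:])
    return lines


if __name__ == "__main__":
    for line in wrap("彼は「はい」と言った。", 5):
        print(line)
```

    The sketch counts every character as one column and never stretches or compresses a line; a real layout engine would also measure glyph widths and could hang or squeeze punctuation instead of backing up.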

     

    Han-yi

     

    -----Original Message-----
    From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org] On Behalf Of Tom Emerson
    Sent: Monday, May 10, 2004 7:39 PM
    To: Philippe Verdy
    Cc: Unicode Mailing List
    Subject: Re: Japanese line breaks (was: interleaved ordering)

     

    Philippe Verdy writes:

    > From: "Stefan Persson" <alsjebegrijptwatikbedoel@yahoo.se>

    > > In Japanese you can put a line break between *any* characters, except

    > > before punctuation & end quote or after start quote.

    >

    > Are you SURE of that? I had many negative comments about undesirable line breaks

    > in the middle of what is perceived as a single word, and where a single Kana

    > moved to the next line was seen as bad, notably when it is a particle.

    > I had similar comments from Korean users with Hangul.

     

    We've found an amazing amount of variation in where breaks occur on

    text live on the web... breaks show up everywhere and anywhere, to the

    point where our Japanese morphological analyzer has to ignore

    whitespace (horizontal and vertical) in many situations.(*)

     

    There is a JIS standard for line breaking, though I don't have a copy

    of it here at home right now. I can look up the "official" rules

    tomorrow if people are interested.

     

        -tree

     

    (*) The worst case we've seen was the use of katakana and hiragana in

        "ASCII" art, Picasso's Guernica to be exact. Gave our analyzer a

        real fit for a while.

     

    --

    Tom Emerson Basis Technology Corp.

    Software Architect http://www.basistech.com

      "Beware the lollipop of mediocrity: lick it once and you suck forever"

     


