Re: UAX#14 request to reconsider Line Break property for U+3000 IDEOGRAPHIC SPACE

From: Markus Scherer <>
Date: Tue, 12 Jun 2012 22:26:34 -0700

I suggest you submit this via the Unicode reporting form.
from phone/vacation
On Jun 11, 2012 11:00 AM, "Koji Ishii" <> wrote:

> Hello Unicoders,
> I believe this is the correct list for discussing UAX#14; please correct
> me if I'm wrong.
> In short, as the subject says, I would like to ask that the Line Break
> property of U+3000 IDEOGRAPHIC SPACE be reconsidered, so that it does not
> allow a break before it.
> I wasn't part of the original discussions for UAX#14, so I may be
> repeating points that were already covered. I apologize if that's the
> case, but I hope I can provide some new information here.
> How to handle U+3000 IDEOGRAPHIC SPACE in line breaking is a little
> controversial in East Asia, and not all applications handle it the same
> way today. The primary reason, I believe, is that the best method varies
> with the requirements.
> After several investigations and discussions, I think most of the people I
> talked to reached a consensus that prohibiting a break before U+3000 is
> the best general answer. Exactly which class to use is still unclear to
> me. I'd appreciate anyone's advice, but I guess that discussion comes
> after we agree to make the change, so I'm leaving it aside for now.
> Here's the background of the proposal and what I discussed with people.
> Many people here might already know this, but almost every traditional
> East Asian word processor treated U+3000 as ID, and I'm guessing that is
> why UAX#14 defines it so. Many, including me, agree that it gives the best
> editing experience for East Asian scripts.
> East Asian versions of MS Word took a different approach, though,
> primarily because of Word's re-flow architecture. It might not be well
> known, and it is history now, but up until Word 95, Word changed line
> breaks slightly when the printer was changed (for good reasons at the
> time), so its documents needed to look good even if line breaks changed
> after the author had sent them to someone else.
> A problem then arose, because people did not want U+3000 to appear at the
> beginning of a line as a result of such re-flow. ID gives the best editing
> experience, but it does not fit re-flowable documents well, and keeping
> U+3000 from appearing at the beginning of a line matters more than a
> slightly better editing experience. The same issue arises today in other
> re-flowable documents such as HTML or EPUB.
> Ambrose told me that the same issue exists in Chinese, with what are known
> as honorific spaces[1]. We tried to find examples of line breaking
> behavior for honorific spaces without luck, and then Kenny pointed out
> that authors adjust their text so that such a space does not appear at the
> beginning of a line, and therefore we would not be able to find any[2].
> I also had discussions with W3C I18N WG JLTF (who authored JLREQ),
> professional printers, and people working on EPUB in Japan about the ideal
> behavior of U+3000 around line breaks. As I wrote above, the best method
> depends on context, so the discussion was a little long, but we tried to
> find the best algorithm that works for all cases. Two options were left:
> one is to mimic Word's behavior, and the other is to prohibit a break
> before U+3000. The two methods give almost the same quality of results; in
> some cases one is slightly superior to the other and in other cases the
> opposite, and all agreed that either option is acceptable for all cases we
> investigated. Word's behavior, however, requires slightly more logic, and
> does not support the honorific space scenario well.
> Given this result, and given the honorific space situation that Ambrose
> and Kenny brought up, my conclusion is that prohibiting a break before
> U+3000 is the best option for everyone. It may be appropriate to allow
> tailoring to ID where editing experience is more important and the
> document is known never to re-flow, but the behavior I propose here is the
> more general one.
> Allow me to end my long e-mail with a couple of notes about the situation
> of browsers and the CSS WG. Actual browser implementations vary today. IE
> implements behavior similar to Word's. Firefox does what I propose here;
> i.e., it prohibits a break before U+3000. WebKit and Opera handle it as
> ID. So browsers are not interoperable today, and I'm hoping to resolve
> this interoperability issue with CSS Text Level 3[3]. CSS Text Level 3 is
> going to define line breaking behavior for CSS, and my current thinking is
> to define the behavior I'm proposing here.
> I value UAX#14 highly and I hope UAX#14 and CSS Text Level 3 stay in sync,
> which is why I'm asking here for a change to be considered.
> Any opinions, thoughts, or discussions are appreciated, and your support
> for this proposal is greatly appreciated in advance.
> [1]
> [2]
> [3]
> Regards,
> Koji
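
The difference between the two treatments discussed in the quoted message
(U+3000 as ID versus "no break before U+3000") can be sketched roughly as
follows. This is only an illustration under loud assumptions, not the
UAX#14 algorithm: the input is assumed to contain nothing but CJK unified
ideographs and U+3000, and the names is_ideograph, can_break_before and
break_opportunities are made up for this sketch.

```python
IDEOGRAPHIC_SPACE = "\u3000"


def is_ideograph(ch):
    # Assumption: a crude stand-in for Line_Break=ID that only covers the
    # CJK Unified Ideographs block (U+4E00..U+9FFF).
    return 0x4E00 <= ord(ch) <= 0x9FFF


def can_break_before(prev, cur, u3000_as_id):
    """Return True if a line break is allowed between prev and cur."""
    if cur == IDEOGRAPHIC_SPACE:
        # ID treatment: U+3000 acts like any ideograph, so a break before
        # it is allowed. Proposed treatment: never break before U+3000.
        return u3000_as_id and (is_ideograph(prev) or prev == IDEOGRAPHIC_SPACE)
    if prev == IDEOGRAPHIC_SPACE:
        # A break after U+3000 is allowed under both treatments.
        return is_ideograph(cur)
    # Plain ideograph followed by ideograph: break allowed.
    return is_ideograph(prev) and is_ideograph(cur)


def break_opportunities(text, u3000_as_id):
    """Return the indices i at which a line may start with text[i]."""
    return [
        i
        for i in range(1, len(text))
        if can_break_before(text[i - 1], text[i], u3000_as_id)
    ]


sample = "日本語" + IDEOGRAPHIC_SPACE + "文章"
as_id = break_opportunities(sample, u3000_as_id=True)
proposed = break_opportunities(sample, u3000_as_id=False)
```

With the sample string, the only difference between the two results is
index 3, the position of U+3000 itself: under the ID treatment a line may
start with the space, under the proposed treatment it may not.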
Received on Wed Jun 13 2012 - 00:34:36 CDT