Accumulated Feedback on PRI #212

This page is a compilation of formal public feedback received so far. See Feedback for further information on this issue, how to discuss it, and how to provide feedback.

Date/Time: Thu Jun 14 14:21:46 CDT 2012
Contact: kojiishi@gmail.com
Name: Koji Ishii
Report Type: Other Question, Problem, or Feedback
Opt Subject: UAX#14 request to reconsider Line Break property for U+3000 IDEOGRAPHIC SPACE


Hello,

In short, as the subject says, I would like to ask you to reconsider the Line
Break property for U+3000 IDEOGRAPHIC SPACE so that it does not allow a break
before it.

I wasn't part of the original discussions for UAX#14, so I may be repeating
discussions that have already taken place. I apologize if that's the case, but
I hope I can provide some new information here.

How to handle U+3000 IDEOGRAPHIC SPACE in line breaking is somewhat
controversial in East Asia, and applications do not all handle it the same way
today. The primary reason, I believe, is that the best method varies by
requirements.

After several investigations and discussions, I think most of the people I
talked to reached a consensus that prohibiting a break before U+3000 is the
best general answer. Exactly which class to use is a bit unclear to me, and I'd
appreciate anyone's advice, but I guess that is a discussion for after we agree
to make the change, so I'm leaving it for now.

Here is the background of the proposal and a summary of my discussions.

Many people here may already know this, but almost all traditional East Asian
word processors treated U+3000 as ID, and I'm guessing that is the reason
UAX#14 defines it that way. Many, including me, agree that this gives the best
editing experience for East Asian scripts.

East Asian versions of MS Word took a different approach, though, primarily
due to Word's re-flow architecture. It might not be well known, and it is now
past history, but up until Word 95, Word changed line breaks slightly when the
printer was changed (for good reasons at the time), so its documents needed to
look good even if line breaks changed after the author had sent them to
someone else.

And a problem arose, because people did not want U+3000 appearing at the
beginning of a line as a result of such re-flow. ID gives the best editing
experience, but it does not fit re-flowable documents well, and keeping U+3000
from appearing at the beginning of a line matters more than a slightly better
editing experience. The same issue occurs today in other re-flowable documents
such as HTML or EPUB.

Ambrose told me that the same issue exists in Chinese with what are known as
honorific spaces [1]. We tried, without luck, to find examples of line breaking
behavior for honorific spaces, and then Kenny pointed out that authors adjust
the text so that an honorific space does not appear at the beginning of a line,
and therefore we would not be able to find such examples [2].

I also discussed the ideal behavior of U+3000 around line breaks with the W3C
I18N WG JLTF (who authored JLREQ), professional printers, and people working
on EPUB in Japan. As I wrote above, the best method depends on context, so the
discussion was a little long, but we tried to find the best algorithm that
works for all cases. Two options were left: one is to mimic Word's behavior,
and the other is to prohibit a break before U+3000. The two methods give
almost the same level of results; in some cases one is slightly superior to
the other, and in other cases the opposite, and everyone agreed that either
option is acceptable for all the cases we investigated. Word's behavior,
however, requires slightly more logic and does not support the honorific space
scenario well.

Given this result, and given the honorific space situation (thanks to Ambrose
and Kenny), my conclusion is that prohibiting a break before U+3000 is the best
option for everyone. It may be appropriate to allow tailoring to ID where the
editing experience is more important and the document is known never to
re-flow, but the behavior I propose here is more generic.
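
To make the difference concrete, here is a minimal Python sketch of the two
behaviors. It is not an implementation of UAX#14. It assumes a deliberately
simplified model for illustration only: every character is treated like ID
(a break is allowed on both sides), and all other UAX#14 rules are ignored.
The function name break_opportunities and its space_as_id flag are
hypothetical and are not part of UAX#14 or any library.

```python
IDEOGRAPHIC_SPACE = "\u3000"


def break_opportunities(text: str, space_as_id: bool) -> list[int]:
    """Return each index i where a line break is allowed before text[i].

    Hypothetical, simplified model: every character behaves like ID,
    except that when space_as_id is False no break is allowed before
    U+3000 (the proposed behavior). All other UAX#14 rules are ignored.
    """
    breaks = []
    for i in range(1, len(text)):
        if text[i] == IDEOGRAPHIC_SPACE and not space_as_id:
            # Proposed behavior: U+3000 can never start a new line.
            continue
        # ID-like behavior: a break is allowed between any two characters.
        breaks.append(i)
    return breaks


sample = "日本\u3000語"  # two ideographs, IDEOGRAPHIC SPACE, one ideograph
print(break_opportunities(sample, space_as_id=True))   # [1, 2, 3]
print(break_opportunities(sample, space_as_id=False))  # [1, 3]
```

With U+3000 treated as ID, a break at index 2 puts U+3000 at the start of the
next line, which is exactly what re-flowed documents need to avoid. With no
break allowed before it, the nearest break falls after U+3000 (index 3).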

Allow me to end my long e-mail with a couple of notes about the situation of
browsers and the CSS WG. Browser implementations vary today. IE implements
behavior similar to Word's. Firefox does what I propose here, i.e., it
prohibits a break before U+3000. WebKit and Opera handle it as ID. So browsers
are not interoperable today, and I'm hoping to resolve this interoperability
issue with CSS Text Level 3 [3]. CSS Text Level 3 is going to define line
breaking behavior for CSS, and my current thinking is to define the behavior
I'm proposing here.

I appreciate UAX#14 very much, and I hope UAX#14 and CSS Text Level 3 stay in
sync; therefore, I'm asking you to consider this change.

Any opinions, thoughts, or discussion are welcome, and your support for this
proposal is greatly appreciated in advance.

[1] http://lists.w3.org/Archives/Public/www-style/2012Apr/0013.html
[2] http://lists.w3.org/Archives/Public/www-style/2012May/0106.html
[3] http://dev.w3.org/csswg/css3-text/

Regards,
Koji