The Unicode Consortium Discussion Forum


All times are UTC - 6 hours [ DST ]




Post subject: HTML 5, <BR> and bidi vs Unicode Line Separator
Posted: Tue Feb 09, 2010 8:17 pm
Unicode Guru

Joined: Tue Dec 01, 2009 2:49 pm
Posts: 189
There's been a long-standing recommendation to treat the LS as equivalent to the HTML <BR> element. The Unicode line separator has a specific function in the Unicode Bidirectional algorithm, in that it ends a line but does not restart a paragraph.

In the discussion on HTML5, it was recognized that the <BR> element is often used in ways that make restarting the bidi algorithm more natural than continuing. The proposal, as formulated, would result in equating LS to <br bdi="no"> in HTML.
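
As a rough illustration of the distinction above, the sketch below looks up the bidi classes of LS, PS and LF with Python's standard `unicodedata` module. The dictionary name `BIDI_CLASSES` is only illustrative, and the values reflect whatever Unicode Character Database version ships with the interpreter.

```python
import unicodedata

# Bidi classes as recorded in the Unicode Character Database:
# "B" is a paragraph separator (the bidi algorithm restarts after it),
# "WS" is whitespace (the paragraph, and its resolved levels, continue).
BIDI_CLASSES = {
    name: unicodedata.bidirectional(ch)
    for name, ch in [("LS", "\u2028"), ("PS", "\u2029"), ("LF", "\n")]
}

for name, cls in BIDI_CLASSES.items():
    print(name, cls)
```

LS is reported as WS, while PS and LF are reported as B, which is why LS ends a line without restarting a paragraph.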


Post subject: Re: HTML 5, <BR> and bidi vs Unicode Line Separator
Posted: Thu Dec 02, 2010 3:26 pm
Unicode Guru

Joined: Tue Dec 01, 2009 2:49 pm
Posts: 189
There's been an update. As mentioned on the html-bidi-improvements list:
Quote:
The Proposed Solution of 2.1 ("<br> and embedded block elements should
serve as bidi separators") should explicitly state that it implies a
change in UTR #20 <http://unicode.org/reports/tr20/#Line> and UAX #13
<http://unicode.org/reports/tr13/tr13-9.html#Background>.

In the former, 'In HTML, use <xhtml:br /> instead of U+2028' should be
replaced with 'In HTML, use <xhtml:br bdi="no" /> instead of U+2028'.

In the latter, 'line separators basically correspond to HTML <BR>'
should be replaced with 'line separators basically correspond to HTML
<BR BDI="no">'.


Since this was actually fixed (bug #10828), the current text in the Unicode standard is broken.


As is the text in UTR#20.

This is just a heads-up. A formal report will likely be submitted via the reporting form.


Post subject: Re: HTML 5, <BR> and bidi vs Unicode Line Separator
Posted: Mon Dec 06, 2010 3:23 am

Joined: Mon Dec 06, 2010 3:01 am
Posts: 1
Update:

HTML5 has been changed to specify <BR> as a bidi paragraph break.

Attempts to introduce an attribute that would allow control over that have been rejected. A large part of the reason is that we have been unable to find significant use cases where a WS line break actually fixed a bidi ordering problem. The usual suspects for such use cases, poetry and addresses, do not prove to be very significant, since the difference between B and WS is only visible when the text contains a mixture of LTR and RTL. This is not something one encounters in poetry. For mixed-script addresses, it is not at all certain that WS gives better results.
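
To make the "mixture of LTR and RTL" condition concrete, here is a minimal sketch of a pre-check. The function name `has_mixed_direction` is hypothetical, and the check is a simplification: it looks only at strong bidi classes (L, R, AL) and ignores weak types such as digits as well as explicit embedding controls.

```python
import unicodedata

STRONG_LTR = {"L"}
STRONG_RTL = {"R", "AL"}


def has_mixed_direction(text: str) -> bool:
    """Return True if text contains both strong LTR and strong RTL characters.

    Simplified necessary condition for a B-style and a WS-style line break
    to give visibly different results; weak types are deliberately ignored.
    """
    classes = {unicodedata.bidirectional(ch) for ch in text}
    return bool(classes & STRONG_LTR) and bool(classes & STRONG_RTL)


# Two lines of purely LTR text joined by LS: no mixture.
print(has_mixed_direction("roses are red\u2028violets are blue"))
# Latin on the first line, Hebrew on the second: mixture.
print(has_mixed_direction("abc\u2028\u05d0\u05d1"))
```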

On the bright side, LS (U+2028) and PS (U+2029) are no longer mentioned by the HTML5 spec at all, which means their treatment is up to CSS (which is the layer that controls whitespace folding). Since CSS does not mention these characters at all either, presumably they should start being treated as specified in the Unicode standard, i.e. as line breaks. The CSS test suite now has tests that they are treated accordingly in pre-formatted text, but that should actually be the case in normal text as well.


Post subject: Re: HTML 5, <BR> and bidi vs Unicode Line Separator
Posted: Mon Dec 06, 2010 5:06 am
Unicode Guru

Joined: Tue Dec 01, 2009 2:49 pm
Posts: 189
I understand the rationale for cementing the actual behavior of <br> rather than an idealized one that would be inconsistent with what people have actually been doing.

However, what Unicode and W3C need to settle is the proper recommendation for treating U+2028 (Line Separator) and U+2029 (Paragraph Separator) when translating between HTML and plain text.

PS can be mapped to <p> and arguably should be (except inside <PRE> elements). I say "should be" because it's not good to have two different methods to express the same semantics.

With the redefinition of <BR>, LS now cannot be mapped to anything without creating a difference in the behavior of plain text vs. HTML.

If the idea is that HTML UAs will recognize it as LS and treat it accordingly, then UTR#20 would need to be updated to state that it should be retained on conversion to HTML.

Would that be the correct conclusion?


Post subject: Re: HTML 5, <BR> and bidi vs Unicode Line Separator
Posted: Wed Dec 08, 2010 2:13 am

Joined: Wed Dec 08, 2010 1:49 am
Posts: 3
Location: Japan
If the reason(s) for rejecting an LS-like version of <br> in HTML5 are correct, then that would essentially mean that we could keep the text in UTR20 as is, because there are no cases known where <br> wouldn't do the job.

If we want to be a bit more careful, then we can say:

In HTML, use <xhtml:br /> instead of U+2028, unless you deal with data such as poems and addresses that have mixed-directionality data on the same line. In the latter case (which is claimed to not exist), leave U+2028 as is.

[The wording sure can be improved, but that's the gist of it as far as I understand.]


As for the text from UTR13, which now must be somewhere in the standard itself, we cannot just change it from "line separators basically correspond to HTML <BR>" to something like "line separators basically correspond to HTML <BR BDI='no'>". The origin of this text is that the UTC, or the Editorial Committee, or whoever, took a shortcut for defining line separators by just referring to <br>, because it was possible to assume that <br> was widely known. Now that it turns out that the story for <br> is more complicated than that, such a shortcut doesn't work anymore, and whoever is responsible should work on an actual free-standing definition of the functionality of Line Separator.


Post subject: Re: HTML 5, <BR> and bidi vs Unicode Line Separator
Posted: Wed Dec 08, 2010 6:02 am
Unicode Guru

Joined: Tue Dec 01, 2009 2:49 pm
Posts: 189
OK, that sounds like it's going in the right direction, although I would phrase it differently:

...unless you deal with data where restarting the bidi algorithm after a <br> would lead to different paragraph direction than the preceding paragraph. In the latter case (which is claimed to be so rare as to not exist), leave U+2028 as is.


If you phrase this without reference to poems and addresses, I think it becomes much clearer what the incompatibility is.

The use of LS is not at all restricted to poems and addresses, so you don't really know where it might have been used.
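
The rule as phrased above can be sketched as a small plain-text-to-HTML conversion step. This is only an illustration under stated assumptions: paragraph direction is taken from the first strong character, defaulting to LTR (rules P2 and P3 of the bidi algorithm, as for plain text with no higher-level protocol), and the names `first_strong` and `convert_line_separators` are hypothetical. The sketch is deliberately conservative: a line with no strong character keeps its U+2028.

```python
import html
import unicodedata

LS = "\u2028"


def first_strong(text):
    """Direction of the first strong character, or None if there is none."""
    for ch in text:
        cls = unicodedata.bidirectional(ch)
        if cls == "L":
            return "ltr"
        if cls in ("R", "AL"):
            return "rtl"
    return None


def convert_line_separators(paragraph: str) -> str:
    """Replace U+2028 with <br /> only where restarting the bidi algorithm
    would leave the paragraph direction unchanged; otherwise keep U+2028."""
    base = first_strong(paragraph) or "ltr"
    lines = paragraph.split(LS)
    out = html.escape(lines[0])
    chunk = lines[0]  # text since the last emitted <br />
    for line in lines[1:]:
        before_ok = (first_strong(chunk) or "ltr") == base
        after_ok = first_strong(line) == base
        if before_ok and after_ok:
            out += "<br />" + html.escape(line)
            chunk = line
        else:
            out += LS + html.escape(line)
            chunk += LS + line
    return out


print(convert_line_separators("abc\u2028def"))
```

With purely LTR input every U+2028 becomes <br />; when a following line starts with Hebrew or Arabic letters in an otherwise LTR paragraph, the U+2028 is left in place.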


Post subject: Re: HTML 5, <BR> and bidi vs Unicode Line Separator
Posted: Wed Dec 08, 2010 7:29 am

Joined: Wed Dec 08, 2010 7:21 am
Posts: 2
I have another problem with this suggestion.

From correspondence in the HTML Bugzilla, I realized that the reasons for rejecting our proposed LS-like <br> are not related to this specific behavior. Rather, it seems that the whole functionality of <br> is display-related and hence actually out of scope for "pure" HTML (in theory, it should be removed and handled purely in CSS, like <font> and other deprecated stuff).

I think the only reason <br> is kept is because of the current wide usage. I am certain that the HTML-spec editors would find this current (PS-like) usage just as "abusive" as the use cases that we gave for the LS-like usage. This means that while <br> is still part of the standard, it is probably not recommended to use it at all.

So, to avoid a situation where the Unicode standard recommends HTML code that would be frowned upon by HTML-spec editors as "abusive", I suggest that we avoid mentioning <br> altogether (make the definition free-standing, as MartinJD suggested above).


Post subject: Re: HTML 5, <BR> and bidi vs Unicode Line Separator
Posted: Wed Dec 15, 2010 3:50 am

Joined: Wed Dec 08, 2010 1:49 am
Posts: 3
Location: Japan
Simon Montagu reports at http://groups.google.com/group/html-bid ... bdd8b05110 that IE8/9 in standards mode actually treat <br> as a line break, not a paragraph break. This means that much of the discussion above may have to be rethought.

