|
|
Page 1 of 1
|
[ 8 posts ] |
|
| Author |
Message |
|
asmus
|
Post subject: HTML 5, <BR> and bidi vs Unicode Line Separator Posted: Tue Feb 09, 2010 8:17 pm |
|
 |
| Unicode Guru |
Joined: Tue Dec 01, 2009 2:49 pm Posts: 172
|
|
There's been a long-standing recommendation to treat the LS as equivalent to the HTML <BR> element. The Unicode line separator has a specific function in the Unicode Bidirectional algorithm, in that it ends a line but does not restart a paragraph.
In the discussion on HTML5, it was recognized that the <BR> element is often used in ways that make restarting the bidi algorithm more natural than continuing. The proposal, as formulated, would result in equating LS to <br bdi="no"> in HTML.
|
|
|
 |
|
asmus
|
Post subject: Re: HTML 5, <BR> and bidi vs Unicode Line Separator Posted: Thu Dec 02, 2010 3:26 pm |
|
 |
| Unicode Guru |
Joined: Tue Dec 01, 2009 2:49 pm Posts: 172
|
There's been an update. As mentioned on the html-bidi-improvements list:
Quote: The Proposed Solution of 2.1 ("<br> and embedded block elements should serve as bidi separators") should explicitly state that it implies a change in UTR #20 <http://unicode.org/reports/tr20/#Line> and UAX #13 <http://unicode.org/reports/tr13/tr13-9.html#Background>.
In the former, 'In HTML, use <xhtml:br /> instead of U+2028' should be replaced with 'In HTML, use <xhtml:br bdi="no" /> instead of U+2028'.
In the latter, 'line separators basically correspond to HTML <BR>' should be replaced with 'line separators basically correspond to HTML <BR BDI="no">'.
Since this was actually fixed (bug #10828), the current text in the Unicode standard is broken.
As is the text in UTR#20. This is just a heads-up. A formal report will likely be submitted via the reporting form.
|
|
|
 |
|
aharon
|
Post subject: Re: HTML 5, <BR> and bidi vs Unicode Line Separator Posted: Mon Dec 06, 2010 3:23 am |
|
Joined: Mon Dec 06, 2010 3:01 am Posts: 1
|
|
Update:
HTML5 has been changed to specify <BR> as a bidi paragraph break.
Attempts to introduce an attribute that would allow control over that have been rejected. A large part of the reason is that we have been unable to find significant use cases where a WS line break actually fixed a bidi ordering problem. The usual suspects for such use cases, poetry and addresses, do not prove to be very significant, since the difference between B and WS is only visible when the text contains a mixture of LTR and RTL. This is not something one encounters in poetry. For mixed-script addresses, it is not at all certain that WS gives better results.
On the bright side, LS (U+2028) and PS (U+2029) are no longer mentioned by the HTML5 spec at all, which means their treatment is up to CSS (which is the layer that controls whitespace folding). Since CSS does not mention these characters at all either, presumably they should start being treated as specified in the Unicode standard, i.e. as line breaks. The CSS test suite now has tests that they are treated accordingly in pre-formatted text, but that should actually be the case in normal text as well.
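For readers who want to check the B vs. WS distinction mentioned above, here is a minimal sketch using Python's standard `unicodedata` module. It only looks up the Bidi_Class property of each character; the dictionary name `classes` is just an illustration, and the values depend on the Unicode data shipped with your Python build.

```python
import unicodedata

# Characters relevant to the discussion, keyed by a short label.
candidates = [
    ("LS", "\u2028"),  # LINE SEPARATOR
    ("PS", "\u2029"),  # PARAGRAPH SEPARATOR
    ("LF", "\n"),      # LINE FEED, for comparison
]

# Bidi_Class "B" ends a bidi paragraph; "WS" is ordinary whitespace
# as far as the bidi algorithm is concerned.
classes = {label: unicodedata.bidirectional(ch) for label, ch in candidates}

for label, bidi_class in classes.items():
    print(label, bidi_class)
```

LS is reported as WS and PS as B, which is the property-level basis for saying that LS ends a line without restarting the bidi paragraph.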
|
|
|
 |
|
asmus
|
Post subject: Re: HTML 5, <BR> and bidi vs Unicode Line Separator Posted: Mon Dec 06, 2010 5:06 am |
|
 |
| Unicode Guru |
Joined: Tue Dec 01, 2009 2:49 pm Posts: 172
|
|
I understand the rationale for cementing the actual behavior of <br> rather than an idealized one that would be inconsistent with what people have actually been doing.
However, what Unicode and W3C need to settle is the proper recommendation for treating U+2028 (Line Separator) and U+2029 (Paragraph Separator) when translating between HTML and plain text.
PS can be mapped to <p> and arguably should be (except inside <PRE> elements). I say "should be" because it's not good to have two different methods to express the same semantics.
With the redefinition of <BR>, LS now cannot be mapped to anything without creating a difference in the behavior of plain text vs. HTML.
If the idea is that HTML UAs will recognize it as LS and treat it accordingly, then UTR#20 would need to be updated to state that it should be retained on conversion to HTML.
Would that be the correct conclusion?
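To make the question concrete, here is a rough sketch of what such a conversion might look like if that conclusion were adopted: PS becomes a <p> boundary and LS is passed through unchanged. The function name `plain_to_html` is hypothetical, <PRE> handling is ignored, and nothing here is an agreed recommendation from UTR #20 or the HTML5 spec.

```python
import html

LS = "\u2028"  # LINE SEPARATOR, retained as-is in the output
PS = "\u2029"  # PARAGRAPH SEPARATOR, mapped to <p> boundaries


def plain_to_html(text):
    """Hypothetical converter: wrap each PS-delimited paragraph in <p>.

    LS characters are deliberately left in the text, on the assumption
    that the user agent treats them as line breaks that do not restart
    the bidi algorithm.
    """
    paragraphs = [p for p in text.split(PS) if p]  # drop empty paragraphs
    return "".join(f"<p>{html.escape(p)}</p>" for p in paragraphs)


sample = "line one" + LS + "line two" + PS + "next paragraph"
converted = plain_to_html(sample)
```

Whether leaving LS in place is actually safe depends on user agents treating U+2028 as a line break in normal (not just pre-formatted) text, which is exactly the open point.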
|
|
|
 |
|
MartinJD
|
Post subject: Re: HTML 5, <BR> and bidi vs Unicode Line Separator Posted: Wed Dec 08, 2010 2:13 am |
|
Joined: Wed Dec 08, 2010 1:49 am Posts: 3 Location: Japan
|
|
If the reason(s) for rejecting an LS-like version of <br> in HTML5 are correct, then that would essentially mean that we could keep the text in UTR20 as is, because there are no cases known where <br> wouldn't do the job.
If we want to be a bit more careful, then we can say:
In HTML, use <xhtml:br /> instead of U+2028, unless you deal with data such as poems and addresses that have mixed-directionality data on the same line. In the latter case (which is claimed to not exist), leave U+2028 as is.
[The wording sure can be improved, but that's the gist of it as far as I understand.]
As for the text from UTR13, which now must be somewhere in the standard itself, we cannot just change it from "line separators basically correspond to HTML <BR>" to something like "line separators basically correspond to HTML <BR BDI='no'>". The origin of this text is that the UTC, or the Editorial Committee, or whoever, took a shortcut for defining line separators by just referring to <br>, because it was possible to assume that <br> was widely known. Now that it turns out that the story for <br> is more complicated than that, such a shortcut doesn't work anymore, and whoever is responsible should work on an actual free-standing definition of the functionality of Line Separator.
|
|
|
 |
|
asmus
|
Post subject: Re: HTML 5, <BR> and bidi vs Unicode Line Separator Posted: Wed Dec 08, 2010 6:02 am |
|
 |
| Unicode Guru |
Joined: Tue Dec 01, 2009 2:49 pm Posts: 172
|
|
OK, that sounds like it's going in the right direction, although I would phrase it differently:
...unless you deal with data where restarting the bidi algorithm after a <br> would lead to different paragraph direction than the preceding paragraph. In the latter case (which is claimed to be so rare as to not exist), leave U+2028 as is.
If you phrase this without reference to poems and addresses, I think it becomes much clearer what the incompatibility is.
The use of LS is not at all restricted to poems and addresses, so you don't really know where it might have been used.
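As an illustration of the condition phrased above, here is a simplified sketch that flags paragraphs where restarting at an LS would yield a different paragraph direction. It assumes the direction comes from the first strong character (rules P2 and P3 of UAX #9, as for plain text), ignores explicit embeddings and isolates, and skips segments that have no strong character. The function names are made up for this example.

```python
import unicodedata

LS = "\u2028"  # LINE SEPARATOR


def first_strong_direction(text):
    """Return "ltr" or "rtl" for the first strong character, else None."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    return None


def ls_changes_direction(paragraph):
    """True if any LS-delimited segment would get a different direction
    than the paragraph as a whole when treated as its own paragraph."""
    base = first_strong_direction(paragraph) or "ltr"  # P3 default is LTR
    for segment in paragraph.split(LS)[1:]:
        direction = first_strong_direction(segment)
        # Simplification: segments without strong characters are skipped.
        if direction is not None and direction != base:
            return True
    return False


mixed = "abc" + LS + "\u05d0\u05d1\u05d2"   # Latin line, then Hebrew line
uniform = "abc" + LS + "def"                # Latin on both lines
```

With this check, `mixed` would be a candidate for leaving U+2028 as is, while `uniform` could be converted to <br> without a visible difference in paragraph direction.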
|
|
|
 |
|
amitar
|
Post subject: Re: HTML 5, <BR> and bidi vs Unicode Line Separator Posted: Wed Dec 08, 2010 7:29 am |
|
Joined: Wed Dec 08, 2010 7:21 am Posts: 2
|
|
I have another problem with this suggestion.
From correspondence in the HTML Bugzilla, I realized that the reasons for rejecting our proposed LS-like <br> are not related to this specific behavior. Rather, it seems that the whole functionality of <br> is display-related and hence actually out of scope for "pure" HTML (in theory, it should be removed and handled purely in CSS, like <font> and other deprecated stuff).
I think the only reason <br> is kept is its current wide usage (I am certain that the HTML-spec editors would find the current PS-like usage just as "abusive" as the use cases that we gave for the LS-like usage). This means that while <br> is still part of the standard, it is probably not recommended to use it at all.
So, to avoid a situation where the Unicode standard recommends HTML code that would be frowned upon by HTML-spec editors as "abusive", I suggest that we avoid mentioning <br> altogether (make the definition free-standing, as MartinJD suggested above).
|
|
|
 |
|
MartinJD
|
Post subject: Re: HTML 5, <BR> and bidi vs Unicode Line Separator Posted: Wed Dec 15, 2010 3:50 am |
|
Joined: Wed Dec 08, 2010 1:49 am Posts: 3 Location: Japan
|
|
|
 |
|