Hyphenation increases the number of places that a line can be broken. The net effect is that for any given line-width, more text will fit on the average line.
The biggest benefit is where lines are justified - with hyphenated text the number of "underfull" lines can be reduced, for a more even appearance.
Normally, the bidi algorithm doesn't care why a line is being broken. Wherever the line break falls, it corresponds to a specific location in the text in logical order (that is, as stored in memory), and all characters for that line are then reordered and displayed.
The calculation of bidi levels is always carried out over the entire paragraph, so the location of the line break does not change the calculated levels.
If hyphenation is done manually, by inserting a hyphen character at a desired line break location, then this introduces a neutral character. Since hyphenation like that would be in mid-word, the left and right context of that neutral character should be the same, hence the character should behave like its surroundings. There should be no reordering of text around it.
The same should be true for using the SHY (soft hyphen) character.
However, an implementation can no longer use this strategy:
- Determine a possible hyphenation location,
- output the line to that point,
- separately output a hyphen at the end of the physical line.
The hyphen to be displayed must be part of the logical storage to which bidi reordering is applied. The reason is that a hyphenated LTR word at the end of an RTL line would have its hyphen not at the end of the physical line, but at the end of the hyphenated word fragment, wherever reordering ended up placing it.
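To make that concrete, here is a minimal sketch of the reordering rule L2 from UAX#9, applied to one line in which the hyphen is already part of the logical text. The embedding levels are assigned by hand for illustration (a real implementation would have calculated them over the whole paragraph), and the function name `reorder_line` is my own, not taken from any sample code.

```python
HYPHEN = "\u2010"

def reorder_line(chars, levels):
    """Rule L2: from the highest level down to the lowest odd level,
    reverse every contiguous run of characters at that level or higher."""
    order = list(range(len(chars)))
    odd = [lv for lv in levels if lv % 2 == 1]
    if not odd:
        return "".join(chars)  # purely LTR line, nothing to reverse
    for k in range(max(levels), min(odd) - 1, -1):
        i = 0
        while i < len(order):
            if levels[order[i]] >= k:
                j = i
                while j < len(order) and levels[order[j]] >= k:
                    j += 1
                order[i:j] = reversed(order[i:j])
                i = j
            else:
                i += 1
    return "".join(chars[i] for i in order)

# RTL paragraph: three Hebrew letters, a space, then the LTR fragment "hyp"
# followed by the hyphen that marks the break. Levels are assumed, not computed.
chars = list("\u05d0\u05d1\u05d2 hyp" + HYPHEN)
levels = [1, 1, 1, 1, 2, 2, 2, 2]
visual = reorder_line(chars, levels)
```

With these assumed levels, `visual` (read left to right) starts with "hyp" followed directly by the hyphen, then the space and the Hebrew letters. The hyphen sits in the middle of the physical line, attached to its word fragment, which is why it cannot simply be appended at the line edge.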
The problematic rule is X9 (remove all BN codes).
The reason for the problem is that SHY, the soft hyphen, has the BN property. Most SHY codes are not needed, but the one that ends up defining the actual line break also defines the location of the hyphen to be displayed in its stead.
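The bidi classes involved are easy to check. The following is a small sketch using Python's standard `unicodedata` module; the dictionary name `bidi_class` is just for illustration.

```python
import unicodedata

SHY = "\u00ad"           # SOFT HYPHEN
HYPHEN = "\u2010"        # HYPHEN
HYPHEN_MINUS = "\u002d"  # HYPHEN-MINUS, the ASCII "-"

# Look up the Bidi_Class property of each character.
bidi_class = {
    "SHY": unicodedata.bidirectional(SHY),
    "HYPHEN": unicodedata.bidirectional(HYPHEN),
    "HYPHEN-MINUS": unicodedata.bidirectional(HYPHEN_MINUS),
}
```

SHY comes back as BN, which is why rule X9 removes it. HYPHEN is ON, a neutral. HYPHEN-MINUS is ES, which rule W6 turns into a neutral when it is not sitting between European digits.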
A layout algorithm can work around this in several ways. For example, it is possible to retain all (or some) BN codes while ignoring them for the purpose of applying the other rules (see the note about that in UAX#9).
The SHYs could then be removed, or, in the case of the one at the line break, replaced with a HYPHEN, at the step where the line fitting / line breaking actually takes place (before L1).
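As a rough sketch of that substitution step, assuming the SHYs were retained through level resolution and that each character carries its resolved level, something like the following could run once a line's extent is known. The function name `materialize_line` and the parallel `chars` / `levels` lists are hypothetical, chosen only for this illustration.

```python
SHY = "\u00ad"
HYPHEN = "\u2010"

def materialize_line(chars, levels, start, end):
    """Return (chars, levels) for the line covering logical indices
    start..end-1. A SHY in the last position is the one that defined the
    break and becomes a visible HYPHEN, keeping the level resolved for it.
    All other SHYs on the line are dropped."""
    out_chars, out_levels = [], []
    for i in range(start, end):
        ch = chars[i]
        if ch == SHY:
            if i != end - 1:
                continue      # unused break opportunity
            ch = HYPHEN       # the break was taken here
        out_chars.append(ch)
        out_levels.append(levels[i])
    return out_chars, out_levels

# "hy<SHY>phen<SHY>ation", broken at the second SHY (logical index 7).
para = list("hy" + SHY + "phen" + SHY + "ation")
para_levels = [0] * len(para)
first_chars, first_levels = materialize_line(para, para_levels, 0, 8)
second_chars, second_levels = materialize_line(para, para_levels, 8, len(para))
```

The resulting per-line characters and levels can then go through L1 and L2 as usual, so the hyphen is reordered together with the fragment it belongs to.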
Another method would be to track where the location of the logical line is in the physical line, and, where necessary, insert a hyphen glyph right before display.
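A sketch of this second method might look like the following. It assumes the reordering step hands back the line as a list of logical indices in left-to-right display order, and that a hyphen belongs on the right of an even-level (LTR) fragment and on the left of an odd-level (RTL) one. The name `insert_hyphen_glyph` and the data layout are my own assumptions, and I have not tried this in a real layout engine.

```python
HYPHEN = "\u2010"

def insert_hyphen_glyph(visual_order, chars, levels):
    """visual_order lists the line's logical indices in left-to-right
    display order; chars and levels are indexed by logical position.
    Returns the display string with a hyphen glyph placed next to the
    logically last character of the line."""
    last_logical = max(visual_order)
    pos = visual_order.index(last_logical)
    glyphs = [chars[i] for i in visual_order]
    if levels[last_logical] % 2 == 0:
        glyphs.insert(pos + 1, HYPHEN)  # LTR fragment: hyphen to its right
    else:
        glyphs.insert(pos, HYPHEN)      # RTL fragment: hyphen to its left
    return "".join(glyphs)

# RTL line ending in the LTR fragment "hyp"; levels assigned by hand.
chars = list("\u05d0\u05d1\u05d2 hyp")
levels = [1, 1, 1, 1, 2, 2, 2]
visual_order = [4, 5, 6, 3, 2, 1, 0]
display = insert_hyphen_glyph(visual_order, chars, levels)
```

In this example the glyph lands right after "hyp" in the middle of the physical line. For a line consisting only of RTL text, the same function puts the hyphen at the left edge, which is where an RTL line ends.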
I haven't had to implement that anywhere yet, so I can't suggest which strategy works best, but the bidi sample in the Unibook tool and the bidi sample code in C++ show how to retain BN codes yet still reach conformant results.
Final note: read the sections on hyphenation and SHY in UAX#14 Line Breaking for more details, because adding a hyphen is not always enough - the general case is more open ended.