The Unicode Consortium Discussion Forum


All times are UTC - 6 hours [ DST ]




 Post subject: BiDi and word hyphenation
Posted: Sun Apr 03, 2011 4:57 am

Joined: Tue Oct 12, 2010 3:35 pm
Posts: 7
Hi there!

Would hyphenation affect the BiDi algorithm?

Thanks a lot!


 Post subject: Re: BiDi and word hyphenation
Posted: Sun Apr 03, 2011 5:21 pm
Unicode Guru

Joined: Tue Dec 01, 2009 2:49 pm
Posts: 189
Hyphenation increases the number of places where a line can be broken. The net effect is that, for any given line width, more text fits on the average line.
The biggest benefit is for justified lines: with hyphenated text, the number of "underfull" lines can be reduced, giving a more even appearance.

Normally, the bidi algorithm doesn't care why a line is being broken. Wherever the line break is, it corresponds to a specific location in the text in logical order (that is, as stored in memory), and all characters for that line are then reordered and displayed.

The calculation of bidi levels is always carried out over the entire paragraph, so the location of the line break does not change the calculated levels.
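To make the point about paragraph-wide levels concrete, here is a minimal Python sketch of a tiny subset of the bidi algorithm. It assumes a toy character model (uppercase ASCII stands for RTL letters, lowercase for LTR letters, everything else is neutral), handles no numbers or explicit embeddings, and does not apply rule L1 to trailing whitespace. Levels are resolved once for the whole paragraph; each line is then just a slice of those levels, reordered with rule L2.

```python
# Toy model (an assumption for illustration): uppercase ASCII = RTL letter,
# lowercase ASCII = LTR letter, anything else = neutral. No numbers, no
# explicit embeddings, and rule L1 (trailing whitespace) is not applied.


def char_dir(ch):
    if ch.isupper():
        return "R"
    if ch.islower():
        return "L"
    return "N"


def resolve_levels(text, para_level):
    """Simplified N1/N2 and I1/I2, run once over the whole paragraph."""
    dirs = [char_dir(ch) for ch in text]
    e = "R" if para_level % 2 else "L"  # embedding direction, also used for sos/eos
    i = 0
    while i < len(dirs):
        if dirs[i] != "N":
            i += 1
            continue
        j = i
        while j < len(dirs) and dirs[j] == "N":
            j += 1
        before = dirs[i - 1] if i > 0 else e
        after = dirs[j] if j < len(dirs) else e
        # N1: same strong direction on both sides; N2: otherwise the embedding direction
        dirs[i:j] = [before if before == after else e] * (j - i)
        i = j
    if para_level % 2 == 0:
        return [para_level + 1 if d == "R" else para_level for d in dirs]  # I1
    return [para_level + 1 if d == "L" else para_level for d in dirs]      # I2


def reorder_line(chars, levels):
    """Rule L2: from the highest level down to the lowest odd level on the line,
    reverse every maximal run of characters at that level or higher."""
    chars, levels = list(chars), list(levels)
    if not levels:
        return ""
    lowest_odd = min(levels) if min(levels) % 2 else min(levels) + 1
    for level in range(max(levels), lowest_odd - 1, -1):
        i = 0
        while i < len(levels):
            if levels[i] < level:
                i += 1
                continue
            j = i
            while j < len(levels) and levels[j] >= level:
                j += 1
            chars[i:j] = chars[i:j][::-1]
            levels[i:j] = levels[i:j][::-1]  # keep levels aligned with the characters
            i = j
    return "".join(chars)


text = "abc DEF ghi JKL"
levels = resolve_levels(text, 0)  # computed once, for the whole paragraph
for brk in (8, 12):  # two different line-break positions
    print(repr(reorder_line(text[:brk], levels[:brk])),
          repr(reorder_line(text[brk:], levels[brk:])))
```

Both break positions reuse the same list of levels; only the per-line reversal in reorder_line depends on where the break falls.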

If hyphenation is done manually, by inserting a hyphen character at a desired line-break location, this introduces a neutral character. Because such a hyphen sits in mid-word, its left and right context is the same, so it takes on the direction of its surroundings (rule N1). There should be no reordering of the text around it.

The same should be true for using the SHY (soft hyphen) character.
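A rough way to see the mid-word behavior is the following sketch of rules N1/N2 built on Python's unicodedata.bidirectional. It is simplified on purpose: only L, R and AL count as strong (AL acting like R), numbers and the W rules are skipped, and sos/eos are taken to be the embedding direction. It uses U+2010 HYPHEN (Bidi_Class ON) so that the ES class of HYPHEN-MINUS and the W rules don't come into play.

```python
import unicodedata

# Strong classes only; AL (Arabic letter) behaves like R. EN/AN and the W rules
# are deliberately ignored in this sketch.
STRONG = {"L": "L", "R": "R", "AL": "R"}


def strong_dir(ch):
    return STRONG.get(unicodedata.bidirectional(ch))


def neutral_direction(text, i, embedding_dir):
    """Simplified N1/N2 for the neutral at text[i]; sos/eos = embedding_dir."""
    before = next((d for d in map(strong_dir, reversed(text[:i])) if d), embedding_dir)
    after = next((d for d in map(strong_dir, text[i + 1:]) if d), embedding_dir)
    # N1: same direction on both sides wins; N2: otherwise use the embedding direction
    return before if before == after else embedding_dir


hebrew = "\u05e9\u05dc\u2010\u05d5\u05dd"  # shin, lamed, HYPHEN, vav, final mem
english = "hy\u2010phen"
mixed = "ab\u2010\u05e9"                   # hyphen between an English and a Hebrew letter

print(neutral_direction(hebrew, 2, "L"))   # mid-word in Hebrew, even in an LTR paragraph
print(neutral_direction(english, 2, "R"))  # mid-word in English, even in an RTL paragraph
print(neutral_direction(mixed, 2, "L"), neutral_direction(mixed, 2, "R"))
```

Only the last case, where the context differs on the two sides, falls back to the paragraph's embedding direction; a genuine mid-word hyphen never does.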

However, an implementation can no longer use this strategy:
  • Determine a possible hyphenation location,
  • output the line to that point,
  • separately output a hyphen at the end of the physical line.
The hyphen to be displayed must be part of the logical text to which bidi reordering is applied. Otherwise, a hyphenated LTR word at the end of an RTL line would come out wrong: its hyphen belongs not at the end of the physical line, but at the end of the hyphenated word fragment, wherever reordering ended up placing it.
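The difference can be sketched with the same kind of toy model (an assumption for illustration: uppercase = RTL, lowercase = LTR, everything else neutral; no numbers, no embeddings, rule L1 not applied). The paragraph is RTL and its first line ends inside the English word "hyphen". The naive variant reorders the unhyphenated line and then puts "-" at the physical end of the RTL line, which is its left edge; the other variant inserts the hyphen into the logical text before resolving levels for the paragraph.

```python
# Toy model (assumption): uppercase = RTL letter, lowercase = LTR letter, rest neutral.


def char_dir(ch):
    if ch.isupper():
        return "R"
    if ch.islower():
        return "L"
    return "N"


def resolve_levels(text, para_level):
    """Simplified N1/N2 and I1/I2 over the whole paragraph."""
    dirs = [char_dir(ch) for ch in text]
    e = "R" if para_level % 2 else "L"
    i = 0
    while i < len(dirs):
        if dirs[i] != "N":
            i += 1
            continue
        j = i
        while j < len(dirs) and dirs[j] == "N":
            j += 1
        before = dirs[i - 1] if i > 0 else e
        after = dirs[j] if j < len(dirs) else e
        dirs[i:j] = [before if before == after else e] * (j - i)
        i = j
    if para_level % 2 == 0:
        return [para_level + 1 if d == "R" else para_level for d in dirs]
    return [para_level + 1 if d == "L" else para_level for d in dirs]


def reorder_line(chars, levels):
    """Rule L2 applied to one line."""
    chars, levels = list(chars), list(levels)
    if not levels:
        return ""
    lowest_odd = min(levels) if min(levels) % 2 else min(levels) + 1
    for level in range(max(levels), lowest_odd - 1, -1):
        i = 0
        while i < len(levels):
            if levels[i] < level:
                i += 1
                continue
            j = i
            while j < len(levels) and levels[j] >= level:
                j += 1
            chars[i:j] = chars[i:j][::-1]
            levels[i:j] = levels[i:j][::-1]
            i = j
    return "".join(chars)


paragraph = "ABC hyphen DEF"
para_level = 1  # RTL paragraph
brk = 8         # hyphenation point: "ABC hyph" | "en DEF"

# Naive: reorder the line without a hyphen, then add "-" at the physical end
# of the RTL line (its left edge).
levels = resolve_levels(paragraph, para_level)
naive = "-" + reorder_line(paragraph[:brk], levels[:brk])

# Hyphen in logical storage: insert it before resolving levels and reordering.
hyphenated = paragraph[:brk] + "-" + paragraph[brk:]
levels_h = resolve_levels(hyphenated, para_level)
correct = reorder_line(hyphenated[:brk + 1], levels_h[:brk + 1])

print(naive)
print(correct)
```

Because the inserted hyphen sits between two LTR letters of the paragraph, it resolves to the word's level and travels with "hyph" during reordering, giving "hyph- CBA" instead of "-hyph CBA".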

The problematic rule is X9 (remove all BN codes).

The reason for the problem is that SHY, the soft hyphen, has the bidi class BN. Most SHY codes never come into play, but the one at which a line break actually occurs defines where the visible hyphen that replaces it must be displayed.
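The relevant property values can be checked with Python's standard unicodedata module. The sketch below also applies the BN part of rule X9 literally (the removal of the explicit embedding codes is left out), which shows the break opportunities disappearing together with the SHYs.

```python
import unicodedata

# Bidi classes of the soft hyphen and the two visible hyphens.
for ch in ("\u00ad", "\u2010", "-"):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {unicodedata.bidirectional(ch)}")
# U+00AD SOFT HYPHEN: BN
# U+2010 HYPHEN: ON
# U+002D HYPHEN-MINUS: ES


def remove_bn(text):
    """The BN part of rule X9, applied literally: drop every Bidi_Class BN character."""
    return "".join(ch for ch in text if unicodedata.bidirectional(ch) != "BN")


word = "hy\u00adphen\u00adation"  # two hyphenation points marked with SHY
print(remove_bn(word))            # hyphenation (both break opportunities are gone)
```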

A layout algorithm can work around this in several ways. For example, it is possible to retain all (or some) BN codes while ignoring them for the purpose of applying the other rules (see the note about that in UAX #9).

The SHYs could then be removed or, for the one where the line actually breaks, replaced with a HYPHEN at the step where line fitting / line breaking takes place (before rule L1).
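A minimal sketch of that line-fitting step, assuming the bidi levels have already been resolved with the BN codes retained and ignored. fit_line is a hypothetical helper, width is counted in characters rather than measured glyph advances, and only the SHY substitution is shown; how the inserted HYPHEN obtains its embedding level is not addressed here.

```python
SHY = "\u00ad"     # SOFT HYPHEN (Bidi_Class BN), kept in the text until this step
HYPHEN = "\u2010"  # HYPHEN (Bidi_Class ON), the visible replacement


def visible_len(s):
    # Assumption: width = number of visible characters (real layout measures glyphs).
    return len(s.replace(SHY, ""))


def fit_line(text, width):
    """Hypothetical line-fitting step: return (line, rest).

    Break opportunities are spaces and SHYs. The SHY actually used for the break
    becomes a visible HYPHEN; every other SHY on the line is dropped. SHYs in
    `rest` are kept so the next line can still use them.
    """
    if visible_len(text) <= width:
        return text.replace(SHY, ""), ""
    best = None
    for i, ch in enumerate(text):
        if ch == " ":
            line = text[:i].replace(SHY, "")
        elif ch == SHY:
            line = text[:i].replace(SHY, "") + HYPHEN
        else:
            continue
        if len(line) <= width:
            best = (line, text[i + 1:])  # keep the last opportunity that still fits
    if best is None:
        return text.replace(SHY, ""), ""  # nothing fits: let the line overflow (simplification)
    return best


line, rest = fit_line("a hy\u00adphen\u00adated word", 9)
print(repr(line), repr(rest))
print(fit_line(rest, 9))
```

The chosen SHY is consumed at the break; the unused SHY inside "hyphen" is simply dropped from the first line.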

Another method would be to track where each position of the logical line ends up in the physical line and, where necessary, insert a hyphen glyph right before display.
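That second method might look like the following sketch: apply rule L2 to indices instead of characters, find where the last logical character of the line ended up, and place the hyphen glyph on its right if it is at an even (LTR) level or on its left if it is at an odd (RTL) level. The levels are supplied by hand, as if already resolved for the whole paragraph with the toy convention uppercase = RTL, lowercase = LTR; visual_order and display_with_hyphen are illustrative names.

```python
def visual_order(levels):
    """Rule L2 applied to indices: logical indices in left-to-right visual order."""
    order, lv = list(range(len(levels))), list(levels)
    if not lv:
        return order
    lowest_odd = min(lv) if min(lv) % 2 else min(lv) + 1
    for level in range(max(lv), lowest_odd - 1, -1):
        i = 0
        while i < len(lv):
            if lv[i] < level:
                i += 1
                continue
            j = i
            while j < len(lv) and lv[j] >= level:
                j += 1
            order[i:j] = order[i:j][::-1]
            lv[i:j] = lv[i:j][::-1]
            i = j
    return order


def display_with_hyphen(line, levels, hyphen="-"):
    """Reorder the line, then add the hyphen glyph next to the last logical character."""
    order = visual_order(levels)
    glyphs = [line[k] for k in order]
    last = len(line) - 1      # last logical character before the break
    pos = order.index(last)   # where reordering put it
    # LTR fragment (even level): hyphen on its right; RTL fragment (odd level): on its left
    glyphs.insert(pos + 1 if levels[last] % 2 == 0 else pos, hyphen)
    return "".join(glyphs)


# RTL paragraph, line ends inside an English word (levels as if resolved paragraph-wide).
print(display_with_hyphen("ABC hyph", [1, 1, 1, 1, 2, 2, 2, 2]))
# LTR paragraph, line ends inside a Hebrew-like (uppercase) word.
print(display_with_hyphen("abc DEF", [0, 0, 0, 0, 1, 1, 1]))
```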

I haven't had to implement that anywhere yet, so I can't suggest which strategy works best, but the bidi sample in the Unibooktool and the bidi sample code in C++ show how to retain BN codes yet still reach conformant results.

Final note: read the sections on hyphenation and SHY in UAX #14, Line Breaking, for more details, because adding a hyphen is not always enough; the general case is more open-ended.


 Post subject: Re: BiDi and word hyphenation
Posted: Mon Apr 04, 2011 1:14 am

Joined: Sun Aug 22, 2010 5:14 am
Posts: 5
1) Hyphenation should never be used to split Arabic words.

2) Hyphenation may be used to split Hebrew words.

3) The hyphen must be displayed immediately following the first part of the
split word. "Following" must be interpreted according to the direction of
the split word, i.e. on the right side for an English word, on the left side
for a Hebrew word.

4) If the split word is not at paragraph level, like a Hebrew word in a
level-0 paragraph, the hyphen will be displayed between the end of the last
Hebrew run in the line and the end of the preceding LTR run. This will look
strange, since we are used to seeing hyphens at the end of lines in
unidirectional text.
Example (paragraph level = 0; uppercase stands for Hebrew letters):
logical string: this is an english sentence WITH SOME HEBREW
Assuming that the hyphenation point is in the middle of HEBREW, the first
line must be displayed as
display string: this is an english sentence -BEH EMOS HTIW

5) If the split word is at a level higher than the paragraph level + 1, the
results may get stranger and stranger.
Example (paragraph level = 1; the quotation is an LTR embedding, so the Hebrew
inside it is at level 3):
logical string: HE SAID: "this is an english sentence WITH SOME HEBREW"
display string (first line): this is an english sentence -BEH EMOS HTIW" :DIAS EH
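Both examples can be reproduced with a small sketch, again under a toy convention (an assumption for illustration: uppercase = Hebrew, lowercase = English, U+200F/U+200E for RLM/LRM, everything else neutral) and with a simplified resolver that processes the line on its own, ignoring numbers, embeddings and rule L1. For example 5 the embedding levels are supplied directly (1 for the quote's lead-in, 2 for the English text, 3 for the Hebrew fragment and its hyphen), as if the quotation were an LTR embedding and the hyphen were kept with the Hebrew fragment.

```python
# Toy conventions (assumptions): uppercase ASCII = Hebrew letter, lowercase ASCII =
# English letter, U+200F = RLM, U+200E = LRM, anything else = neutral.
RLM, LRM = "\u200f", "\u200e"


def char_dir(ch):
    if ch == RLM or ch.isupper():
        return "R"
    if ch == LRM or ch.islower():
        return "L"
    return "N"


def resolve_levels(line, para_level):
    """Simplified N1/N2 + I1/I2 for a line processed on its own (sos/eos = paragraph)."""
    dirs = [char_dir(ch) for ch in line]
    e = "R" if para_level % 2 else "L"
    i = 0
    while i < len(dirs):
        if dirs[i] != "N":
            i += 1
            continue
        j = i
        while j < len(dirs) and dirs[j] == "N":
            j += 1
        before = dirs[i - 1] if i > 0 else e
        after = dirs[j] if j < len(dirs) else e
        dirs[i:j] = [before if before == after else e] * (j - i)
        i = j
    if para_level % 2 == 0:
        return [para_level + 1 if d == "R" else para_level for d in dirs]
    return [para_level + 1 if d == "L" else para_level for d in dirs]


def reorder_line(chars, levels):
    """Rule L2: reverse runs from the highest level down to the lowest odd level."""
    chars, levels = list(chars), list(levels)
    if not levels:
        return ""
    lowest_odd = min(levels) if min(levels) % 2 else min(levels) + 1
    for level in range(max(levels), lowest_odd - 1, -1):
        i = 0
        while i < len(levels):
            if levels[i] < level:
                i += 1
                continue
            j = i
            while j < len(levels) and levels[j] >= level:
                j += 1
            chars[i:j] = chars[i:j][::-1]
            levels[i:j] = levels[i:j][::-1]
            i = j
    return "".join(chars)


def display(line, levels):
    """Reorder with L2, then drop the invisible marks."""
    return reorder_line(line, levels).replace(RLM, "").replace(LRM, "")


# Example 4: paragraph level 0, the first line ends inside the Hebrew word.
line4 = "this is an english sentence WITH SOME HEB-"
with_rlm = display(line4 + RLM, resolve_levels(line4 + RLM, 0))
without_mark = display(line4, resolve_levels(line4, 0))

# Example 5: levels given directly (assumed LTR embedding around the quotation).
segments = [('HE SAID: "', 1), ("this is an english sentence ", 2), ("WITH SOME HEB-", 3)]
line5 = "".join(text for text, _ in segments)
levels5 = [level for text, level in segments for _ in text]

print(with_rlm)
print(without_mark)
print(display(line5, levels5))
```

Without the RLM, the line-final hyphen resolves to the paragraph direction and ends up at the right end of the line, next to the beginning of the Hebrew phrase instead of the end of the split word.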


All this is regular application of the bidi algorithm. The only caveat is to
make sure that the hyphen stays close to the end of the split word. If the application processes each line separately, this can be achieved by adding an LRM (for an English word) or an RLM (for a Hebrew word) right after the hyphen.
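A hypothetical helper for that per-line case could pick the mark from the direction of the last strong character in the word fragment, using Python's unicodedata.bidirectional. Treating AL like R, using U+2010 HYPHEN as the visible hyphen, and appending no mark when the fragment has no strong character are assumptions of this sketch.

```python
import unicodedata

LRM, RLM = "\u200e", "\u200f"
HYPHEN = "\u2010"  # U+2010 HYPHEN, Bidi_Class ON


def last_strong_direction(fragment):
    """Direction of the last strong character: "L", "R" (R or AL), or None."""
    for ch in reversed(fragment):
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "L"
        if bidi_class in ("R", "AL"):
            return "R"
    return None


def hyphenate_fragment(fragment):
    """Hypothetical helper: end a line's word fragment with a hyphen plus a matching mark."""
    direction = last_strong_direction(fragment)
    if direction is None:
        return fragment + HYPHEN  # nothing to anchor the hyphen to
    return fragment + HYPHEN + (RLM if direction == "R" else LRM)


print(ascii(hyphenate_fragment("hyph")))          # English fragment gets an LRM
print(ascii(hyphenate_fragment("\u05e9\u05dc")))  # Hebrew fragment gets an RLM
```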


 Post subject: Re: BiDi and word hyphenation
Posted: Tue Apr 05, 2011 5:48 pm
Unicode Guru

Joined: Tue Dec 01, 2009 2:49 pm
Posts: 189
matial wrote:
All this is regular application of the bidi algorithm. The only caveat is to make sure that the hyphen stays close to the end of the split word. If the application processes each line separately, this can be achieved by adding an LRM (for an English word) or an RLM (for a Hebrew word) right after the hyphen.


The one thing that doesn't work is to rely on inserted SHY codes for hyphenation locations, then apply any old implementation of the bidi algorithm and finally hope to be able to use this SHY to break the line before rule L1 is invoked. As written, the bidi algorithm instructs the implementer to remove all characters of class BN. Unfortunately, BN is the class for SHY.

See my earlier post for possible workarounds.


 Post subject: Re: BiDi and word hyphenation
Posted: Wed Apr 06, 2011 2:19 pm

Joined: Tue Oct 12, 2010 3:35 pm
Posts: 7
Thank you very much! You left me speechless for a moment! I never expected such detailed answers!


 Post subject: Re: BiDi and word hyphenation
Posted: Thu Apr 07, 2011 2:25 pm

Joined: Sat Dec 04, 2010 10:25 pm
Posts: 4
matial wrote:
1) Hyphenation should never be used to split Arabic words.

I assume you are speaking of the Arabic language only, not of all the languages (and there are many) that are written in Arabic script.

For example, quoting "Infrastructure for High-Quality Arabic Typesetting":

Quote:
It has been said over and over again that Arabic is not hyphenated. This is true when we refer to Arabic language, but false when we refer to Arabic script. Indeed, there is one language written in Arabic script, namely Uighur, which uses hyphenation just like any European language. Uighur may use the Arabic script but is not a Semitic language and hence does not use implicit short vowels: all vowels are explicitly written and one can easily identify syllables and hyphenate words between them.


Bob

