
Writing Direction and BIDI Ordering FAQ

Q: My browser doesn't do the BIDI algorithm. What's wrong?

A: Individual writing systems make different default assumptions about how characters are arranged into lines and lines are arranged on a page or screen. Such assumptions are referred to as a script's directionality. For example, in the Latin script, characters run horizontally from left to right to form lines, and lines run from top to bottom.

Semitic scripts arrange characters right-to-left into lines, although digits run the other way, making the scripts inherently bidirectional. Ordering characters into lines can be even more complex when left-to-right and right-to-left scripts are used together. Because bidirectional scripts can have opposite directions on the same line, and because the direction of punctuation characters is determined by their surroundings, resolving the actual direction of a specific part of the line depends on context analysis. The Unicode Standard defines an implicit algorithm to determine the layout of a line, and also provides overrides to handle situations that are ambiguous; see UAX #9 for more information. [JJ]
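As a rough illustration of the character classes this context analysis starts from, the sketch below looks up the Bidi_Class of a few characters with Python's standard `unicodedata` module. The helper name `bidi_classes` and the sample string are assumptions made for this example; the values returned depend on the Unicode data bundled with your Python version, and this is only a property lookup, not an implementation of the algorithm in UAX #9.

```python
import unicodedata

def bidi_classes(text):
    """Return (character, Bidi_Class) pairs for each character in text."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]

# Hypothetical sample: a Latin letter, a Hebrew letter, an Arabic letter,
# a European digit, a space, and a punctuation mark.
sample = "a\u05d0\u06279 !"

for ch, cls in bidi_classes(sample):
    # Letters carry a strong direction (L, R, AL); digits are "weak" (EN);
    # spaces and most punctuation are neutral (WS, ON) and take their
    # direction from the surrounding text.
    print(f"U+{ord(ch):04X} {cls}")
```

Characters with a neutral class are the ones whose direction "is determined by their surroundings" in the explanation above.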

Q: Which scripts are written vertically?

A: East Asian scripts are frequently written in vertical lines that run from top to bottom, with columns ordered either from left to right (Mongolian) or from right to left (other scripts). Most characters use the same shape and orientation when displayed horizontally or vertically, but many punctuation characters change their shape when displayed vertically.

Letters and words from other scripts are generally rotated by ninety degrees so that they, too, read from top to bottom: letters from left-to-right scripts are rotated clockwise, and letters from right-to-left scripts counterclockwise.

Unlike the bidirectional case, the choice of vertical layout is usually treated as a formatting style; therefore, the Unicode Standard does not define default rendering behavior for vertical text nor provide directionality controls designed to override such behavior. [JJ]

Q: Are there any other script directions?

A: Other script directionalities are possible and are found in actual writing systems, mainly in historical ones. For example, some ancient Numidian texts are written bottom-to-top, and Egyptian hieroglyphics can be written with arbitrary directions for individual lines.

One prominent example is boustrophedon (literally, "ox-turning"), which is often found in ancient European writing systems such as early Greek. In boustrophedon writing, characters are arranged into horizontal lines, but the individual lines alternate between running right to left and running left to right, the way an ox goes back and forth when plowing a field. The letters themselves use mirrored images in accordance with each individual line's direction. [JJ]

Q: So do developers need to worry about these historical directions?

A: Not really. Boustrophedon writing is of interest almost exclusively to scholars intent on reproducing the exact visual content of ancient texts. The Unicode Standard does not provide formatting codes to signal boustrophedon text. Specialized word processors for ancient scripts might offer support for this. In the absence of that, fixed texts can be written in boustrophedon by using hard line breaks and directionality overrides. [JJ]
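A minimal sketch of that last approach, assuming the text has already been split into fixed lines: each line is wrapped in a directional override (U+202D LEFT-TO-RIGHT OVERRIDE or U+202E RIGHT-TO-LEFT OVERRIDE) and closed with U+202C POP DIRECTIONAL FORMATTING, and the lines are joined with hard line breaks. The function name `boustrophedon` and its `first_rtl` parameter are hypothetical. Note that overrides only change the order in which characters are laid out; they do not mirror the letter shapes themselves, so mirrored glyphs would still need font or application support.

```python
LRO = "\u202d"  # LEFT-TO-RIGHT OVERRIDE
RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING

def boustrophedon(lines, first_rtl=False):
    """Wrap alternating lines in opposite directional overrides.

    lines     -- list of strings, one per fixed line, in logical order
    first_rtl -- if True, the first line runs right to left
    """
    out = []
    for index, line in enumerate(lines):
        # Even-numbered lines use the starting direction, odd ones flip it.
        rtl = first_rtl if index % 2 == 0 else not first_rtl
        override = RLO if rtl else LRO
        out.append(override + line + PDF)
    # Hard line breaks keep each line's direction fixed.
    return "\n".join(out)

text = boustrophedon(["FIRST LINE", "SECOND LINE", "THIRD LINE"])
print(text)
```

How the result looks depends on whether the displaying application honors the override characters.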

Q: On What Is Unicode? on the upper left, there is some Arabic text. It seems to display incorrectly on my browser!

A: The example on the upper left of the What Is Unicode? page is some Arabic text followed by some English text. In logical order it looks like example #1 below, with uppercase standing for Arabic and lowercase for English. Based on your understanding of the bidi algorithm, you might expect it to be rendered as #2; that may be what your application does, and an Arabic speaker may have confirmed to you that it is correct. Your browser, however, displays it as #3. Many people's understanding of both the bidi algorithm and of how to read bidi text says that the example is a predominantly RTL paragraph (as perceived by readers of bidirectional scripts), and thus should be read Arabic first, then English.

  1. WHAT IS "UNICODE"? in arabic
  2. in arabic ?"EDOCINU" SI TAHW
  3. ?"EDOCINU" SI TAHW in arabic

The rendering you get depends on the paragraph direction. In the absence of a more explicit higher-level protocol, such as a UI that lets you set the direction explicitly, the paragraph direction is determined by the first strong character. See "Higher-Level Protocols" in UAX #9. Depending on the paragraph direction, you get either #2 (RTL) or #3 (LTR). So your browser is not displaying the text incorrectly. Note that HTML has explicit protocols for setting the paragraph direction.
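The first-strong heuristic can be sketched in a few lines of Python using the standard `unicodedata` module. This is a simplified illustration under stated assumptions, not the full rule from UAX #9: it ignores the special treatment of characters inside directional isolates, and the function name `first_strong_direction` and its `None` return value for text without strong characters are choices made for this example.

```python
import unicodedata

def first_strong_direction(text):
    """Guess the paragraph direction from the first strong character.

    Returns "ltr", "rtl", or None if the text has no strong character
    (the caller then has to pick a default).
    """
    for ch in text:
        cls = unicodedata.bidirectional(ch)
        if cls == "L":            # strong left-to-right
            return "ltr"
        if cls in ("R", "AL"):    # strong right-to-left (e.g. Hebrew, Arabic)
            return "rtl"
    return None

# Arabic letters first, then English: first-strong detection yields RTL.
print(first_strong_direction("\u0645\u0627 \u0647\u064a in arabic"))
# English first, then Arabic letters: first-strong detection yields LTR.
print(first_strong_direction("in arabic \u0645\u0627 \u0647\u064a"))
```

A higher-level protocol, when present, takes precedence over this guess, which is why the same logical text can legitimately appear as either #2 or #3.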

Q: Does the Bidi Algorithm depend on giving default values for the Bidi_Class property to unassigned code points?

A: Yes. See the FAQ on Character Properties, Case Mappings & Names for an explanation.

Q: Are there any issues with normalizing Arabic and/or Hebrew?

A: Yes, see the question "Isn't the canonical order for Arabic characters wrong?" and following.

Q: The characters U+0CBF KANNADA VOWEL SIGN I and U+0CC6 KANNADA VOWEL SIGN E seem to have inconsistent character properties. They have General Category Mn and Bidi Class L. However, UAX #9 says that all Me and Mn category characters are Bidi Class NSM. Is this right?

A: This was an explicit decision by UTC for these characters, to preserve canonical equivalence under Bidi for two vowels involving these as parts of decompositions.
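If you want to check these property values yourself, the following sketch queries Python's standard `unicodedata` module. The helper name `props` is hypothetical, and the results reflect whatever version of the Unicode Character Database ships with your Python installation. The second loop prints the canonical decompositions of U+0CC0 and U+0CC7, chosen here as examples of vowel signs whose decompositions involve the two characters in question.

```python
import unicodedata

def props(code_point):
    """Return (General_Category, Bidi_Class) for a code point."""
    ch = chr(code_point)
    return unicodedata.category(ch), unicodedata.bidirectional(ch)

# The two characters discussed in the question.
for cp in (0x0CBF, 0x0CC6):
    category, bidi_class = props(cp)
    print(f"U+{cp:04X} {unicodedata.name(chr(cp))}: gc={category} bc={bidi_class}")

# Composite vowel signs whose canonical decompositions can be inspected
# to see where the characters above occur.
for cp in (0x0CC0, 0x0CC7):
    print(f"U+{cp:04X} decomposes to: {unicodedata.decomposition(chr(cp))}")
```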