Writing Direction and BIDI Ordering FAQ
Q: My browser doesn't do the BIDI
algorithm. What's wrong?
A: Individual writing systems make different default
assumptions about how characters are arranged into lines and lines are
arranged on a page or screen. Such assumptions are referred to as a
script's directionality. For example, in the Latin script, characters
run horizontally from left to right to form lines, and lines run from
top to bottom.
Semitic scripts arrange characters right-to-left into
lines, although digits run the other way, making the scripts inherently
bidirectional. Ordering characters into lines can be even more complex
when left-to-right and right-to-left scripts are used together. Because
bidirectional scripts can have opposite directions on the same line, and
because the direction of punctuation characters is determined by their
surroundings, resolving the actual direction of a specific part of the
line depends on context analysis. The Unicode Standard defines an
implicit algorithm to determine the layout of a line, and also provides
overrides to handle situations that are ambiguous; see
UAX #9 for more information.
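The implicit algorithm works from a directional property (Bidi_Class) assigned to every character. As a rough illustration only, the sketch below uses Python's standard unicodedata module to list that property for a short mixed string; the helper name bidi_classes and the sample string are assumptions made for this example and do not come from UAX #9.

```python
import unicodedata

def bidi_classes(text):
    """Return (character name, Bidi_Class) pairs for each character in text."""
    return [
        (unicodedata.name(ch, "<unnamed>"), unicodedata.bidirectional(ch))
        for ch in text
    ]

# Latin letter, Hebrew letter, space, European digit, punctuation mark.
sample = "a\u05D0 1!"

for name, bidi_class in bidi_classes(sample):
    print(f"{bidi_class:>3}  {name}")
```

Strong classes such as L and R carry a direction of their own, while classes such as WS and ON take their direction from the surrounding text, which is why resolving a line requires context analysis.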
[JJ]
Q: Which scripts are written vertically?
A: East Asian scripts are frequently written in vertical
lines that run from top to bottom; the lines form columns that are
arranged either from left to right (Mongolian) or from right to left (other scripts). Most
characters use the same shape and orientation when displayed
horizontally or vertically, but many punctuation characters will change
their shape when displayed vertically.
Letters and words from other scripts are generally rotated
by ninety degrees so that they, too, read from top to
bottom. That is, letters from left-to-right scripts are rotated
clockwise and letters from right-to-left scripts counterclockwise.
Unlike the bidirectional case, the choice of vertical
layout is usually treated as a formatting style; therefore, the Unicode
Standard does not define default rendering behavior for vertical text
nor provide directionality controls designed to override such behavior.
[JJ]
Q: Are there any other script directions?
A: Other script directionalities are possible and are found
in actual writing systems, mainly in historical ones. For example, some
ancient Numidian texts are written bottom-to-top, and Egyptian
hieroglyphics can be written with arbitrary directions for individual
lines.
One prominent example is boustrophedon (literally,
"ox-turning"), which is often found in ancient European writing systems
such as early Greek. In boustrophedon writing, characters are arranged
into horizontal lines, but the individual lines alternate between
running right to left and running left to right, the way an ox goes back
and forth when plowing a field. The letters themselves use mirrored
images in accordance with each individual line's direction.
[JJ]
Q: So do developers need to worry about
these historical directions?
A: Not really. Boustrophedon writing is of interest almost
exclusively to scholars intent on reproducing the exact visual content
of ancient texts. The Unicode Standard does not provide formatting codes
to signal boustrophedon text. Specialized word processors for ancient
scripts might offer support for this. In the absence of that, fixed
texts can be written in boustrophedon by using hard line breaks and
directionality overrides. [JJ]
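As a minimal sketch of that last approach, the following wraps each line of a fixed text in a directional override, alternating direction from line to line. The function name boustrophedon and its first_direction parameter are hypothetical; only the three formatting characters are defined by the Unicode Standard. Note that overrides reorder characters but do not by themselves supply the mirrored letterforms described above for non-mirrored characters.

```python
LRO = "\u202D"  # LEFT-TO-RIGHT OVERRIDE
RLO = "\u202E"  # RIGHT-TO-LEFT OVERRIDE
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING

def boustrophedon(lines, first_direction="ltr"):
    """Join lines with hard breaks, alternating the override on each line."""
    overrides = [LRO, RLO] if first_direction == "ltr" else [RLO, LRO]
    wrapped = [
        overrides[index % 2] + line + PDF  # close each override explicitly
        for index, line in enumerate(lines)
    ]
    return "\n".join(wrapped)

text = boustrophedon(["FIRST LINE", "SECOND LINE", "THIRD LINE"])
print(repr(text))
```

How the result looks depends entirely on whether the displaying application implements the overrides.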
Q: On What Is Unicode? on the upper left,
there is some Arabic text. It seems to display incorrectly in my
browser!
A: The example on the upper left of the
What Is Unicode? page is some Arabic
text followed by some English text. In logical order, the whole thing
looks like example #1 below, with uppercase standing for Arabic and
lowercase for English. Based on your understanding of the bidi
algorithm, you might think this should be rendered as #2. For example,
that might be what your application does, and an Arabic speaker may
have confirmed to you that it is correct. Your browser, however,
displays it as #3. Many people's understanding of both the bidi
algorithm and of how to read bidi text says that the example is a
predominantly RTL paragraph (as perceived by readers of bidirectional
scripts), and thus should be read Arabic first, then English.
1. WHAT IS "UNICODE"? in arabic
2. in arabic ?"EDOCINU" SI TAHW
3. ?"EDOCINU" SI TAHW in arabic
The rendering you get depends on the paragraph
direction. In the absence of a more explicit higher-level protocol,
such as a UI that permits you to set that direction explicitly, the
paragraph direction is determined by the first strong character.
See "Higher-Level Protocols" in
UAX #9. Depending on that setting, you
will get either #2 (RTL) or #3 (LTR). So your browser is not displaying
the text incorrectly. Note that HTML has explicit protocols that permit
setting the paragraph direction.
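The first-strong-character heuristic can be sketched in a few lines. This is a simplified illustration, not a conformant implementation of the paragraph-level rules in UAX #9: it assumes the input is a single paragraph and it does not skip characters inside directional isolates. The function name first_strong_direction and its default parameter are hypothetical.

```python
import unicodedata

def first_strong_direction(text, default="ltr"):
    """Guess the paragraph direction from the first strong character."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    # No strong character found; fall back to the caller's default.
    return default

# Two Arabic letters (MEEM, ALEF) followed by English text.
arabic_first = "\u0645\u0627 in arabic"
# The same content with the English text first.
english_first = "in arabic \u0645\u0627"

print(first_strong_direction(arabic_first))
print(first_strong_direction(english_first))
```

A higher-level protocol takes precedence over this guess, so an explicitly set direction can produce the other rendering for the same text.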
Q: Does the Bidi Algorithm depend on giving
default values for the Bidi_Class property to unassigned code points?
A: Yes. See the FAQ on
Character Properties, Case Mappings & Names for an explanation.
Q: Are there any issues with normalizing
Arabic and/or Hebrew?
A: Yes, see the question "Isn't the
canonical order for Arabic characters wrong?" and following.
Q: The characters U+0CBF KANNADA VOWEL SIGN I and
U+0CC6 KANNADA VOWEL SIGN E
seem to have inconsistent character properties. They have General Category
Mn and Bidi Class L. However, UAX #9 says that all Me and Mn category
characters are Bidi Class NSM. Is this right?
A: Yes. This was an explicit decision by the UTC for these characters, made to
preserve canonical equivalence under the Bidi Algorithm for two vowels whose
decompositions include these characters.
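The property values in question can be inspected with Python's standard unicodedata module. The sketch below is only a quick check against whatever version of the Unicode Character Database ships with the interpreter; the helper name properties is hypothetical, and U+0CC0 KANNADA VOWEL SIGN II is used here as one example of a vowel sign whose canonical decomposition begins with U+0CBF.

```python
import unicodedata

def properties(code_point):
    """Return (name, General_Category, Bidi_Class) for a code point."""
    ch = chr(code_point)
    return (
        unicodedata.name(ch),
        unicodedata.category(ch),
        unicodedata.bidirectional(ch),
    )

for code_point in (0x0CBF, 0x0CC6):
    name, general_category, bidi_class = properties(code_point)
    print(f"U+{code_point:04X} {name}: gc={general_category} bc={bidi_class}")

# Canonical decomposition of a vowel sign that contains U+0CBF.
print(unicodedata.decomposition("\u0CC0"))
```

If the two vowel signs were NSM while the precomposed forms were not, the composed and decomposed spellings of the same text could be treated differently by the Bidi Algorithm.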