Re: bidi support for xterm

From: Jonathan Rosenne (rosenne@qsm.co.il)
Date: Sun Aug 15 1999 - 09:57:43 EDT


ISO 6429:1992 "Information Technology - Control Functions for Coded Character Sets", also available as ECMA Standard ECMA-48 (1991), describes the meaning of the control functions in a bidi context. It is "pre-Unicode", but the ideas are still valid.

The VT100 did not have "a" definition; there were quite a number of them, and for the 8-bit VTs the Hebrew and Arabic implementations were very different. ECMA-48 / ISO 6429 is a compromise that was acceptable to experts in both languages.
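To make the ECMA-48 approach a little more concrete, here is a minimal Python sketch of one of its bidi-related control functions, SDS (START DIRECTED STRING), which brackets a run of text whose direction differs from the surrounding text. It assumes the 7-bit encoding CSI Ps ] with Ps = 1 (start, left-to-right), 2 (start, right-to-left) and 0 (end of directed string); check these values against the standard before relying on them. The helper names are illustrative only, and whether a given emulator honors SDS at all is a separate question.

```python
# Sketch only: assumes the ECMA-48 7-bit encoding of SDS is CSI Ps ]
# with Ps = 0 (end), 1 (start, left-to-right), 2 (start, right-to-left).
CSI = "\x1b["


def sds(ps: int) -> str:
    """Return the SDS control sequence for parameter ps (hypothetical helper)."""
    if ps not in (0, 1, 2):
        raise ValueError("SDS parameter must be 0, 1 or 2")
    return f"{CSI}{ps}]"


def mark_rtl(text: str) -> str:
    """Wrap a right-to-left run in SDS 2 ... SDS 0 (hypothetical helper)."""
    return sds(2) + text + sds(0)


# A Latin line containing one Hebrew word (shin, lamed, vav, final mem).
line = "file " + mark_rtl("\u05e9\u05dc\u05d5\u05dd") + " saved"
print(repr(line))
```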

I hope this helps.

Jony

At 00:28 15/08/1999 -0700, Edward Cherlin wrote:
>At 03:48 -0700 8/14/1999, Markus Kuhn wrote:
>[snip]
>
>>However, mere implementations of the Unicode bidi algorithm are far from
>>what we need to really understand how to handle bidi text in xterm or
>>other VT100/ISO 6429 emulators.
>
>The basic Bidi algorithm in the monospaced context takes a sequence of
>Unicode characters as input and gives back something like a rectangular
>array (matching the size of the display) of glyphs, or an equivalent list
>of lines in visual order. This is easy to render.
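The final step described here, turning logical text into lines in visual order, corresponds to rule L2 of the Unicode bidi algorithm (UAX #9). The sketch below shows only that step and assumes the embedding levels have already been resolved by the earlier phases; it ignores rule L1 (trailing whitespace), glyph mirroring and word wrap, and breaks lines at a fixed width purely for illustration. The function names are hypothetical.

```python
def reorder_line(chars, levels):
    """Return the characters of one line in visual order (UAX #9, rule L2).

    Assumes `levels` holds the already-resolved embedding level of each
    character, in logical order.
    """
    if not levels:
        return []
    order = list(range(len(chars)))
    highest = max(levels)
    odd_levels = [lv for lv in levels if lv % 2 == 1]
    lowest_odd = min(odd_levels) if odd_levels else highest + 1
    # From the highest level down to the lowest odd level, reverse every
    # maximal contiguous run of characters at that level or higher.
    for level in range(highest, lowest_odd - 1, -1):
        i = 0
        while i < len(order):
            if levels[order[i]] < level:
                i += 1
                continue
            j = i
            while j < len(order) and levels[order[j]] >= level:
                j += 1
            order[i:j] = order[i:j][::-1]
            i = j
    return [chars[k] for k in order]


def visual_lines(chars, levels, width):
    """Break logical text into fixed-width lines, then reorder each line.

    Line breaking happens in logical order, before reordering.
    """
    return ["".join(reorder_line(chars[i:i + width], levels[i:i + width]))
            for i in range(0, len(chars), width)]


# Uppercase letters stand in for right-to-left characters (level 1).
print(visual_lines("abcDEFghi", [0, 0, 0, 1, 1, 1, 0, 0, 0], 6))
# ['abcFED', 'ghi']
```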
>
>The Bidi functions we look for in an editor should do differential display
>update, so that only the changed character positions are affected on the
>screen. Insert, delete, and replace are fairly easy within a directional
>run, as long as they don't spill over to the next line. It takes a bit more
>work when the cursor crosses a direction boundary, or when a word is pushed
>to the next line or pulled back up as text lengthens and shortens. In the
>general case, where multiple boundaries are crossed in a single command, it
>may be easier to render the paragraph again from scratch.
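A differential update of this kind can start from a cell-by-cell comparison of the old and new visual rows. The sketch assumes one glyph per cell and the same number of rows before and after; `redraw_threshold` is a hypothetical tuning parameter that switches to a full repaint when too much of the screen changed, in the spirit of re-rendering from scratch in the general case.

```python
def changed_cells(old_row: str, new_row: str):
    """Return the column indices whose glyph differs between two visual rows."""
    width = max(len(old_row), len(new_row))
    old_row, new_row = old_row.ljust(width), new_row.ljust(width)
    return [col for col in range(width) if old_row[col] != new_row[col]]


def plan_update(old_rows, new_rows, redraw_threshold=0.5):
    """Choose between patching changed cells and repainting everything.

    Returns ("patch", {row: [columns]}) or ("redraw", {}).
    Assumes old_rows and new_rows have the same number of rows.
    """
    patches = {}
    total = changed = 0
    for r, (old, new) in enumerate(zip(old_rows, new_rows)):
        cols = changed_cells(old, new)
        if cols:
            patches[r] = cols
        total += max(len(old), len(new))
        changed += len(cols)
    # Too many changed cells: a full repaint is simpler than patching.
    if total and changed / total > redraw_threshold:
        return "redraw", {}
    return "patch", patches
```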
>
>>Xterm, like any VT100 emulator, is NOT
>>just a receiver of a stream of Unicode plaintext. It is a rendering
>>engine that places glyphs onto a character cell matrix, and the received
>>stream of Unicode characters is mixed with a huge number of different
>>control sequences for positioning the cursor, scrolling parts of the
>>screen, deleting parts of the screen, etc., whose semantics in the
>>context of the Unicode bidi algorithm are extremely unclear (at least to
>>me!).
>
>Me, too, since they have no definitions other than their behavior on
>screen. Are editor functions that closely identified with terminal
>controls? Then we must choose whether to keep that mapping, or whether to
>implement editor functions that deal with Bidi as users of Bidi expect. Or
>implement a set of each, and let users choose.
>
>>We have to worry about full-screen editors such as vi or mined,
>>which interact with xterm very intimately in order to provide
>>the user with intuitive editing functionality. If I tell xterm to
>>position the cursor in some Hebrew text and then send the
>>delete-to-end-of-line ESC sequence, is xterm supposed to delete to the left
>>or to the right?
>
>Sorry, this question turns out not to fit the context.
>
>Using the obvious codes for LTR, RTL, and Insertion point,
>delete-end-of-line removes the characters in the marked positions from the
>following line, shown in visual order.
>
>LLLLLLLLLLLLRRRRIRRLLLLL
>            4321   56789
>
>This is simply the forward direction in the text--leftward to the end of
>the leftward run, then rightward in the enclosing rightward run. You must
>learn to think of forward and backward, not leftward and rightward. Then
>convincing the software to think that way shouldn't be too hard. :-)
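The diagram can be checked mechanically. This sketch assumes a left-to-right line whose characters are either LTR (level 0) or RTL (level 1), so that reordering reduces to reversing each RTL run in place, and it models the insertion point as a logical index between characters rather than as a screen cell. The function names are hypothetical.

```python
def visual_columns(is_rtl):
    """Map logical index -> visual column for a left-to-right line in which
    each character is either LTR (level 0) or RTL (level 1).

    With only these two levels, reordering reduces to reversing every
    maximal run of RTL characters in place.
    """
    column_of = list(range(len(is_rtl)))
    i = 0
    while i < len(is_rtl):
        if not is_rtl[i]:
            i += 1
            continue
        j = i
        while j < len(is_rtl) and is_rtl[j]:
            j += 1
        # Logical i..j-1 occupy columns i..j-1, but right to left.
        for k in range(i, j):
            column_of[k] = i + j - 1 - k
        i = j
    return column_of


def delete_to_eol_columns(is_rtl, cursor):
    """Visual columns cleared, in order, when deleting logically forward
    from the insertion point `cursor` to the end of the line."""
    column_of = visual_columns(is_rtl)
    return [column_of[k] for k in range(cursor, len(is_rtl))]


# 12 LTR cells, an RTL run of 6 with the insertion point after its first
# two logical characters (the "RR" right of I), then 5 LTR cells.
line = [False] * 12 + [True] * 6 + [False] * 5
print(delete_to_eol_columns(line, cursor=14))
# [15, 14, 13, 12, 18, 19, 20, 21, 22]
```

Because the insertion point takes no cell here, the trailing LTR run starts at column 18 instead of one column further right as in the diagram, but the order is the same: the four RTL cells right to left, then the LTR cells left to right. Backspace is the mirror image: it removes logical index cursor - 1, which `visual_columns` maps to column 16, the RTL cell immediately to the right of the insertion point.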
>
>>What should the backspace control code do on the screen
>>when it passes through mixed Hebrew/Latin text?
>
>Well, this is where the convincing comes in. Some part of the software has
>to interpret the Unicode semantics, and translate the results of editor
>commands to sequences of terminal commands that create the right display.
>Unless you want editor commands that act like terminal commands on the
>display, and make the software figure out what sequence of Unicode
>characters would produce that arrangement of glyphs.
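One reading of this division of labor: the editor (or a layer beside it) computes the visual rows itself and sends the terminal only cursor positioning and glyph output, so the terminal needs no bidi logic of its own. The sketch below assumes the terminal places each printed glyph in the next cell to the right, and uses the ECMA-48 CUP (CURSOR POSITION) sequence, CSI row ; column H, with 1-based coordinates. The function names are hypothetical.

```python
CSI = "\x1b["


def cup(row: int, col: int) -> str:
    """CUP (CURSOR POSITION), 1-based: CSI row ; col H."""
    return f"{CSI}{row};{col}H"


def update_sequence(row: int, old: str, new: str) -> str:
    """Terminal output that turns visual row `old` into `new` (row is 0-based).

    Each maximal span of changed cells is rewritten by moving the cursor to
    its first cell with CUP and printing the new glyphs left to right.
    """
    width = max(len(old), len(new))
    old, new = old.ljust(width), new.ljust(width)
    out = []
    col = 0
    while col < width:
        if old[col] == new[col]:
            col += 1
            continue
        start = col
        while col < width and old[col] != new[col]:
            col += 1
        out.append(cup(row + 1, start + 1) + new[start:col])
    return "".join(out)


print(repr(update_sequence(0, "abcFEDghi", "abcFXDghi")))
# '\x1b[1;5HX'
```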
>
>>I think for xterm the higher-priority projects should be biwidth fonts
>>(for CJK) and combining characters (for Thai, the phonetic alphabet, etc.),
>>which seem to be of manageable complexity. I have no idea what a
>>practical convention for the interaction of full-screen editors with
>>xterm would look like if xterm tried somehow to implement the Unicode
>>bidi algorithm, and I challenge anyone who urgently wants to have the
>>bidi algorithm in xterm to write up a detailed proposal that explains
>>precisely how this should work.
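For the biwidth and combining-character work, the core question is how many cells each code point occupies. A rough sketch using Python's standard `unicodedata` module: nonspacing and enclosing marks (categories Mn and Me, which include Thai vowel signs such as U+0E31) take zero cells, East Asian Wide and Fullwidth characters take two, and everything else, including East Asian Ambiguous characters, takes one. This is an approximation in the spirit of `wcwidth()`, not a complete implementation; control characters and special cases such as Hangul conjoining jamo are ignored.

```python
import unicodedata


def cell_width(ch: str) -> int:
    """Approximate number of terminal cells one printable code point occupies."""
    if unicodedata.category(ch) in ("Mn", "Me"):
        return 0  # combining marks stack on the preceding cell
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2  # wide / fullwidth characters such as CJK ideographs
    return 1


def string_width(s: str) -> int:
    """Total cell width of a string of printable characters."""
    return sum(cell_width(ch) for ch in s)


# Two CJK ideographs, a space, and "e" followed by COMBINING ACUTE ACCENT.
print(string_width("\u6f22\u5b57 e\u0301"))
# 6
```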
>
>It isn't hard to state the principles. I can't give you a detailed proposal
>since I don't know what editor and terminal command sets you want to
>harmonize, nor whether you want to keep terminal function semantics or to
>follow the logic of Unicode in extending editor functions.  I would be
>happy to see a statement of principles and the corresponding set of
>functions that need to be defined, and I would assist any effort to design
>such software.
>
>[snip]
>
>>Markus
>>
>>--
>>Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
>>Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>
>
>
>--
>Edward Cherlin   edward.cherlin.sy.67@aya.yale.edu
>"It isn't what you don't know that hurts you, it's
>what you know that ain't so."--Mark Twain, or else
>some other prominent 19th century humorist and wit
>
