Re: Need for Level Direction Mark

From: Philippe Verdy <>
Date: Wed, 14 Sep 2011 03:31:14 +0200

2011/9/13 Kent Karlsson <>:
> I'm not at all sure the suggested workaround works in general, and not just
> in a few examples.
> Another possibility, as long as we are just "brain-storming" a bit here, is
> to use the bidi category S (Segment Separator) for the LEVEL DIRECTION MARK
> (which would be a normally invisible (bidi) format control character). I.e.
> it would work just like TAB (as specified in the UBA), except that it
> wouldn't do tabbing. But then it would work only for the paragraph bidi
> direction. However, the idea that TAB (and the other bidi S characters)
> magically cuts through *all* nested bidi levels seems a bit strange to me...
> Going just to the closest explicit embedding/(override) level seems less
> drastic. Without formally subdividing "S", one could treat different "bidi
> S" (old and new) to reset to different levels (to the embedding bidi level
> for the new one, and to the paragraph bidi level for the three old ones). (I
> know, this would be a form of "option 1" in the PRI.)

However you turn it, changing the behavior of class S like this is still
a splitting of the bidi class. Once again, if you want to encode new
characters, why would you restrict yourself to reusing an existing bidi
class just to break it?
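A minimal sketch may make the point concrete. Rule L1 of the UBA resets
each segment separator (and any whitespace right before it) to the
paragraph embedding level. The Python below is a simplified, hypothetical
model of only that part of L1. It ignores paragraph separators, trailing
whitespace at end of line and the rest of the algorithm. The names
`HYPOTHETICAL_LDM` and `BIDI`, the choice of U+2072 and the tiny class table
are assumptions for illustration only. If the new mark keeps class S but
resets only to its explicit embedding level, the code must test for that
specific character inside class S. In other words, S is in effect split in two.

```python
from typing import Dict, List

# Made-up code point for the proposed mark; this is NOT a real assignment.
HYPOTHETICAL_LDM = "\u2072"

# Tiny illustrative Bidi_Class table (real values except for the mark).
BIDI: Dict[str, str] = {
    "\t": "S", " ": "WS", "a": "L", "\u05d0": "R",
    HYPOTHETICAL_LDM: "S",  # Kent's idea: reuse class S for the new mark
}


def reset_levels(text: str, levels: List[int],
                 embedding_levels: List[int], paragraph_level: int) -> List[int]:
    """Simplified subset of UBA rule L1: reset each class-S character, and the
    whitespace run immediately before it, to a target level."""
    out = list(levels)
    for i, ch in enumerate(text):
        if BIDI.get(ch) != "S":
            continue
        # The old S characters go back to the paragraph level; the new mark
        # stops at its own explicit embedding level. Same class, two behaviors.
        target = embedding_levels[i] if ch == HYPOTHETICAL_LDM else paragraph_level
        out[i] = target
        j = i - 1
        while j >= 0 and BIDI.get(text[j]) == "WS":
            out[j] = target
            j -= 1
    return out


# ALEF, space, separator, ALEF inside an embedding at level 2 (resolved to 3).
print(reset_levels("\u05d0 \t\u05d0", [3, 3, 3, 3], [2, 2, 2, 2], 0))
# [3, 0, 0, 3]
print(reset_levels("\u05d0 " + HYPOTHETICAL_LDM + "\u05d0", [3, 3, 3, 3], [2, 2, 2, 2], 0))
# [3, 2, 2, 3]
```

Giving the mark a brand-new bidi class instead would remove the
character-specific test inside S and leave every existing S character
untouched.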

Think of it simply: the stability policy is only meant NOT to break the
bidi rendering of existing fonts that use assigned characters. For
currently unassigned code points, no stability has ever been guaranteed
for any property, so their properties can be assigned much more freely.

I am convinced that if you need new characters, the only real question
is: which ones?

  – (1) Either you duplicate the encoding of existing whitespace,
punctuation and symbol characters to give them a different bidi class
(then you can reuse one of the existing classes). But many characters
would have to be duplicated if you start down this path (and WG2 will
most probably strongly oppose such a UTC proposal).

  – (2) Or you encode new bidi controls, to which you assign new bidi
classes. This does not break ANY existing text rendered with any
existing renderer. Of course you'll need an updated renderer (but not
new fonts); otherwise existing implementations will display a .notdef
glyph, and the user will visibly know that there's something in the
encoded text which may be important to render the text correctly.

The second option is certainly the least disruptive (and the most
economical in terms of encoding, and the most likely to be accepted
without much trouble by the voting NBs in WG2).

It does not break the policy for ANY existing encoded text. It gives
NO surprise to users, or at least they know that something is missing,
and they will decide what to do exactly as when they are presented
with newly encoded text containing newly assigned characters for which
they still have no supporting font, or no support in their existing
renderer for the complex shaping/layout features required by a newly
encoded script.

In other words, the UTC policy about the stability of Bidi classes
should be minimally relaxed, by rewording it into something like:

    « The bidi class property value of any assigned code point is
IMMUTABLE (and will never change for the same assigned code point in
any subsequent versions of the UCS). »

instead of speaking about the poorly defined concept of « splitting
the bidi classes ». In fact, if you add a new bidi class for new
characters, you never split any existing bidi class, and you don't
break the IMMUTABILITY rule I just gave (which is similar to the
immutability rule for other normative properties of assigned code
points, such as the code point value, the character name, the
decomposition mapping and the combining class for the 4 standard
normalisations, and even the age version).
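The proposed rule is mechanical enough to check between two versions of
the property data. The sketch below is a hypothetical illustration, not a
UTC tool. It assumes each version is already available as a plain dict
mapping assigned code points to Bidi_Class values (parsing UnicodeData.txt
or DerivedBidiClass.txt is left out). The class names "LDM" and "S2" and
the use of U+2072 are made-up stand-ins for a new bidi class and a newly
assigned code point.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: assigned code point -> Bidi_Class value.
BidiTable = Dict[int, str]


def immutability_violations(
    old: BidiTable, new: BidiTable
) -> List[Tuple[int, str, Optional[str]]]:
    """List every code point assigned in `old` whose Bidi_Class is different
    (or missing) in `new`. Code points only present in `new` are ignored:
    under the proposed rule they may receive any class, even a new one."""
    violations = []
    for cp, old_class in sorted(old.items()):
        new_class = new.get(cp)
        if new_class != old_class:
            violations.append((cp, old_class, new_class))
    return violations


# Small excerpt of real values: TAB and U+001F are S, 'A' is L, ALEF is R.
old_version: BidiTable = {0x0009: "S", 0x001F: "S", 0x0041: "L", 0x05D0: "R"}

# Option (2): a new control on a previously unassigned code point gets a
# new class ("LDM" and U+2072 are purely hypothetical here).
with_new_class = dict(old_version)
with_new_class[0x2072] = "LDM"

# Splitting S: an already assigned S character is moved to a new class.
with_split = dict(old_version)
with_split[0x001F] = "S2"

print(immutability_violations(old_version, with_new_class))  # []
print(immutability_violations(old_version, with_split))      # [(31, 'S', 'S2')]
```

Adding a class for new characters passes the check, while reassigning any
existing character fails it. No separate notion of "splitting" is needed.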

I can accept that the full set of possible values for the general
category is restricted and inextensible, because implementations of
many other algorithms and derived properties assume that the GC values
form a complete, fixed partition (a closed enumeration). But the Bidi
class of characters is only meant for rendering: it has no use other
than implementing the UBA itself, and it should never be used for any
exclusive yes/no decision.

-- Philippe.
Received on Tue Sep 13 2011 - 20:35:46 CDT
