From: Peter Constable
Date: Thu Dec 06 2007 - 10:51:09 CST

    > From: [] On
    > Behalf Of Karl Pentzlin

    Reply in opposite order:

    > b.) Why U+FD3E and U+FD3F have the Bidi_mirroring property not set?

    IIRC, this is by design for back-compat reasons. I believe it has been discussed on this list before.
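    For anyone who wants to check what their own copy of the character data says, here is a minimal Python sketch using the standard unicodedata module. The helper name describe is just an illustration. The values reported depend on the Unicode version bundled with the interpreter, so the general category may not match the values quoted in this thread.

```python
import unicodedata

def describe(cp: int) -> dict:
    """Return a few UCD properties for one code point (hypothetical helper)."""
    ch = chr(cp)
    return {
        "code": f"U+{cp:04X}",
        "name": unicodedata.name(ch),
        "gc": unicodedata.category(ch),               # e.g. Ps, Pe, Po
        "bidi_class": unicodedata.bidirectional(ch),  # e.g. ON
        "bidi_mirrored": bool(unicodedata.mirrored(ch)),
    }

# Compare the ornate parentheses with the ordinary ASCII pair.
for cp in (0xFD3E, 0xFD3F, 0x0028, 0x0029):
    print(describe(cp))

# The answer depends on which Unicode version this interpreter ships.
print("Unicode data version:", unicodedata.unidata_version)
```

    The ASCII parentheses are included only as a reference point for characters that do have the mirrored property set.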

    > This leads to my questions:
    > a.) Why U+FD3E has GC property Ps and U+FD3F has Pe, and not vice
    > versa?

    Good question. Primary usage with Arabic seems to suggest vice versa. Mind, since in principle they can be used in either direction, something neutral such as Po might make sense. A key question to consider is what derived properties and algorithms would be affected by a change. For instance, switching Ps/Pe values for these characters would have a follow-on effect for line breaking:

    FD3E gc=Ps, lb=OP
    FD3F gc=Pe, lb=CL

    If changed:
    FD3E gc=Pe, lb=CL
    FD3F gc=Ps, lb=OP

    That would result in a significant change in line-breaking behaviour, though it would probably be an improvement for use in Arabic text (and detrimental for use in LTR text). But changing to a neutral category such as Po would have far more substantial impact on line breaking since both would have lb=AL; in particular, neither would behave particularly like closing punctuation.
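    To put the three scenarios side by side, here is a toy Python table. The gc-to-lb mapping covers only the three categories discussed and simply restates the values given in this message; it is not how the real data is built, since LineBreak.txt assigns lb per character. The names LB_FOR_GC, SCENARIOS and line_break_classes are made up for this sketch.

```python
# Toy mapping restating the values in this message; illustrative only.
LB_FOR_GC = {"Ps": "OP", "Pe": "CL", "Po": "AL"}

# gc assignments under each scenario discussed in the message.
SCENARIOS = {
    "as_of_this_message": {0xFD3E: "Ps", 0xFD3F: "Pe"},
    "swapped":            {0xFD3E: "Pe", 0xFD3F: "Ps"},
    "neutral":            {0xFD3E: "Po", 0xFD3F: "Po"},
}

def line_break_classes(scenario: str) -> dict:
    """Map each code point to the lb class implied by its gc in a scenario."""
    return {cp: LB_FOR_GC[gc] for cp, gc in SCENARIOS[scenario].items()}

for name in SCENARIOS:
    classes = line_break_classes(name)
    print(name, {f"U+{cp:04X}": lb for cp, lb in classes.items()})
```

    In the neutral case both characters end up in the same class, so a line breaker can no longer tell the opening one from the closing one.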

    There are no contingent line-breaking properties -- break this way for RTL but that way for LTR. So, there's no way to assign properties to these characters that provide the desired behaviour in all scenarios. Since -- at least, for line breaking -- a tailoring is needed to do the right thing in all cases, perhaps there's not a lot of value in changing the properties.
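    A tailoring of the kind mentioned might look roughly like the Python sketch below. It assumes the caller already knows the paragraph direction and a default lb class for every character; the function name tailored_lb and its parameters are hypothetical and do not correspond to any real line-breaking library. It also assumes that U+FD3F opens and U+FD3E closes in RTL text, and the reverse in LTR text.

```python
ORNATE_LEFT = 0xFD3E   # ORNATE LEFT PARENTHESIS
ORNATE_RIGHT = 0xFD3F  # ORNATE RIGHT PARENTHESIS

def tailored_lb(cp: int, rtl: bool, default_lb: str) -> str:
    """Pick OP/CL for the ornate parentheses from the paragraph direction.

    Hypothetical tailoring hook: every other character keeps default_lb.
    """
    if cp == ORNATE_LEFT:
        return "CL" if rtl else "OP"
    if cp == ORNATE_RIGHT:
        return "OP" if rtl else "CL"
    return default_lb

# Same character, different class depending on the surrounding direction.
print(tailored_lb(ORNATE_LEFT, rtl=False, default_lb="OP"))
print(tailored_lb(ORNATE_LEFT, rtl=True, default_lb="OP"))
```

    This only moves the problem to deciding the direction, which the default properties cannot express.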

