Re: Proposed Draft UTR #31 - Syntax Characters

From: Peter Kirk
Date: Thu Aug 21 2003 - 12:43:06 EDT

    On 21/08/2003 08:38, Mark Davis wrote:

    >I suspect your distinction is a bit too subtle to be useful. Having, for
    >example, a RLM only have effect when adjacent to a space in a regular expression
    >would be pretty prone to error; especially since the character would be
    >invisible.
    >The reason for allowing LRM and RLM is to be able to make patterns readable. If
    >you have some syntax like
    >(where the uppercase represents Hebrew), then bidi display of the neutrals
    >renders the pattern almost completely illegible. Inserting LRMs or RLMs at
    >appropriate points straightens out the display. In a special "pattern UI", one
    >could override the (or some) neutrals to have a strong direction, but most
    >patterns are viewed and edited in plaintext editors.
    >My recommendation for pattern syntax would be to quote all
    >Default_Ignorable_Code_Points if they are actually to be part of literals.
    >Otherwise the maintenance of such regular expressions (or queries, or rules,
    >etc.) becomes quite difficult, since the DICP are invisible by default.
    >► “Eppur si muove” ◄
    Understood, I think. I agree that literal default ignorables should
    be quoted. My concern was that in an example like yours, if RLM and LRM
    alone are taken as whitespace, they might be taken as terminating the
    whole pattern, which would defeat your purpose of allowing them to be
    inserted in the pattern so that it displays as required. That was why
    I wanted them to be ignored in the patterns. But maybe I am not
    understanding enough of the context of the whole syntax here.
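
    To make the difference concrete, here is a rough sketch of the two
    readings. It is only an illustration under my own assumptions: the
    function names are hypothetical, and backslash is assumed as the quoting
    character, which the draft may not actually specify.

```python
LRM = "\u200e"  # LEFT-TO-RIGHT MARK
RLM = "\u200f"  # RIGHT-TO-LEFT MARK
MARKS = {LRM, RLM}


def strip_unquoted_marks(pattern: str, escape: str = "\\") -> str:
    """Reading 1: unquoted LRM/RLM are ignored; quoted ones stay literal.

    The escape character is an assumption made for this sketch.
    """
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == escape and i + 1 < len(pattern):
            # Keep the escape and whatever it quotes, including a mark.
            out.append(ch)
            out.append(pattern[i + 1])
            i += 2
            continue
        if ch not in MARKS:
            out.append(ch)
        i += 1
    return "".join(out)


def cut_at_whitespace_or_mark(pattern: str) -> str:
    """Reading 2: LRM/RLM count as whitespace that terminates the pattern."""
    for i, ch in enumerate(pattern):
        if ch.isspace() or ch in MARKS:
            return pattern[:i]
    return pattern


# A mark inserted purely to straighten out the display.
pattern = "a" + RLM + "*b"
ignored = strip_unquoted_marks(pattern)          # mark dropped, pattern intact
terminated = cut_at_whitespace_or_mark(pattern)  # pattern cut at the mark
```

    Under the first reading the mark disappears and the pattern survives
    whole; under the second the pattern ends at the mark, which is the
    outcome I was worried about.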

    Peter Kirk
