Re: Saudi-Arabian Copyright sign

From: Kenneth Whistler (kenw@sybase.com)
Date: Wed Sep 22 2004 - 18:50:46 CDT

    Jonathan Coxhead asked:

    > >>Then could/should we use the sequence <200C, 062D, 20DD, 200C>?
    > >
    > >
    > > You *could* use that sequence, and if your rendering implementation
    > > were sophisticated enough, it *might* render what you were
    > > expecting.
    >
    > So here's my question ...
    >
    > If I did write the sequence <200C, 062D, 20DD, 200C>, what
    > *should* I expect?
    >
    > It seems to me that---barring bugs---this ought to produce the symbol
    > expected, in a completely standard-conforming way, and with no extra encoding
    > needed.

    Well, *assuming* that you are dealing with a Unicode 4.1 (or subsequent)
    implementation of Arabic and bidi that has been updated to the
    Unicode 4.1 data files (so that U+20DD is jt=Transparent), and
    *assuming* you have access to a font that can actually represent
    a circle around U+062D legibly, yes, you should expect to see a
    circled HAH.
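
    A minimal sketch of what that sequence contains, assuming Python's
    standard unicodedata module (the helper name describe is
    illustrative only). It lists each code point with its character name
    and General_Category. The module does not expose Joining_Type, so the
    jt=Transparent condition mentioned above cannot be checked this way.

```python
import unicodedata

# The sequence under discussion: ZWNJ, HAH, enclosing circle, ZWNJ.
SEQUENCE = "\u200C\u062D\u20DD\u200C"


def describe(text):
    """Return (code point, character name, General_Category) per character."""
    return [
        (
            f"U+{ord(ch):04X}",
            unicodedata.name(ch, "<unnamed>"),
            unicodedata.category(ch),
        )
        for ch in text
    ]


if __name__ == "__main__":
    for code_point, name, category in describe(SEQUENCE):
        print(code_point, category, name)
```

    Whether a circled HAH actually appears still depends on the renderer
    and the font; the listing only confirms what was encoded.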

    Note that <200C, 062D, 200C, 20DD> should *also* produce the same
    visual rendering, but is not canonically equivalent to the first
    sequence. So that is *another* fly in the ointment.
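
    The non-equivalence of the two orderings can be checked with a short
    sketch, again assuming Python's unicodedata module (the names FIRST,
    SECOND and canonically_equivalent are illustrative). Two strings are
    canonically equivalent when their NFD forms are identical. Both
    U+20DD and U+200C have combining class 0, so normalization never
    reorders them and the two sequences stay distinct.

```python
import unicodedata

FIRST = "\u200C\u062D\u20DD\u200C"   # ZWNJ, HAH, enclosing circle, ZWNJ
SECOND = "\u200C\u062D\u200C\u20DD"  # ZWNJ, HAH, ZWNJ, enclosing circle


def canonically_equivalent(a, b):
    """Compare two strings by their canonical decomposition (NFD)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


if __name__ == "__main__":
    # Canonical reordering only moves characters with a non-zero
    # combining class, and both of these have class 0.
    print(unicodedata.combining("\u20DD"), unicodedata.combining("\u200C"))
    print(canonically_equivalent(FIRST, SECOND))
```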

    >
    > If I write <200C, 062D, 20DD, 200C>, and I don't see this Saudi
    > copyright sign, shouldn't I be able to complain to someone for
    > non-compliance?

    No, not if the renderer you are using doesn't claim to "interpret"
    U+20DD for rendering.

    > (Of course, I might not like its baseline, or size, or stroke-width,
    > but I'm sure I could get over it.)
    >
    > Exactly what "wiggle-room" exists, in the current state of play?

    We can all figure out what things ought to be like, but this
    is a very murky area for implementations -- behavior of
    combining enclosing marks, which have never been very well
    defined themselves, in combination with orthogonal
    format control characters whose implementation is itself
    complex.

    Rather than engage in thought experiments about who we could blame
    for being in non-compliance if some weird sequence doesn't display
    as we expect, in this case it is much more straightforward to
    just encode the symbol in question and be done with it.

    That was essentially the argument that carried the day for
    other complex symbols such as U+FDFD ARABIC LIGATURE BISMILLAH
    AR-RAHMAN AR-RAHEEM.

    --Ken


