From: Kenneth Whistler (kenw@sybase.com)
Date: Wed Sep 22 2004 - 18:50:46 CDT
Jonathan Coxhead asked:
> >>Then could/should we use the sequence <200C, 062D, 20DD, 200C>?
> >
> >
> > You *could* use that sequence, and if your rendering implementation
> > were sophisticated enough, it *might* render what you were
> > expecting.
>
> So here's my question ...
>
> If I did write the sequence <200C, 062D, 20DD, 200C>, what
> *should* I expect?
>
> It seems to me that---barring bugs---this ought to produce the symbol
> expected, in a completely standard-conforming way, and with no extra encoding
> needed.
Well, *assuming* that you are dealing with a Unicode 4.1 (or subsequent)
implementation of Arabic and bidi that has been updated to the
Unicode 4.1 data files (so that U+20DD is jt=Transparent), and
*assuming* you have access to a font that can actually represent
a circle around U+062D legibly, yes, you should expect to see a
circled HAH.
Note that <200C, 062D, 200C, 20DD> should *also* produce the same
visual rendering, but is not canonically equivalent to the first
sequence. So that is *another* fly in the ointment.
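The non-equivalence can be checked with Python's standard unicodedata module: two strings are canonically equivalent exactly when their NFD forms are identical. ZWNJ has canonical combining class 0, so canonical reordering never moves U+20DD across it, and none of these characters has a decomposition, so the two orders remain distinct strings after normalization.

```python
import unicodedata

def canonically_equivalent(a: str, b: str) -> bool:
    # Canonical equivalence == identical canonical decompositions (NFD).
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

first = "\u200c\u062d\u20dd\u200c"   # ZWNJ, HAH, ENCLOSING CIRCLE, ZWNJ
second = "\u200c\u062d\u200c\u20dd"  # ZWNJ, HAH, ZWNJ, ENCLOSING CIRCLE

print(canonically_equivalent(first, second))  # False
```

So a search or comparison that relies on normalization alone will treat these two spellings of the "same" visual symbol as different text.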
>
> If I write <200C, 062D, 20DD, 200C>, and I don't see this Saudi
> copyright sign, shouldn't I be able to complain to someone for
> non-compliance?
No, not if the renderer you are using doesn't claim to "interpret"
U+20DD for rendering.
> (Of course, I might not like its baseline, or size, or stroke-width,
> but I'm sure I could get over it.)
>
> Exactly what "wiggle-room" exists, in the current state of play?
We can all figure out what things ought to be like, but this
is a very murky area for implementations -- behavior of
combining enclosing marks, which have never been very well
defined themselves, in combination with orthogonal
format control characters whose implementation is itself
complex.
Rather than engage in thought experiments about who we could blame
for being in non-compliance if some weird sequence doesn't display
as we expect, in this case it is much more straightforward to
just encode the symbol in question and be done with it.
That was essentially the argument that carried the day for
other complex symbols such as U+FDFD ARABIC LIGATURE BISMILLAH
AR-RAHMAN AR-RAHEEM.
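The practical difference shows up even at the level of the standard unicodedata module: the encoded ligature is a single code point with its own name, whereas the circled HAH exists only as a four-code-point sequence whose appearance is left to whatever the renderer makes of it.

```python
import unicodedata

bismillah = "\ufdfd"                  # one encoded symbol
circled_hah = "\u200c\u062d\u20dd\u200c"  # symbol built from a sequence

print(len(bismillah), unicodedata.name(bismillah))
# 1 ARABIC LIGATURE BISMILLAH AR-RAHMAN AR-RAHEEM
print(len(circled_hah))  # 4
```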
--Ken
This archive was generated by hypermail 2.1.5 : Wed Sep 22 2004 - 19:03:09 CDT