From: Peter Kirk (firstname.lastname@example.org)
Date: Wed Jul 23 2003 - 08:13:54 EDT
On 23/07/2003 03:20, Paul Nelson (TYPOGRAPHY) wrote:
>Please look at the definition of CGJ and other such characters.
>Understand the differences between CGJ and ZWJ/ZWNJ.
>This discussion is very disturbing to me because, after reading through
>the L2 document register, it is unclear what the difference is between
>CGJ and ZWJ use.
>The fact that you desire a control character to not be treated as such
>greatly concerns me. This really feels like people are trying to figure
>out any way to twist existing constructs to avoid fixing the
>normalization weights. I am alarmed by the implications of putting
>control characters in place to somehow subvert the normalization.
>In an ideal world we would simply correct these values. However, it has
>been strongly communicated by the UTC that this cannot be done without
>jeopardizing stability agreements with IETF. Peter Constable has posted a
>document in the register on this topic that suggests a duplication of
>characters as a solution.
>Can we please have this topic put on the agenda for the next meeting of
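The normalization issue behind this exchange can be illustrated with a minimal sketch. It assumes Python's standard unicodedata module (whose Unicode data is much newer than 4.0, although canonical combining classes of assigned characters are fixed by Unicode's stability policy), and the lamed + patah + hiriq sequence is an illustrative choice, not one taken from this thread. Because patah (class 17) and hiriq (class 14) have different nonzero combining classes, normalization reorders them; CGJ has class 0, so inserting it between them blocks the reordering.

```python
import unicodedata

CGJ = "\u034F"  # COMBINING GRAPHEME JOINER, canonical combining class 0

# Illustrative sequence chosen for this sketch: lamed + patah + hiriq.
LAMED, PATAH, HIRIQ = "\u05DC", "\u05B7", "\u05B4"


def codepoints(s: str) -> list[str]:
    """Format each character of s as U+XXXX for easy comparison."""
    return [f"U+{ord(c):04X}" for c in s]


plain = LAMED + PATAH + HIRIQ
blocked = LAMED + PATAH + CGJ + HIRIQ

# Canonical combining classes of each character.
print([unicodedata.combining(c) for c in plain])    # [0, 17, 14]
print([unicodedata.combining(c) for c in blocked])  # [0, 17, 0, 14]

# NFC sorts adjacent nonzero-class marks by class, so hiriq moves first.
print(codepoints(unicodedata.normalize("NFC", plain)))
# ['U+05DC', 'U+05B4', 'U+05B7']

# CGJ (class 0) separates the marks, so their order is preserved.
print(codepoints(unicodedata.normalize("NFC", blocked)))
# ['U+05DC', 'U+05B7', 'U+034F', 'U+05B4']
```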
I have been doing a little research into the defined properties of CGJ.
I note also that, according to
http://www.unicode.org/book/preview/ch03.pdf, it is defined in Unicode
4.0 as a "Default Ignorable". Well, I am not surprised that some people
are confused because
tells me "For more information, see UAX #29: Text Boundaries
<http://www.unicode.org/reports/tr29/>.", but the string "ignorable" is
not found in UAX #29. However, from a Google search I found
http://www.unicode.org/review/pr-5.html, described as "/text excerpted
from the Unicode Standard/" and numbered section 5.22, so I suppose this
is from the as-yet-unpublished chapter 5 of Unicode 4.0. According to this,
"Default ignorable code points are those that should be ignored by
default in rendering (unless explicitly supported)... An implementation
should ignore default ignorable characters in rendering whenever it does
/not/ support the characters." So my suggestion that a renderer should
simply ignore CGJ is far from twisting the requirements of Unicode; it
is in fact a requirement of Unicode 4.0, though one that I am hardly
surprised some people have missed.
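A renderer's fallback for default ignorables might look like the following minimal sketch. It assumes a hand-picked subset of Default_Ignorable_Code_Point characters (current versions of the Unicode Character Database list the full property), and the names chars_to_render, DEFAULT_IGNORABLE_SUBSET and the supported parameter are hypothetical. It filters the logical string before shaping, which is a simplification of what a real renderer does.

```python
# Hand-picked subset of Default_Ignorable_Code_Point characters (an assumption
# for illustration; the complete list lives in the Unicode Character Database).
DEFAULT_IGNORABLE_SUBSET = frozenset({
    0x00AD,  # SOFT HYPHEN
    0x034F,  # COMBINING GRAPHEME JOINER
    0x200B,  # ZERO WIDTH SPACE
    0x200C,  # ZERO WIDTH NON-JOINER
    0x200D,  # ZERO WIDTH JOINER
    0x2060,  # WORD JOINER
    0xFEFF,  # ZERO WIDTH NO-BREAK SPACE
})


def chars_to_render(text: str, supported: frozenset[int] = frozenset()) -> str:
    """Drop default-ignorable characters unless the renderer explicitly supports them."""
    return "".join(
        ch for ch in text
        if ord(ch) not in DEFAULT_IGNORABLE_SUBSET or ord(ch) in supported
    )


hebrew = "\u05DC\u05B7\u034F\u05B4"  # lamed, patah, CGJ, hiriq
print([f"U+{ord(c):04X}" for c in chars_to_render(hebrew)])
# ['U+05DC', 'U+05B7', 'U+05B4']
```

A renderer that did define special behaviour for CGJ would pass it in supported, for example chars_to_render(hebrew, frozenset({0x034F})), and the character would then be kept for later processing.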
The internal mechanism by which a renderer ignores such a character is a
matter for each implementation. John Hudson and I
have suggested a mechanism for doing this with Uniscribe by treating the
character internally as a normal character with a blank glyph and always
ligating it with the preceding character. There may be other, cleaner
mechanisms. In any case, some such mechanism seems to be required, not
just for fixing this Hebrew problem but for conformance with Unicode as a
whole, so that CGJ is ignored by the renderer unless some specific
behaviour is defined. In the case of
rendering Hebrew, there seems to be no pressing need to define specific
behaviour as the default is at least close to what is required.
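The blank-glyph approach described above could be modelled roughly as follows. This is a toy sketch, not Uniscribe's API: the Cluster type, the shape function, the glyph IDs and the advance widths are all hypothetical. CGJ is given a blank, zero-advance glyph and attached to the preceding cluster, as are combining marks, so the visible glyphs come out the same with or without CGJ.

```python
import unicodedata
from dataclasses import dataclass

CGJ = "\u034F"
BLANK_GLYPH = 1  # hypothetical ID of an empty glyph with zero advance


@dataclass
class Cluster:
    text: str          # characters that belong to this cluster
    glyphs: list[int]  # glyph IDs produced for them
    advance: int       # total horizontal advance of the cluster


def shape(text: str, font: dict[str, tuple[int, int]]) -> list[Cluster]:
    """Toy shaper: font maps a character to a hypothetical (glyph ID, advance)."""
    clusters: list[Cluster] = []
    for ch in text:
        if ch == CGJ:
            glyph, advance = BLANK_GLYPH, 0
        else:
            glyph, advance = font[ch]
        # CGJ and marks with a nonzero combining class attach to the preceding
        # cluster; CGJ contributes only a blank glyph and no advance.
        attaches = ch == CGJ or unicodedata.combining(ch) > 0
        if attaches and clusters:
            last = clusters[-1]
            last.text += ch
            last.glyphs.append(glyph)
            last.advance += advance
        else:
            clusters.append(Cluster(ch, [glyph], advance))
    return clusters


font = {"\u05DC": (10, 500), "\u05B7": (11, 0), "\u05B4": (12, 0)}  # hypothetical
with_cgj = shape("\u05DC\u05B7\u034F\u05B4", font)
without_cgj = shape("\u05DC\u05B7\u05B4", font)
print(with_cgj[0].glyphs, without_cgj[0].glyphs)  # [10, 11, 1, 12] [10, 11, 12]
```

Attaching CGJ to the preceding cluster, rather than giving it a cluster of its own, keeps caret movement and selection from ever stopping on an invisible character.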
-- Peter Kirk email@example.com http://web.onetel.net.uk/~peterkirk/