From: Mark Davis (mark.davis@jtcsv.com)
Date: Thu Nov 25 2004 - 09:38:54 CST
I want to correct some misperceptions about CGJ; it should not be used for
ligatures.
Quoting from http://www.unicode.org/versions/Unicode4.0.0/ch15.pdf#G12985,
on page 392 (sorry for the boxes; that's Acrobat):
U+034F is used to indicate that adjacent characters are to be treated as
a unit for the purposes of language-sensitive collation and searching.
In language-sensitive collation and searching, the combining grapheme
joiner should be ignored unless it specifically occurs within a tailored
collation element mapping. Thus it is given a completely ignorable
collation element in the default collation table, like (see Unicode
Technical Standard #10, “Unicode Collation Algorithm,” and also ISO/IEC
14651). However, it can be entered into the tailoring rules for any
given language, using the tailoring capabilities of the collation
standards.
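
[Editorial sketch, not part of the quoted standard.] The "ignored unless
tailored" rule above can be illustrated with a toy sort key. This is not
the Unicode Collation Algorithm: the function name toy_sort_key, the
tailoring dictionary, and the weight 1000 are all hypothetical, and plain
code points stand in for real collation elements.

```python
CGJ = "\u034F"  # COMBINING GRAPHEME JOINER


def toy_sort_key(text, tailoring=None):
    """Toy key (NOT the real UCA): code points stand in for weights.

    CGJ contributes nothing to the key unless it occurs inside a
    sequence listed in the hypothetical `tailoring` mapping.
    """
    tailoring = tailoring or {}
    # Try longer tailored sequences first so the longest match wins.
    sequences = sorted(tailoring, key=len, reverse=True)
    key = []
    i = 0
    while i < len(text):
        match = next((s for s in sequences if text.startswith(s, i)), None)
        if match is not None:
            key.append(tailoring[match])  # tailored mapping applies
            i += len(match)
        elif text[i] == CGJ:
            i += 1  # completely ignorable by default
        else:
            key.append(ord(text[i]))
            i += 1
    return tuple(key)


# Untailored: CGJ makes no difference to the key.
same_by_default = toy_sort_key("a" + CGJ + "e") == toy_sort_key("ae")

# Tailored: a mapping that mentions CGJ treats the sequence as one unit.
rules = {"a" + CGJ + "e": 1000}  # 1000 is an arbitrary, made-up weight
tailored_key = toy_sort_key("a" + CGJ + "e", rules)
```

With no tailoring the two strings compare equal; with the made-up rule,
the sequence containing CGJ gets a single weight while plain "ae" still
gets two.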
For rendering, the combining grapheme joiner is invisible. However, some
older implementations may treat a sequence of grapheme clusters linked
by combining grapheme joiners as a single unit for the application of
enclosing combining marks. For more information on grapheme clusters,
see Unicode Technical Report #29, “Text Boundaries.” For more
information on enclosing combining marks, see Section 3.11, Canonical
Ordering Behavior.
The combining grapheme joiner must not be confused with the zero width
joiner or the word joiner, which have very different functions. In
particular, inserting a combining grapheme joiner between two characters
should have no effect on their ligation or cursive joining behavior.
Where the prevention of line breaking is the desired effect, the word
joiner should be used. For more information on the behavior of these
characters in line breaking, see Unicode Standard Annex #14, “Line
Breaking Properties.”
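
[Editorial sketch, not part of the quoted standard.] A quick way to see
that these are three separate characters, and that A + CGJ + E is not the
same string as the letter U+00C6, is to ask Python's standard unicodedata
module. The labels in JOINERS and the helper name describe are only
illustrative, and the results depend on the Unicode data shipped with the
Python build in use.

```python
import unicodedata

# The three characters the passage says must not be confused.
JOINERS = {
    "CGJ": "\u034F",  # combining grapheme joiner
    "ZWJ": "\u200D",  # zero width joiner
    "WJ": "\u2060",   # word joiner
}


def describe(ch):
    """Return (code point, character name, general category) for ch."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))


for label, ch in JOINERS.items():
    print(label, *describe(ch))

# CGJ has no decomposition, so normalization leaves A + CGJ + E alone;
# it never turns into the single letter U+00C6.
a_cgj_e = "A" + JOINERS["CGJ"] + "E"
becomes_ae_letter = unicodedata.normalize("NFKC", a_cgj_e) == "\u00C6"
```

This only inspects character properties and normalization; what a font
or rendering engine draws for the sequence is a separate question.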
Mark
----- Original Message -----
From: "Doug Ewell" <dewell@adelphia.net>
To: "Unicode Mailing List" <unicode@unicode.org>
Cc: <pmr@informatik.uni-frankfurt.de>
Sent: Wednesday, November 24, 2004 22:09
Subject: Re: CGJ , RLM
> "kefas" <pmr at informatik dot uni dash frankfurt dot de> wrote:
>
> > 1. U+034F CGJ, Combining Grapheme Joiner, is
> > displayed as a tall rectangle in MSKLCexe-test and as
> > a capital square in OutlookExpress A͏E a͏e͏a͏e. But
> > CGJ "has no visible glyph"! Thus CGJ is not
> > implemented correctly in Arial Unicode MS. Or are the
> > editors not implemented correctly?
>
> U+034F was added to Unicode 3.2 in March 2002. Your copy of Arial
> Unicode MS may have been released before that date. Or it may be that
> Microsoft has chosen not to implement U+034F in this particular font,
> which is not the same as implementing it incorrectly.
>
> > Should A+CGJ+E
> > yield the Danish double letter a+(e-attached) ? Or
> > do I hope in vain.
>
> Someone, some day may choose to render A + CGJ + E as Æ. Don't be
> misled into thinking they are equivalent, however.
>
> > Is there a general rule how graphically to join 2
> > arbitrary characters? Normal tf looks already joined
> > to me, and causes me problems of recognizing t and f
> > as distinct letters. (I have astigmatism: cyl -3.0,
> > which is not that rare.) m and rn look the same from
> > normal reading distance! Some editors / some fonts
> > display an m with uneven spacing of legs, which looks
> > to me as if r+n is written. Any help in planning (you
> > font-designers)?
>
> There probably could not be a general rule about this, because it is too
> dependent on individual typeface designs. Sans-serif fonts like Arial
> will likely have many more "joined" combinations than serif fonts like
> Times, because the serifs interrupt the joining behavior. Whether the
> horizontal strokes on a "t" and an "f" line up with each other is also
> highly font-dependent. In many cases they do not.
>
> I think I have your astigmatism beat, at least in one eye.
>
> > 2. RLM, the Right to Left marker, seems to have no
> > effect yet. Hebrew bet+RLM+SPace should leave the
> > Cursor at Left and not 'jump' to the right of bet as
> > it does, for better or worse, for bet+SP. If this is a
> > correct expectation, then how can I tell (e.g. via
> > MSKLC.exe) to insert RLM+SPace on CAPS+SPace ?
>
> This may have more to do with the rendering engine than with the font.
>
> -Doug Ewell
> Fullerton, California
> http://users.adelphia.net/~dewell/