Re: missing glyph `dotlessj'

From: Kenneth Whistler (kenw@sybase.com)
Date: Tue Sep 16 1997 - 13:53:43 EDT


I must emphatically second Mark Leisher's comment on this
exchange.

   *The combining accents are not a mistake.*

They are in Unicode by design, are implemented now in many
vendors' software, and are not going away.
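
As a minimal illustration of what "by design" means here: an accented letter can be written either as a single precomposed character or as a base letter followed by a combining mark, and the two are treated as equivalent. The sketch below assumes Python's standard unicodedata module, which tracks a much later version of the Unicode Standard than Version 2.0; the normalization form names NFC and NFD come from that later work, and the variable names are illustrative only.

```python
import unicodedata

precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
combining = "e\u0301"    # 'e' followed by COMBINING ACUTE ACCENT

# The two spellings are different code point sequences...
assert precomposed != combining

# ...but they are canonically equivalent: normalization maps one to the other.
composed = unicodedata.normalize("NFC", combining)
decomposed = unicodedata.normalize("NFD", precomposed)
assert composed == precomposed
assert decomposed == combining

# Show the characters that make up the decomposed spelling.
print([unicodedata.name(c) for c in decomposed])
```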

It is true that there are a number of individuals in some
European standards bodies who do not like the combining marks
and who feel they were a mistake. Their opinions, filtered
through the rumor mills, appear to be turning into claims
that "The UNICODE Consortium has decided..." No such decision
has been made, and the Unicode Technical Committee stands
firmly committed to combining marks.

>
> Berthold K.P. Horn wrote:
> >
> > rf@cl.cam.ac.uk (Robin Fairbairns) writes:
> >
> > > Berthold K.P. Horn <bkph@ai.mit.edu> wrote:
> > > >(2) The UNICODE Consortium has decided that one should not construct
> > > >accented/composites characters by over-printing. Such characters
> > > >must each have their own code.
> >
> > > This isn't _strictly_ true: recall that there's at least one page of
> > > combining accents (I can't remember which page it is, and my copy of
> > > ISO 10646-1 has sprouted some *very* sturdy little legs and walked away).
> >
> > Well, the introductory verbiage says that they should not be used.

Not true, either in The Unicode Standard, Version 2.0, or in
ISO/IEC 10646-1:1993. The restriction that 10646 places on combining
marks is in Clause 15.1, where combining characters are not to
be used with *implementation level 1*. Unicode is a level 3
implementation of 10646, and combining characters are unrestricted
with level 3. Please check your facts.

> > I believe they now consider the `combining accents' a mistake.

Not true.

> > But since nothing can ever be removed from UNICODE once it is in
> > there, the `non spacing' diacritics will stay. Pretty much the
> > same reason there are still math symbols in there, even though
> > current UNICODE thinking appears to be that this was a mistake also

This is another piece of misinformation that verges on disinformation.

It has *never*, to my knowledge, been stated by anyone at a Unicode
Technical Committee meeting (and I have been to all but two of them)
that encoding the math symbols was a mistake. Not even stated as an
individual's opinion, much less as a consensus.
 
> > (and indicated by their never passing any proposal to make the math
> > character representation more complete and hence actually useful).

The Unicode Consortium would be delighted to have a more complete
encoding of symbols used in math. The fact is, no one from the
mathematical community has come forward with a detailed proposal
specifying what is missing and required for encoding. To my knowledge,
no such proposal has been presented either to the Unicode Technical
Committee or to a national standards body feeding into ISO/IEC JTC1/SC2/WG2,
where it could be acted upon.

By the way, it is important to keep in mind the distinction between
a character encoding (such as Unicode) and a glyph encoding (as for
particular mathematical fonts). The character encoding must be
interpreted through a rendering engine to produce final results.
Mathematical formulae rendering involves numerous rules regarding
the modification and placement of glyphs, and those rules are not
recapitulated in the character encoding itself. Furthermore, the
availability of combining marks is assumed, so that vector notation
applied to Greek letters, for example, is handled by a generative rule
involving combining marks, *not* by encoding each individual combination
as a character. Mathematicians, of all people, should understand
such concepts.
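
The generative rule can be sketched roughly as follows. This is only an illustration, assuming Python's unicodedata module (which reflects a later Unicode version than 2.0) and using U+20D7 COMBINING RIGHT ARROW ABOVE as the vector mark; the helper name `vector` is hypothetical. One rule covers every base letter, and no per-letter "vector" character is needed.

```python
import unicodedata

ARROW_ABOVE = "\u20d7"  # COMBINING RIGHT ARROW ABOVE


def vector(letter: str) -> str:
    """Hypothetical helper: append the combining arrow to a base letter."""
    return letter + ARROW_ABOVE


greek = ["\u03b1", "\u03b2", "\u03c9"]  # alpha, beta, omega
vectors = [vector(g) for g in greek]

for v in vectors:
    # NFC leaves the sequence alone: there is no precomposed
    # "Greek letter with arrow above" for it to compose into.
    assert unicodedata.normalize("NFC", v) == v
    print(" + ".join(unicodedata.name(c) for c in v))
```

How the arrow is positioned over each letter is left to the rendering engine; the character data only records the base letter and the mark.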

In the meantime, the Unicode Standard has the most complete set of
math symbols encoded in any existing *character* encoding standard.

Ken Whistler, Technical Director, Unicode, Inc.

P.S., in reference to the title of this thread, "missing glyph 'dotlessj'",
intelligent text rendering removes the dot from a j when applying
combining marks on top of it. 'dotlessj' is an SGML glyph entity, not
a character. It was considered and rejected for encoding as a character
years ago, precisely because it is a glyph and not a character.
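
To make the character/glyph split concrete, here is a small sketch, again assuming Python's unicodedata module and a later Unicode version than the one under discussion. At the character level the text holds an ordinary j plus a combining mark; nothing in the data says "dotless". Dropping the dot is a decision the renderer makes when it draws the pair, which this code does not attempt to show.

```python
import unicodedata

text = "j\u0302"  # 'j' followed by COMBINING CIRCUMFLEX ACCENT

# The stored characters: a plain j and a mark, with no dotless form involved.
names = [unicodedata.name(c) for c in text]
print(names)

# The same text in its precomposed spelling is one character, j with circumflex.
composed = unicodedata.normalize("NFC", text)
print(unicodedata.name(composed))
```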

> >
> > Maybe someone who actually knows something about this (like bnb :-)
> > could comment on this...
>
> From someone who actually knows something about this, quite the
> contrary.
> The combining accents are not a mistake. The pre-composed characters
> exist for historical and political reasons. Without pre-composed
> characters, Unicode would be a much smaller character set, and just as
> complete.
> ------------------------------------------------------------------------
> mleisher@crl.nmsu.edu
> Mark Leisher                 "A designer knows he has achieved perfection
> Computing Research Lab        not when there is nothing left to add, but
> New Mexico State University   when there is nothing left to take away."
> Box 30001, Dept. 3CRL                        -- Antoine de Saint-Exupéry
> Las Cruces, NM 88003


