Re: Lacking large curley brace building characters

From: Kenneth Whistler (kenw@sybase.com)
Date: Fri Jun 02 2000 - 21:21:38 EDT

Next message: Sina Ahmadian: "Farsi..."
Previous message: Markus Scherer: "Re: word processors on Win32 and Mac which support UCS-2..."
Maybe in reply to: Bernd Warken: "Lacking large curley brace building characters"
Next in thread: Michael Everson: "Re: Lacking large curley brace building characters"

Bernd Warken commented:

> As far as I see, there are no Unicode characters for building large
> curly braces. Such braces are used in mathematics (for the definition
> of variate functions and in combinatorics) and in music notation
> (grouping of staffs).

As Murray Sargent noted, the standard set of curly brace parts
(as well as bracket, parenthesis, integral, and arrow extender parts)
seen in PostScript, groff, and TeX has already been considered and
accepted, and will be part of a future version of the Unicode Standard.
At this point, however, it is unlikely that they will be a part
of Unicode 3.1 (still in the planning stages, but likely to be synchronized
with the pending publication of ISO/IEC 10646-2 early next year),
simply because the amendment that they will be a part of has yet
to begin its international balloting, and is considerably behind
the repertoire additions that are nearly ready for 10646-2.
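
To make the idea of brace parts concrete, here is a minimal Python
sketch that stacks the pieces into a tall left brace, one piece per
text line. It assumes the code points at which these pieces were later
encoded (U+23A7 through U+23AA in the Miscellaneous Technical block),
which this message does not name. The function tall_left_brace is a
hypothetical helper for illustration, not part of any library.

```python
# Assumption: code points as later encoded in Miscellaneous Technical.
UPPER_HOOK = "\u23A7"  # LEFT CURLY BRACKET UPPER HOOK
MIDDLE = "\u23A8"      # LEFT CURLY BRACKET MIDDLE PIECE
LOWER_HOOK = "\u23A9"  # LEFT CURLY BRACKET LOWER HOOK
EXTENSION = "\u23AA"   # CURLY BRACKET EXTENSION


def tall_left_brace(height):
    """Return the brace pieces for `height` text lines, top to bottom.

    The height must be odd and at least 3 so that the middle piece
    sits exactly halfway between the two hooks.
    """
    if height < 3 or height % 2 == 0:
        raise ValueError("height must be an odd number >= 3")
    extenders = (height - 3) // 2  # extension pieces above and below the middle
    return (
        [UPPER_HOOK]
        + [EXTENSION] * extenders
        + [MIDDLE]
        + [EXTENSION] * extenders
        + [LOWER_HOOK]
    )


if __name__ == "__main__":
    # Print a five-line brace, one piece per line.
    print("\n".join(tall_left_brace(5)))
```

How well the pieces join on screen depends on the font and the line
spacing; the sketch only shows the order in which they are stacked.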

10646-2 (and Unicode 3.1) will also contain a completely separate
set of musical notation graphic primitives, intended for use by musical
layout rendering programs. That set contains various barlines, as
well as the brace and bracket used to tie staffs together. Musical
layout rendering engines are assumed to be "smart" enough to calculate
and draw the completed musical notation in complex ways -- all kinds
of musical elements have to be dynamically adjusted -- bars on notes,
positions of the noteheads, length of barlines, slurs and ties,
sforzandos, etc., etc., in addition to braces and brackets tying
staffs together. Whether the renderings depend on pieces in fonts
or are themselves all dynamically constructed depends on the
implementation. The *character* codes for these things are merely
intended as textual anchors for the layout engine, which will combine
them with its representation of the tonal and rhythmic structure in
working out the complete musical display.
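
As a rough illustration of "character code as anchor", the following
Python sketch keeps the brace character only as a marker and computes
the actual vertical extent from the staff positions. It assumes the
code point at which the musical brace was later published (U+1D114
MUSICAL SYMBOL BRACE). The functions brace_span and place_brace, the
coordinate convention (y grows downward), and the returned dictionary
are all hypothetical and do not describe any particular layout engine.

```python
# Assumption: the musical brace as later encoded in Musical Symbols.
MUSICAL_SYMBOL_BRACE = "\U0001D114"


def brace_span(staff_tops, staff_height):
    """Return (top, bottom) covering every staff in the group.

    staff_tops holds the y coordinate of the top line of each staff;
    y grows downward, so the lowest staff has the largest value.
    """
    if not staff_tops:
        raise ValueError("at least one staff is required")
    top = min(staff_tops)
    bottom = max(staff_tops) + staff_height
    return top, bottom


def place_brace(staff_tops, staff_height):
    """Pair the anchor character with the geometry the engine computed."""
    top, bottom = brace_span(staff_tops, staff_height)
    return {"anchor": MUSICAL_SYMBOL_BRACE, "top": top, "height": bottom - top}


if __name__ == "__main__":
    # Two staffs, 32 units tall, whose top lines sit at y=0 and y=60.
    print(place_brace([0, 60], 32))
```

The character itself carries no size; the height comes entirely from
the layout calculation.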

But in any case, all of the pieces for either approach to layout
of large braces and brackets will be present eventually in the
standard.

>
> It's amazing that the Unicode Consortium did not include these
> characters, for, they are already documented in the mother documentation
> for all type-setting systems
>
> J.F. Ossanna, Brian W. Kernighan - Troff User's Manual,
> Bell Labs CST Report No. 54
>
> The history of this classical paper dates back to the 1970s and was
> maintained by Kernighan up to the 1990s. So Unicode should have known
> about that.

It is not so amazing at all. Of course we knew about them. Most of
them are also part of the standard PostScript symbol encoding, which
was also considered as input to Unicode 1.0 way back in 1989.

What you need to understand is that this was an intentional decision
made early on, when the focus of the Unicode Standard was somewhat
different from what it has evolved into over the past decade,
when there was less understanding of what the implementation
implications of a universal character encoding would eventually be,
and when fewer people were demanding that their particular
requirements be covered by the universal standard.

The early architects of the Unicode Standard had in mind a vision
of a "better" character encoding, that assumed a much more principled
distinction between underlying character and rendered display, and
that would not grandfather in all the character cell-oriented
character encoding hacks derived from the days of terminals and
character-oriented displays, nor the particular hacks regarding
glyph construction involved in systems such as troff or TeX.

Very quickly, however, it became evident to all that lots of compatibility
compromises would be necessary or Unicode would die aborning. Hence,
for example, all the precomposed Latin characters. For much of its
early life, the *main* function of Unicode was to serve as a universal
hub to deal with the cacophony of existing legacy character encodings
in interconnected contexts. It has taken longer for its other function
as the textual backbone of complex text rendering systems in
multilingual contexts to mature and take hold. And in the meantime,
since Unicode is the only universal character encoding game in town,
more and more strange encoding requirements ("strange" from the point
of view of the original architects) have been put forward, and many
of those with reasonable justifications for them have made or are
making their way into the standard.
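
The precomposed Latin characters are a convenient illustration of such
a compatibility compromise. A minimal Python sketch, using the
standard unicodedata module: a precomposed letter and its base letter
plus combining mark are canonically equivalent, and normalization
converts between the two. The helper names decompose and compose are
hypothetical, chosen only for this example.

```python
import unicodedata


def decompose(text):
    """Split precomposed characters into base letter + combining marks (NFD)."""
    return unicodedata.normalize("NFD", text)


def compose(text):
    """Recombine base letter + combining marks into precomposed form (NFC)."""
    return unicodedata.normalize("NFC", text)


if __name__ == "__main__":
    precomposed = "\u00E9"  # LATIN SMALL LETTER E WITH ACUTE
    parts = decompose(precomposed)
    # Show the code points on each side of the round trip.
    print([f"U+{ord(ch):04X}" for ch in parts])
    print([f"U+{ord(ch):04X}" for ch in compose(parts)])
```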

The character encoding of all the brace and bracket parts is just
one of these recent accretions. It has been made clear
that compatibility and interoperation with troff and other,
more sophisticated math layout programs is simply easier for people to
accomplish if all these legacy glyph pieces get their own character
codes. And so it was done.

By the way, it is instructive to watch the evolution of a key
paragraph from Unicode 1.0, page 3:

"The Unicode standard version 1.0 does not encode rare, obsolete,
idiosyncratic, personal, novel, rarely exchanged or private-use
characters, nor does it encode logos or graphics. ... Graphologies
unrelated to text, for example, musical and dance notations, are
outside the scope of the Unicode standard. Braille symbols were not
encoded, since Braille is an alternative way to present text (it can
be considered a font variant)."

Well, the statement about Braille turned out to be flat wrong. Braille
symbols have since been encoded. Not just one, but *two* distinct sets of
graphic primitives for musical notations are coming down the pike
for Unicode 3.1. There are many thousands of rare and obsolete
characters encoded now, most notably among the Han characters, but
scattered elsewhere as well. So the relevant paragraph in the
Unicode Standard, Version 3.0, page 2 has been toned down to:

"...the Unicode Standard does not encode idiosyncratic, personal,
novel, rarely exchanged, or private-use characters, nor does it
encode logos or graphics. Graphologies unrelated to text, such
as dance notations, are likewise outside the scope of the Unicode
Standard. Font variants are explicitly not encoded...."

The standard evolves.

>
> I will not take up the torture of a character submission or even discuss
> the subject.

Fortunately, others have already been tortured regarding this one --
and some are still screaming on the rack. So we won't have to
throw you in the dungeon to join them.

--Ken

> This is a friendly bug-report, not more. It's up to you
> to use the information.
>
> Bernd Warken <bwarken@mayn.de>
>
>


This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:03 EDT