William Overington <WOverington at ngo dot globalnet dot co dot uk> wrote:
> For example, a recent experiment, documented
> in the archives of this list as The Respectfully Experiment, shows
> that there is now new evidence about the facts regarding the encoding
> of code points for ligatures...
and <Peter_Constable at sil dot org> responded:
> Also, I don't recall posts from this list detailing the experiment
> referred to above, which admittedly leaves me at a disadvantage.
<sigh /> OK, here are the details. I'm reluctant to admit having been
part of this "experiment," since it is now being presented as evidence
to support the proliferation of private-use ligatures. But anyway:
In late May, William began suggesting that code points in the Private
Use Area be assigned to Latin ligatures, so that they could be
represented in plain text without the use of smart fonts, ZWJ sequences,
etc., which he claims are only available to users of "the very latest"
hardware and software. In particular, William proposed the PUA code
point U+E707 for the Latin "ct" ligature, and that particular ligature
was used as an example throughout the thread.
On 2002-05-31, I wrote a response which ended "Respectfully, Doug,"
except that I used William's code point U+E707 in place of the letters
"ct." My intent was that everyone on the Unicode list, including
William, would see "Respe<black box>fully," thus demonstrating the lack
of interoperability of this PUA solution. Only users of a font that
happened to contain William's PUA character would see the ct ligature,
and I didn't think any such font existed.
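To make the interoperability point concrete, here is a minimal Python sketch, assuming only the standard `unicodedata` module. To any software outside William's private convention, U+E707 is just a Private Use character (general category "Co") with no name and no decomposition. So "Respe<U+E707>fully" does not match "Respectfully", even after compatibility normalization. The `PRIVATE_AGREEMENT` table is hypothetical. It stands in for the out-of-band arrangement, here James's modified font, that was the only thing giving U+E707 any meaning.

```python
import unicodedata

# The string as sent: U+E707 (William's proposed PUA "ct") in place of "ct".
sent = "Respe\ue707fully"

# Hypothetical private agreement, standing in for the out-of-band convention
# (e.g. a font built to William's list) that gives U+E707 any meaning at all.
PRIVATE_AGREEMENT = {"\ue707": "ct"}


def is_private_use(ch: str) -> bool:
    """True if ch is a Private Use Area code point (general category Co)."""
    return unicodedata.category(ch) == "Co"


def expand_private(text: str, agreement: dict) -> str:
    """Replace PUA characters using a private agreement; others pass through."""
    return "".join(agreement.get(ch, ch) for ch in text)


if __name__ == "__main__":
    print(is_private_use("\ue707"))                  # True
    print(unicodedata.name("\ue707", "<no name>"))   # <no name>
    # Standard normalization knows nothing about the PUA assignment.
    print(unicodedata.normalize("NFKC", sent) == "Respectfully")       # False
    # Only a party that shares the private agreement recovers the text.
    print(expand_private(sent, PRIVATE_AGREEMENT) == "Respectfully")  # True
```

Searching, sorting, or spell-checking the message would fail in exactly the same way for every recipient who lacked the agreement. That covers everyone on the list except James.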
Much to my surprise, however, James Kass had modified his private
version of the Code2000 font to include William's ct ligature at U+E707,
and he was using it to read my message, so out of everyone on the list,
he alone did see the ligature.
William observed that I had sent, and James had received, a ct ligature
at U+E707, based not on any private arrangement but on our mutual (and
coincidental) use of William's code point. He latched onto this chain
of events as proof that end-user publication of PUA code points was a
success, and named it "The Respectfully Experiment," despite my protests
that the whole incident was a freak accident. (I think "ct" was the
only one of William's "golden" ligatures for which James had provided a glyph.)
> ... because it has now been realized that
> such code points can now be used in conjunction with ZWJ type
> mechanisms of advanced font technology formats as an alternative
> method of coding to assist people with less than the latest equipment,
> such code points for ligatures working in conjunction with advanced
> font technology rather than as an alternative to it which is the way
> that such code points were regarded when a decision not to encode any
> more of them without a strong case was taken, though even for that
> decision there was provision for the general thrust of that decision
> to be overridden in the light of future evidence.
and Peter replied:
> ... my knowledge of the Unicode standard and of advanced font
> technologies leaves me rather puzzled about how "...code points [for
> ligatures] can now be used in conjunction with ZWJ type mechanisms
> of advanced font technology...": codepoints for ligatures are
> unnecessary because of advanced font technologies. ZWJ does not work
> in conjunction with encoded ligatures because encoded ligatures
> aren't needed; and if they existed, ZWJ would not particularly
> interact with them in any usage that has been described as part of
> the Unicode standard.
I don't think William really meant "in conjunction with" in the sense
that ZWJ would be applied directly to precomposed ligatures. He meant
in the same document, or maybe not even that, maybe just that both types
of ligation (precomposed and ZWJ) would be conformant to Unicode.
There is no new revelation here, though. Nothing "has now been
realized" that wasn't already known. Because of the Latin compatibility
ligatures already encoded from U+FB00 through U+FB06, it has always been
possible to encode (say) an "fi" ligature using either the ZWJ method or
the precomposed ligature at U+FB01, or both according to whimsy. (Or it
might just magically appear without any special encoding at all, as John
Jenkins could point out.) It makes no difference whether a PUA code
point or a standardized Unicode code point is used for the precomposed ligature.
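For illustration, here is a small Python sketch, assuming only the standard `unicodedata` module, of the spellings of "fi" that Unicode already permits: plain letters that a font may ligate on its own, a ZWJ sequence requesting the ligature, and the compatibility ligature U+FB01. The `search_key` helper is hypothetical. NFKC decomposes U+FB01 to "fi" but leaves ZWJ in place, so stripping ZWJ is an extra assumption of this sketch, not part of normalization.

```python
import unicodedata

# Three plain-text spellings of "fi" that Unicode already permits.
plain = "fi"              # U+0066 U+0069: a smart font may ligate on its own
zwj = "f\u200di"          # U+0066 ZWJ U+0069: ligature explicitly requested
precomposed = "\ufb01"    # U+FB01 LATIN SMALL LIGATURE FI


def describe(s: str) -> str:
    """List the code points in s as U+XXXX."""
    return " ".join(f"U+{ord(ch):04X}" for ch in s)


def search_key(s: str) -> str:
    """Hypothetical matching key: NFKC (decomposes U+FB01), then drop ZWJ."""
    # NFKC itself keeps U+200D; removing it here is this sketch's choice.
    return unicodedata.normalize("NFKC", s).replace("\u200d", "")


if __name__ == "__main__":
    for s in (plain, zwj, precomposed):
        print(describe(s), "->", search_key(s))
```

All three spellings yield the same key, "fi". The choice among them affects only how the text is displayed, not its content.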
Nothing about "The Respectfully Experiment" -- and no appeals that users
of 80386 PCs must be able to reproduce 18th-century ligatures in plain
text -- will serve as the "evidence" that William is waiting for to
overturn the decision not to encode any more precomposed ligatures.
> There is at present a barrier, which I feel might be called the markup
> barrier, which is acting as a barrier to progress.
This is not a barrier; it's just a distinction. People who are VERY
familiar with the issues have drawn a line between:
(a) plain text, which represents content, and
(b) markup, which represents formatting.
Reasonable people can differ as to exactly where the line should be
drawn, and there are indeed some examples where Unicode has tiptoed over
the line. But most of the examples are now either deprecated or
> I wonder whether
> the markup barrier is some absolute barrier or whether it is just a
> temporary thing which exists in people's minds
It's not the law of gravity. It's a decision that has been made by
humans who have studied the issues and drawn a line.
> Is the markup barrier absolute or is it just that the markup barrier
> is regarded as being an absolute barrier because of a fallacy of
> reasoning in that whereas
> 1. markup is useful in some circumstances,
> 2. markup provides the opportunity to encode a system without a need
> to have additional code points,
> 3. markup does not have a requirement to have meanings assigned to
> code points by a standards committee,
> that those reasons, set against a historical background, have led to a
> view that code points cannot be used for things for which markup is
> presently used, notwithstanding that there seems to be nothing in the
> definition of character that is being used that would seem to go
> against using code points directly for such meanings as 36 POINT and
You know what? Several of us in technology fields have this nasty habit
of trying to embrace ANY change that promises even an 0.01% improvement,
regardless of the costs involved in making such a change. I see it all
the time in software development, where perfectly good
structured-programming techniques are scorned if they are not
sufficiently object-oriented, and yesterday's One True Way of C++ is now
being cast aside in favor of C#.
I see the same effect here, where despite the completely adequate
mechanisms we have today for encoding rich text -- HTML, XML,
proprietary formats like Word, even old-fashioned RTF -- someone thinks
it could all be done better with character encoding. Well, there's a
pretty substantial installed base of the existing technologies, and I'm
sorry if this sounds like "stifling progress," but it's not enough for a
new solution to be simply "better"; it must be "better enough." The
benefit has to justify the cost. Even if there are advantages to
encoding greenness and 36-point-ness with standardized character codes
instead of with existing markup methods (which I'm still not convinced
is true), the advantages simply aren't great enough to displace the
existing methods. What we have right now is not broken.
By contrast, Unicode itself IS "better enough" than the previously
existing hodgepodge of character encodings to justify making the switch.
(Well, maybe not for everyone yet, as the parallel Shift-JIS thread
> It would be a very interesting exercise for people to discuss exactly,
> precisely what Unicode is not, giving detailed reasoning for each such
> claim rather than simply saying that that is how it is, because if
> suggested formal limitations of the scope of what could be encoded
> start to be suggested then it may be that with its usual vigour that
> this discussion forum would result in counter examples which would
> make many of the suggested limitations obsolete.
It would be more than just an exercise; it would be a great idea, an
excellent addition to the Unicode Web site, and one of the most on-topic
threads I've seen in the 6 years I've been on this list.
But please bear in mind, first and foremost, there ARE some things that
the Unicode Standard is "for" and some things it is not "for." Please
follow my advice and that of others on this list, more expert than I am,
and read the Standard and the Technical Reports that have been
mentioned. That will, hopefully, give you the background you need to
make intelligent contributions to the discussions.
This archive was generated by hypermail 2.1.2 : Fri Jun 28 2002 - 22:53:18 EDT