Re: Too narrowly defined: DIVISION SIGN & COLON from Julian Bradfield on 2012-07-12 (Unicode Mail List Archive)

From: Julian Bradfield <jcb+unicode_at_inf.ed.ac.uk>
Date: Thu, 12 Jul 2012 22:20:58 +0100

Hans wrote:
>On 12 Jul 2012, at 15:54, Julian Bradfield wrote:
..
>> Not to mention the symbols I've used from time to time, because
>
>You tell me, because I posted a request for missing characters in different forums. Perhaps you invented it after the standardization was made?

Why on earth would I care about whether my pet symbol (a mu-nu
ligature, which I started using to stand for "mu or nu as appropriate"
when I ran out of other plausible letters for it) is in Unicode? It
would be crazy to put it there, and of precious little benefit to me,
since I don't wish to write web pages about this stuff.

>>> them. In math, you can always invent your own characters and styles,
>> people do.
>You and others knowing about those characters must make proposals if you want to see them as a part of Unicode.

But wanting to do so would be crazy. My mu-nu ligature is, as far as I
know, used only by me (and co-authors who let me do the typesetting),
and so if Unicode has any sanity left, it would not encode it. My
colleagues in the Edinburgh PEPA group did try to get their pet symbol
encoded (a bowtie where the two triangles overlap somewhat rather than
just touching), but were refused, although that symbol now appears in
hundreds of papers by dozens of authors from all over the world. (I
think they wanted it so they could put it on web pages, which they
have lots of.)

Putting a symbol into Unicode imposes a huge burden on thousands of
people. Everybody who thinks it important to be able to display all
Unicode characters (or even all non-Han characters) has to make sure
that their font has it, or that the distribution they package has it,
or that all the software in the world knows how to find a font that
has it. Such effort is entirely inappropriate for symbols used ad hoc
by a small community, who are in any case communicating either via
fully typeset documents or via TeX pseudocode - or, on occasion, with
real TeX and a suitable font definition.

>> You mean "private use". Crazy thing to do, because then you have to
>> worry about whether your PUA code point clashes with some other
>> author's PUA code point.
>
>There is some system for avoiding that. Perhaps someone else here can inform.

There are many such systems - I don't need help or advice on this
matter. But none of them is appropriate for a symbol that perhaps you
want only for a few papers.
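To make the clash concrete: a Private Use Area code point has no name or properties beyond "private use", so its meaning lives entirely in whatever font or convention each author ships. A minimal Python sketch using the standard `unicodedata` module. The two author mappings and the `clashes` helper are hypothetical, invented only for illustration.

```python
import unicodedata

# U+E000 is the first code point of the BMP Private Use Area (U+E000..U+F8FF).
pua = "\ue000"

# The standard gives it only general category "Co" (private use) and no name.
print(unicodedata.category(pua))    # Co
print(unicodedata.name(pua, None))  # None

# Hypothetical private conventions of two authors who both picked U+E000.
author_a = {0xE000: "mu-nu ligature"}
author_b = {0xE000: "overlapping bowtie"}

def clashes(m1, m2):
    """Return the code points both mappings use, but with different meanings."""
    return sorted(cp for cp in m1.keys() & m2.keys() if m1[cp] != m2[cp])

print([f"U+{cp:04X}" for cp in clashes(author_a, author_b)])  # ['U+E000']
```

Nothing in the text stream itself records which convention was meant, so a reader has to know the author's mapping in advance.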

>>> UTF-8 only is simplest for the programmer that has to implement it.
>> Some of us are more concerned with users than programmers.
>Well, if the programmers don't implement, you are left out in the cold.

I'm not - if I care enough, I'll do it myself. Actually, most of my
work has been implementing UTF-8 - as I said, the legacy
encodings are usually already done.

>> Neither working mathematicians nor publishers nor
>> typesetters like dealing with constantly changing extensions and
>> variations on TeX - one of the biggest selling points of TeX is
>> stability. (Defeated somewhat by the instability of LaTeX and its
>> thousands of packages, but that's another story.)
>> If I need to write complex - or even bidi - scripts routinely, I'd
>> probably be forced into one of them; but the typical mathematician
>> doesn't.
>
>I do not see your point here.

The point is that you don't use unstable rapidly changing systems for
anything that has an expected life of more than a year or two; and if
you're planning for somebody else to use it, you try to give them
something that runs on systems at least ten years older than yours.

>No. TeX cannot handle UTF-8, and I recall LaTeX's capability to emulate that was limited.

Somewhat limited, but good enough for every purpose I've so far needed
(maths, phonetics; and European, Indic, Chinese, Hebrew languages in
small snippets rather than entire documents). The main annoyance is
that combining character support is clunky, and that TeX really
doesn't support bidi properly - as I said - though it's remarkable
what hacking can be done.

>>>> you need to encode also letters that are semantically distinctively
>>>> roman upright.
>>>
>>> It has already been encoded as mathematical style, see the "Mathematical Alphanumeric Symbols" here:
>>> http://www.unicode.org/charts/
>>
>> *You* look. The plain upright style is unified with the BMP characters.
>
>Yes, that is why the Unicode paradigm departs from the TeX one.

This is as bad as Naena Guru... Unicode characters are
fontless. They are plain text. The Unicode standard even has a
nice little picture (Figure 2-2) showing how roman A, squashed A, bold
italic A, script A, fancy A, sans-serif A, brush-stroke A, fancy
script A, and versal capital A are all just LATIN CAPITAL LETTER A.

Now, in response to the desire of some mathematicians (maybe) to
write webpages without having to use clunky HTML markup (which is even
worse to use than TeX's), Unicode saw fit to encode characters such as
MATHEMATICAL BOLD ITALIC CAPITAL A.
This is not a logical problem: that character is distinguished from
LATIN CAPITAL LETTER A by the fact that its acceptable glyph variants cover a
much narrower range than those of A.
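The Unicode Character Database records exactly this relationship: MATHEMATICAL BOLD ITALIC CAPITAL A (U+1D468) has a compatibility decomposition tagged `<font>` to plain U+0041, so compatibility normalization (NFKC) folds it back to the fontless A, while canonical normalization (NFC) leaves it as a distinct character. A short check with Python's standard `unicodedata` module:

```python
import unicodedata

bold_italic_a = "\U0001D468"

print(unicodedata.name(bold_italic_a))           # MATHEMATICAL BOLD ITALIC CAPITAL A
print(unicodedata.decomposition(bold_italic_a))  # <font> 0041
# NFKC discards the style distinction and leaves the fontless letter.
print(unicodedata.normalize("NFKC", bold_italic_a))  # A
# NFC does not touch it: it remains a separate character from U+0041.
print(unicodedata.normalize("NFC", bold_italic_a) == bold_italic_a)  # True
```

The `<font>` tag is the standard's own way of saying "this is A, restricted to a narrower range of glyphs".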

However, if you now say that MATHEMATICAL ROMAN CAPITAL A, which by
definition must be a seriffed upright non-bold roman letter, is the
same character as LATIN CAPITAL LETTER A, you must vanish in a puff of logic,
for the same character cannot both be a fontless A and also an A that
must be displayed in a very restricted range of glyphs.
Unless, that is, you have higher level markup that tells you when A
means A, and when it means \mathrm{A}. But if you have such higher
level markup, you don't need all the other variants anyway.
TeX provides such markup, by means of math mode. So TeX users can
choose to treat A as \mathrm{A} without inconsistency. However, they
can also choose to interpret the higher-level markup as saying "treat A
as itself", in which case TeX can do what it likes (in particular,
set in italic), also without inconsistency.
Thus there is no incompatibility between Unicode and TeX.

Similarly in MathML.

However, in plain text, you are screwed. There is no way to
distinguish between the generic A, and the A that must be roman,
except by human intelligence.
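The asymmetry is easy to see by listing what the Mathematical Alphanumeric Symbols block (U+1D400..U+1D7FF) actually contains for the letter A. A Python sketch using only the standard `unicodedata` module; the exact list it prints depends on the Unicode version bundled with the interpreter.

```python
import unicodedata

# Collect every styled mathematical variant of capital Latin A in the block.
variants = {}
for cp in range(0x1D400, 0x1D800):
    name = unicodedata.name(chr(cp), "")
    if name.startswith("MATHEMATICAL ") and name.endswith(" CAPITAL A"):
        variants[name] = cp

for name, cp in variants.items():
    print(f"U+{cp:04X} {name}")

# Every variant carries a style word (BOLD, SCRIPT, FRAKTUR, ...); there is
# no unstyled "MATHEMATICAL CAPITAL A". Upright roman A is just U+0041.
print("MATHEMATICAL CAPITAL A" in variants)  # False
# All of them fold back to the plain letter under NFKC.
print(all(unicodedata.normalize("NFKC", chr(cp)) == "A" for cp in variants.values()))  # True
```

So a plain-text "A" is simultaneously the generic letter and the only way to write the mathematically upright one.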

>You have yourself noted that the BMP characters must be used for upright for consistent Unicode use, incompatible with TeX which sets them as italic.

Which shows that Unicode is inconsistent, not that TeX is flawed.

>It is because there are currently no convenient input methods, also mentioned before in this thread.

There will never be a convenient input method for thousands of
symbols. (I've spent some time designing convenient input methods for
the range of characters I use frequently, and I still can't always
remember them.)


Received on Thu Jul 12 2012 - 16:25:03 CDT
