Re: Too narrowly defined: DIVISION SIGN & COLON from Jukka K. Korpela on 2012-07-10 (Unicode Mail List Archive)

From: Jukka K. Korpela <jkorpela_at_cs.tut.fi>
Date: Tue, 10 Jul 2012 09:04:49 +0300

2012-07-10 5:32, Asmus Freytag wrote:

> There are many characters that are used in professional mathematical
> typesetting (division slash being one of them) that need to be narrowly
> distinguished from other, roughly similar characters.

Typographic differences can be made at the glyph selection level, too, or
even in font design and choice of font. Typesetting systems like TeX and
derivatives have been very successful along such lines.

> Such narrowly defined characters are not aimed at the general user, and
> it's totally irrelevant whether or not such a character ever becomes
> "popular".

Popularity is relative to a population. When I wrote that “narrow
semantics does not make characters popular”, relating to the case of
DIVISION SLASH, I referred to popularity among people who could
conceivably have use for the characters. I don’t think there’s much
actual use of DIVISION SLASH in the wild. And this was about a case
where the distinction is not only semantic (actually the Unicode
standard does not describe the semantic side of the matter except
implicitly via things like Unicode name and General Category of the
character) but also has, or may have, direct impact on rendering.
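
The properties referred to here (Unicode name and General Category) can
be inspected directly. A minimal sketch in Python, assuming only the
standard unicodedata module; the helper name describe is made up for
illustration:

```python
import unicodedata

def describe(ch: str) -> tuple[str, str, str]:
    """Return (code point, Unicode name, General Category) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))

# SOLIDUS, DIVISION SLASH, COLON, RATIO, DIVISION SIGN
for ch in "/\u2215:\u2236\u00F7":
    print(*describe(ch))
```

The ASCII characters come out as Po (other punctuation) and the
dedicated operators as Sm (math symbol), which is about all the
character database itself says about the "semantic side".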

> Very early in the design cycle for Unicode there
> was a request for encoding of a decimal period, in distinction to a full
> stop. The problem here is that there is no visual distinction

This is more or less a vicious circle, and the starting point isn’t even
true. In British usage, the decimal point is often a somewhat raised
dot, above the baseline. But even if we assume that no distinction *had
been made* before the decision, the decision itself implied that no
distinction *can be made* by choice of character.

If a different decision had been made, people could choose to use a
decimal point character, or they could keep using just the ambiguous
FULL STOP character. Font designers could make them identical, or they
could make them different. But most probably, most people would not even
be aware of the matter: they would keep pressing the keyboard key
labeled with “.” – that is, the decimal point character would not have
much popularity. In British typesetting, people would probably still use
whatever methods they now use to produce raised dots.

> Unicode has relatively consistently refused to duplicate encodings in
> such circumstances, because the point about Unicode is not that one
> should be able to encode information about the intent that goes beyond
> what can be made visible by rendering the text. Instead, the point about
> Unicode is to provide a way to unambiguously define enough of the text
> so that it becomes "legible". How legible text is then "understood" is
> another issue.

That’s a nice compact description of the principle, but perhaps the real
reasons also include the desire to avoid endless debates over
“semantics”. Some semantic differences, like the use of a character as a
punctuation symbol vs. as a mathematical symbol, are relatively clear.
Most semantic differences that can be made are not that clear at all.

> Because of that, there was never any discussion whether the ! would have
> to be re-encoded as "factorial". It was not.

This implies that if anyone thinks that the factorial symbol should look
different from a normal exclamation mark, to avoid ambiguity (as in the
sentence “The result is n!”), he cannot do that at the character level.

A large number of mathematical and other symbols have originated as
other characters used for special purposes, then styled to have
distinctive shapes, later identified as separate symbols. For example,
N-ARY SUMMATION ∑ is now mostly visually different from GREEK CAPITAL
LETTER SIGMA Σ, though it was originally just the Greek letter used in a
specific meaning and context.
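
That the two are now fully separate characters can be checked along the
following lines. This is only a sketch, assuming Python's standard
unicodedata module; the constant and function names are hypothetical:

```python
import unicodedata

SIGMA = "\u03A3"      # GREEK CAPITAL LETTER SIGMA
SUMMATION = "\u2211"  # N-ARY SUMMATION

def profile(ch: str) -> dict[str, str]:
    """Collect the properties that separate the letter from the operator."""
    return {
        "name": unicodedata.name(ch),
        "category": unicodedata.category(ch),
        # NFKC folds compatibility variants together; a character that
        # survives it unchanged is not treated as a variant of another one.
        "nfkc": unicodedata.normalize("NFKC", ch),
    }

for ch in (SIGMA, SUMMATION):
    print(f"U+{ord(ch):04X}", profile(ch))
```

The letter is reported as Lu (uppercase letter) and the operator as Sm
(math symbol), and compatibility normalization leaves the summation sign
alone instead of mapping it back to the Greek letter.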

A principle that refuses to “re-encode” characters for semantic
distinctions seems to put a stop to such development. But of course new
characters are still being developed from old characters for various
purposes and can be encoded. They just need to have some visual identity
different from the old characters from the very start, to have a chance
of getting encoded.

> The proper thing to do would be to add these usages to the list of
> examples of known contextually defined usages of punctuation characters,
> they are common enough that it's worth pointing them out in order to
> overcome a bit of the inherent bias from Anglo-Saxon usage.

So what would be needed for this? I previously suggested annotations like

: also used to denote division

and

÷ also used to denote subtraction

But perhaps the former should be a little longer:

: also used to denote division and ratio

(especially since the use for ratio is more official and probably more
common).

Yucca
Received on Tue Jul 10 2012 - 01:09:37 CDT
