The 'phi' and 'phi symbol' glyph mess

From: Robert Herzog (Robert.Herzog@cern.ch)
Date: Tue Aug 24 1999 - 03:50:51 EDT


I did not find any reference to this in the list archives, but I apologize
if it has already been discussed here.

-------------
Definitions for the purpose of this mail:

'open theta': Greek small letter theta which looks like a '9' with
   serifs. (I also found: 'curly theta', 'theta1', 'thetav', 'vtheta',
   'script theta' etc.)

'closed theta': Greek small letter theta which looks like a '0' with
   a horizontal bar. (also: 'straight theta')

'open phi': Greek small letter phi which looks a bit like a 'u' with
   the right stem turned inwards and down below the baseline (you know
   what I mean...) (also: 'curly phi', 'low-topped phi', 'phi1', 'phiv',
   'script phi' etc.)

'closed phi': Greek small letter phi which looks like an 'o' with a long
   vertical bar in the middle (also: 'tall phi')
-------------

Office 2000 for Windows contains the two fonts 'Arial' and 'Arial Unicode MS',
both from Monotype. 'Arial' is a WGL4 font
(http://www.microsoft.com/typography/OTSPEC/WGL4.htm), but it also contains
Hebrew and Arabic glyphs. 'Arial Unicode MS' contains glyphs for all
Unicode 2 code points.

Now: In 'Arial' the 'open phi' is encoded as U+03C6 (Greek small letter phi),
whereas in 'Arial Unicode MS' it sits at U+03D5 (Greek phi symbol) with the
'closed phi' at U+03C6. In other words: If I enter U+03C6 I get the 'open phi'
when choosing the 'Arial' font and the 'closed phi' when using 'Arial Unicode MS'.
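For anyone who wants to check which code points are involved, here is a small Python sketch (Python and its unicodedata module are just my choice of tool; nothing in Office depends on them). The font table only restates the observations above; it does not read anything from the font files.

```python
import unicodedata

PHI = "\u03c6"         # U+03C6
PHI_SYMBOL = "\u03d5"  # U+03D5

# Glyph shapes as observed in the two Office 2000 fonts described above.
# These are observations from this mail, not data read from the fonts.
OBSERVED = {
    "Arial": {PHI: "open phi"},
    "Arial Unicode MS": {PHI: "closed phi", PHI_SYMBOL: "open phi"},
}

# Print each code point with its official Unicode name and the shape
# each font was seen to draw for it.
for ch in (PHI, PHI_SYMBOL):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
    for font, shapes in OBSERVED.items():
        print(f"    {font}: {shapes.get(ch, '(not described above)')}")
```

It shows the problem in two lines of output: U+03C6 GREEK SMALL LETTER PHI is an 'open phi' in 'Arial' but a 'closed phi' in 'Arial Unicode MS'.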

This does not seem to be the intention of Unicode, or am I wrong?

Some research on the web suggests that both fonts could be 'right'
in their way, but the result seems to be precisely the mess which
Unicode was meant to prevent.

Here is what I found:

1. It seems to be acknowledged on this list that there is a
   _semantic_ difference between 'open theta' and 'closed theta', etc.

> From: Murray Sargent <murrays@microsoft.com>
> To: "'Scott Horne'" <shorne@metaphasetech.com>
> Cc: Unicode List <unicode@unicode.org>
> Subject: RE: Superscript asterisk
> Date: Thu, 1 Jul 1999 18:14:49 -0700
> X-Mailer: Internet Mail Service (5.5.2524.0)
>
> Two Greek characters variants are fairly often used together in the same
> math doc. People just want more Greek characters than there are. E.g.,
> both epsilons (straight and script), both phis, both thetas can all appear
> in the same document. One thing you learn studying mathematical docs is
> that people do things routinely that you might expect to be nonexistant. I
> don't personally know of any mathematical expression that uses the final
> sigma, but I bet Barbara Beeton can find one. Furthermore, we'd need to
> reserve the location to keep case switching simple. The current layouts are
> tightly related to those in the BMP, so that the algorithms can be used with
> a minimum of changes.
>
> Murray

2. A number of things suggest that something is afoot:

  a. Why were the Unicode 1.0 names GREEK SMALL LETTER SCRIPT THETA
     and GREEK SMALL LETTER SCRIPT PHI fudged to GREEK THETA SYMBOL
     and GREEK PHI SYMBOL? (To me this only adds ambiguity.)

  b. The suggested HTML4 list of named entities from Nov. 1996
     (http://lists.w3.org/Archives/Public/w3c-sgml-wg/msg02189.html)
     contains (among others) the following two entries:
     03D1 thetav GREEK THETA SYMBOL
     03D5 phiv GREEK PHI SYMBOL
     The latter was removed from the final document (why?):
     http://www.w3.org/TR/WD-html40-970708/sgml/entities.html
     The intention was to include all characters of the (cursed?) 'Symbol'
     font (apart from the bracket fragments) in this list. Now only
     'phisym' is missing, which mathematicians and scientists (like me)
     probably regret, assuming it would be unambiguous.
    
  c. Does WGL4 deliberately omit all Greek variant letter forms?
  
3. There seem to be (at least) three groups using Greek letters:
   people writing modern Greek (the Greeks), classicists (Ancient Greek),
   and mathematicians/scientists/engineers, who use them as variable names etc.
   For the first two there may not (?) be a semantic difference between
   'open theta' and 'closed theta', but for the last there definitely is.
   
   Most of the 'Greek' fonts on the web can be put in one of the following
   categories:

   a. ISO-8859-7, WGL4 fonts
      In these, theta is encoded at 0xE8 and phi at 0xF6. They are intended
      for use with modern Greek. They _usually_ contain (only) a

        'closed theta' and an 'open phi'.

   b. SMK GreekKeys, WinGreek (SGreek), Ismini
      These are three common (and different) encodings used by
      classicists, who need many accented vowels. The standard
      Greek letters are in the lower half of the code table (below 128)
      and the accented vowels in the upper half (128 and above). As far as
      theta and phi go, there are two types of fonts. 'Anglophone'
      classicists seem to regard

        'closed theta' and 'closed phi'

      as the standard glyphs and 'open theta' and 'open phi' as variants,
      whereas 'Continental' (i.e. European, apart from British)
      classicists regard
     
        'open theta' and 'open phi'

      as the standard glyphs and 'closed theta' and 'closed phi' as variants.
      (see greekof.txt in ftp://hopi.dtcc.edu/pub/berlin/fonts/groftt.zip;
       also http://www.dtcc.edu/~berlin/font/greek.htm)

   c. The famous 'Symbol' font
      Everyone else who wants a Greek letter uses the 'Symbol' font, although
      its glyphs do not match 'Times' very well (the x-height is 10 % larger)
      and neither (good) italics nor a sans serif version are commonly
      available (who has not seen a 'Symbol' font Greek letter in English
      Arial/Helvetica text? Ouch!).
      The designers of the 'Symbol' font seem to have followed the
      'Anglophone' classicists, because at the standard code points
      (i.e. paired with the capital theta and phi) one finds the

        'closed theta' and 'closed phi'

      whereas the 'open theta' and 'open phi' are encoded at 'J' and 'j'
      respectively (but at least they are there).

   Unfortunately (IMO) Unicode also followed the 'Symbol' font and the
   'Anglophone' classicists with the glyphs depicted at
   http://charts.unicode.org/Unicode.charts/glyphless/U0370.html
   and stayed (deliberately?) vague in the character descriptions.

   Back to the third user group, the mathematicians etc. They don't care
   which is the standard form and which the variant, as long as the two
   are unambiguously distinguishable! If I send a Word document with a
   'closed phi' in 'Arial Unicode MS', I don't want it to turn into
   an 'open phi' at the recipient's end just because he has only
   the WGL4 'Arial' font installed (which is very likely these days,
   because it comes with Internet Explorer, which in turn is installed
   on most new PCs).
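To see where the legacy encodings from points 2 and 3 end up in Unicode, here is a rough Python sketch. The ISO-8859-7 part uses Python's built-in codec, and the entity check uses Python's html.entities module, which holds the final HTML 4.01 entity list (not the 1996 draft). The Symbol table is a hypothetical excerpt I wrote by hand: 'J' and 'j' come from point 3c, 'q' and 'f' are, as far as I know, the usual Symbol slots for theta and phi, and the Unicode targets follow Adobe's published Symbol mapping as I recall it, so please double-check them.

```python
import unicodedata
from html.entities import name2codepoint

# (a) ISO-8859-7: decode the theta and phi bytes with Python's codec.
for byte in (0xE8, 0xF6):
    ch = bytes([byte]).decode("iso8859_7")
    print(f"ISO-8859-7 0x{byte:02X} -> U+{ord(ch):04X} {unicodedata.name(ch)}")

# (c) 'Symbol' font: hypothetical excerpt of a Symbol-to-Unicode table
# (assumed slots and targets, see the note above; verify before use).
SYMBOL_TO_UNICODE = {
    "q": ("closed theta", "\u03b8"),
    "J": ("open theta", "\u03d1"),
    "f": ("closed phi", "\u03c6"),
    "j": ("open phi", "\u03d5"),
}
for slot, (shape, ch) in SYMBOL_TO_UNICODE.items():
    print(f"Symbol '{slot}' ({shape}) -> U+{ord(ch):04X} {unicodedata.name(ch)}")

# Point 2b: which theta/phi symbol entities made it into HTML 4.01?
for name in ("thetasym", "phiv", "phisym"):
    cp = name2codepoint.get(name)
    print(f"&{name};", f"U+{cp:04X}" if cp is not None else "not defined")
```

Both the modern Greek phi (0xF6, usually drawn open) and the Symbol 'closed phi' ('f') land on the same code point, U+03C6; the glyph the reader finally sees is decided by whichever font happens to be installed. The entity check also shows a theta symbol entity ('thetasym') but no phi counterpart.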

So finally my suggestion (being rather ignorant of the pains of
standards processes): I think Microsoft was right to follow the modern
Greeks and adopt 'closed theta', 'open phi' as the standard theta and
phi forms in their WGL4 fonts, because there are 10 million Greeks and
far fewer classicists, particularly if one only counts the 'Anglophone'
ones. All the mathematicians, scientists and engineers (surely more
than 10 million) want unambiguity!!

So please swap the glyph images of U+03C6 and U+03D5 in the Unicode standard
(http://charts.unicode.org/Unicode.charts/glyphless/U0370.html)
and introduce _unambiguous_ character names for all Greek letter variants,
so that font designers know where they _should_ put their glyphs.

Since the whole thing is broken anyway ('Arial' and 'Arial Unicode MS' etc.),
not much old code will be broken. And the classicists still need to
have their 'Anglophone' and 'Continental' font variants anyway...

Thanks for reading, sorry if it bored you,

Robert Herzog

Note: To enter a Unicode character in Word97, proceed as follows:
  Insert menu -> Field -> Categories: Equations and Formulas -> Field
  Names: Symbol -> Options... button at the bottom -> Switches: \u ->
  Add to Field button -> insert '0x03C6 ' before the \u -> ok -> ok.
  Now you should see a 'phi' if you have a Unicode font containing it.
  With the insertion point (cursor) just before the symbol (it appears
  with grey background), one can toggle the field and see the field code
  by pressing 'Shift+F9'. The field code should be
  {SYMBOL 0x03C6 \u \* MERGEFORMAT}.
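If you need many of these fields, the field code text can be generated instead of clicked together. The helper below (symbol_field is a hypothetical name of mine, not part of Word) just formats the code point the way the field code above expects. Keep in mind that in Word the outer braces are not typed characters; you insert an empty field with Ctrl+F9 and type the text between them.

```python
def symbol_field(ch: str) -> str:
    """Build the Word97 SYMBOL field code text for one character.

    Hypothetical helper: it only formats a string, it does not talk to Word.
    """
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    # Code point as 0x plus at least four uppercase hex digits, e.g. 0x03C6.
    return f"{{SYMBOL 0x{ord(ch):04X} \\u \\* MERGEFORMAT}}"


print(symbol_field("\u03c6"))  # {SYMBOL 0x03C6 \u \* MERGEFORMAT}
print(symbol_field("\u03d5"))  # {SYMBOL 0x03D5 \u \* MERGEFORMAT}
```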

PS: I got involved with 'theta' and 'phi' some years ago when I wanted
  to use the 'open' ones for polar coordinates (which is the convention
  in Germany/Austria; ah, the 'Continental' classicists...) in my
  thesis and could not find nice italic glyphs matching 'Times' and
  'Arial'. Ever since, I have looked hopefully at each new release of
  Windows and Office, expecting to find a complete set of Greek letters
  (including variants) in _all_ the standard fonts, but so far in vain.
  I even sent a few mails to Microsoft Typography. Only now do I realise
  that things are not so easy.

[ PPS: I didn't use TeX because I didn't want to be a masochist, but
  now I am not so sure I haven't become one anyway... ;^) ]

----------------------
Robert Herzog
LHC division, CERN (European Laboratory for Particle Physics)
1211 Geneva 23, Switzerland



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:51 EDT