Polytonic Greek in Unicode (particularly in HTML)

From: Otto Stolz (Otto.Stolz@uni-konstanz.de)
Date: Mon Jan 19 1998 - 11:13:43 EST


Dear colleague,

A member of our university is planning to offer WWW pages
with mixed-language text in ancient Greek, German, English,
and perhaps more languages. What shall I recommend to him?

Of course, my first reaction was "Unicode". This raised
some questions about which particular solutions I should recommend
so that the texts can be both easily typed and widely read:

- Which transfer-encoding is most widely understood?
  - UTF-8?
  - Latin-1 with numeric character codes? And if so,
    the decimal, or the new (HTML 4) hexadecimal, codes?
  - or what else?
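
For comparison, the candidate encodings above can be sketched in a few
lines of Python. This is only an illustration, assuming a Python 3.8+
interpreter and one arbitrarily chosen sample letter (U+1F05); the
variable names are made up for the example, and the output says nothing
about which form a given browser actually accepts.

```python
# Illustration only: three ways to put one polytonic letter into an HTML page.
ch = "\u1F05"  # GREEK SMALL LETTER ALPHA WITH DASIA AND OXIA (sample choice)

utf8_bytes = ch.encode("utf-8")      # the raw UTF-8 octets
decimal_ref = f"&#{ord(ch)};"        # decimal numeric character reference
hex_ref = f"&#x{ord(ch):X};"         # hexadecimal reference (HTML 4 syntax)

# A Latin-1 page where every non-Latin-1 character becomes a decimal reference.
latin1_page = f"Greek: {ch}".encode("latin-1", errors="xmlcharrefreplace")

print(utf8_bytes.hex(" "))           # e1 bc 85
print(decimal_ref, hex_ref)          # &#7941; &#x1F05;
print(latin1_page)                   # b'Greek: &#7941;'
```

The UTF-8 form is the most compact, while the two reference forms keep
the page pure Latin-1 (or even ASCII) at the cost of readability in the
source.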

- How to code diacritical marks, such as accents, breathing-marks,
  and iota subscript?
  - combining (U+0374--U+03C9 + U+0300--U+0345)?
  - pre-composed (U+0374--U+03C9, and U+1F00--U+1FFC)?
  - partially composed (using U+03AA, U+03AB, U+03CA, or U+03CB,
    with combining accents, breathing marks, and iota),
    thus avoiding the Greek Extended range?
  According to chapter 2.2 of the Unicode 2.0 standard, these are
  equivalent; but do the browsers really render all of these
  codings? In particular, will they be aware of the "overriding
  behaviour" (cf. chapter 2.5 and figure 2-9 of the Unicode 2.0
  standard), so that accents and breathing marks are rendered side
  by side (rather than stacked above each other) when applied to
  Greek characters?
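
The equivalence of these codings can at least be checked at the
character level. The sketch below assumes Python 3 and its standard
`unicodedata` module, and relies on the normalization forms NFC and NFD,
which were specified later than Unicode 2.0. It only compares code
point sequences; whether a browser renders the sequences identically is
a separate matter that it cannot answer.

```python
import unicodedata

precomposed = "\u1F05"            # small alpha with dasia and oxia (Greek Extended)
combining = "\u03B1\u0314\u0301"  # alpha + combining rough breathing + combining acute

# NFC composes and NFD decomposes; canonically equivalent strings
# become identical under either form.
same_nfc = unicodedata.normalize("NFC", combining) == precomposed
same_nfd = unicodedata.normalize("NFD", precomposed) == combining

# Partially composed: iota with dialytika (U+03CA) followed by a combining acute.
partial = "\u03CA\u0301"
fully_decomposed = "\u03B9\u0308\u0301"
same_partial = unicodedata.normalize("NFD", partial) == fully_decomposed

print(same_nfc, same_nfd, same_partial)
```

An author could therefore type in whichever form the editor supports
and normalize the text to a single form before publishing.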

- How to code breathing marks and, optionally, accents with
  **upper-case** vowels? These are written left of their base
  characters, so there are various possibilities:
  - Fully pre-composed, e.g. U+1F0D (for "Ha!")
  - combining, e.g. U+0391 + U+0314 + U+0301
  - detached accents, e.g. U+1FDE + U+0391
  - ditto, combining, e.g. U+0020 + U+0314 + U+0301 + U+0391
  Note that the detached accents must be coded prior to their
  respective base characters, whilst combining accents must be
  coded after their respective base characters.
  Will browsers render the accents properly on the left of (hence
  apparently before the) base character, in the pre-composed and
  combining variants? Will detached accents have any adverse
  effects (e.g. inappropriate line breaks)?
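
The four upper-case variants can be listed and compared as follows. This
is a hedged sketch, assuming Python 3 and the standard `unicodedata`
module; the helper names `describe` and `equivalent_to_precomposed` are
invented for the illustration. It shows which sequences are canonically
equivalent to the pre-composed letter, not how any browser draws them.

```python
import unicodedata

# The four codings of capital alpha with rough breathing and acute ("Ha!").
variants = {
    "precomposed": "\u1F0D",
    "combining": "\u0391\u0314\u0301",
    "detached": "\u1FDE\u0391",
    "detached, combining": "\u0020\u0314\u0301\u0391",
}

def describe(text):
    """Return 'U+XXXX NAME' for every character in text."""
    return [f"U+{ord(c):04X} {unicodedata.name(c)}" for c in text]

def equivalent_to_precomposed(text):
    """True if text is canonically equivalent to the pre-composed letter."""
    return unicodedata.normalize("NFC", text) == variants["precomposed"]

for label, text in variants.items():
    print(label, describe(text), equivalent_to_precomposed(text))
```

Only the combining sequence is reported as equivalent to the
pre-composed letter. The two detached variants remain distinct strings,
so searching and sorting would treat them differently even where they
happen to look the same on screen.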

- How to code iota subscript with **upper-case** vowels
  (the iota should be written at the lower right of the base
  character, as iota adscript)?
  - fully precomposed (U+1F88--U+1FFC)
  - combining (U+0345)
  - separate (U+037A)
  - separate, ordinary iota (U+03B9)
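
The same kind of check applies to the iota variants. The sketch below
assumes Python 3 with the standard `unicodedata` module and uses capital
alpha as an arbitrary base letter; the names `candidates` and `results`
are illustrative only.

```python
import unicodedata

precomposed = "\u1FBC"  # GREEK CAPITAL LETTER ALPHA WITH PROSGEGRAMMENI

candidates = {
    "combining": "\u0391\u0345",   # capital alpha + combining ypogegrammeni
    "separate": "\u0391\u037A",    # capital alpha + spacing ypogegrammeni
    "plain iota": "\u0391\u03B9",  # capital alpha + ordinary small iota
}

# Compare each candidate with the pre-composed letter after NFC normalization.
results = {
    label: unicodedata.normalize("NFC", text) == precomposed
    for label, text in candidates.items()
}

print(results)
# U+037A carries only a compatibility mapping, so NFC leaves it untouched.
print(unicodedata.decomposition("\u037A"))
```

Only the combining U+0345 composes into the pre-composed letter. The
spacing U+037A and the ordinary iota stay separate characters under
canonical normalization.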

- Which fonts should be recommended to the end-users?
  Bitstream Cyberbit Version 1.1 does not cover the Greek Extended
  range (only Greek), according to its documentation.

- Which browsers should be recommended to the end-users?
  Do these browsers depend on particular operating system versions
  (e. g. for correct display, or printing)?

- Which editors could be recommended to the author of the WWW pages?
  Though UniEdit 1.2 (Duke University) does, by and large, cover the
  Greek Extended range, its fonts do not contain most upper-case vowels
  with diacritics.

Thank you, in advance, for any hints or insights you can give.
Best wishes,
   Otto Stolz


