A member of our university is planning to offer WWW pages
with mixed-language text in ancient Greek, German, English,
and perhaps more languages. What shall I recommend to him?
Of course, my first reaction was "Unicode". This raised
some questions about which particular solutions I should recommend
so that the texts can be both easily typed and widely read:
- Which transfer encoding is most widely understood?
  - Latin-1 with numeric character references? And if so,
    the decimal ones, or the new (HTML 4) hexadecimal ones?
  - Or something else?
- How to code diacritical marks, such as accents, breathing marks,
  and iota subscript?
  - combining (U+0374--U+03C9 + U+0300--U+0345)?
  - precomposed (U+0374--U+03C9, and U+1F00--U+1FFC)?
  - partially composed (using U+03AA, U+03AB, U+03CA, or U+03CB
    with combining accents, breathing marks, and iota subscript),
    thus avoiding the Greek Extended range?
  According to section 2.2 of the Unicode 2.0 standard, these are
  equivalent; but do browsers really render all of these
  codings? In particular, will they be aware of the "overriding
  behaviour" (cf. section 2.5 and figure 2-9 of the Unicode 2.0
  standard), so that accents and breathing marks are rendered side
  by side (rather than stacked above each other) when applied to
  the same base character?
- How to code breathing marks and, optionally, accents with
  **upper-case** vowels? These are written to the left of their base
  characters, so there are various possibilities:
  - fully precomposed, e.g. U+1F0D (for "Ha!")
  - combining, e.g. U+0391 + U+0314 + U+0301
  - detached accents, e.g. U+1FDE + U+0391
  - ditto, but with combining marks, e.g. U+0020 + U+0314 + U+0301 + U+0391
  Note that detached accents must be coded before their
  respective base characters, whilst combining accents must be
  coded after their respective base characters.
  Will browsers render the accents properly to the left of (and hence
  apparently before) the base character in the precomposed and
  combining variants? Will detached accents have any adverse
  effects (e.g. inappropriate line breaks)?
- How to code iota subscript with **upper-case** vowels
  (the iota should be written at the lower right of the base
  character, as iota adscript)?
  - fully precomposed (U+1F88--U+1FFC)
  - combining (U+0345)
  - separate (U+037A)
  - separate, ordinary iota (U+03B9)
- Which fonts should be recommended to end users?
  According to its documentation, Bitstream Cyberbit version 1.1 does
  not cover the Greek Extended range (only the basic Greek block).
- Which browsers should be recommended to end users?
  Do these browsers depend on particular operating system versions
  (e.g. for correct display or printing)?
- Which editors could be recommended to the author of the WWW pages?
  Though UniEdit 1.2 (Duke University) does, by and large, cover the
  Greek Extended range, its fonts do not contain most upper-case vowels.
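For the transfer-encoding question, here is a minimal Python sketch of the two options being weighed: raw UTF-8 versus Latin-1 with numeric character references, in either decimal or HTML 4 hexadecimal form. The sample string and the function names `to_latin1_with_refs` and `to_utf8` are hypothetical. The sketch only produces the bytes and says nothing about which browsers understand them.

```python
import html

# Hypothetical sample: German and polytonic Greek in one string.
SAMPLE = "Gru\u00df \u1f00\u03b3\u03b1\u03b8\u03cc\u03c2"  # "Gruß ἀγαθός"

def to_latin1_with_refs(text: str, hexadecimal: bool = False) -> bytes:
    """Encode text as Latin-1, escaping every character above U+00FF
    as an HTML numeric character reference."""
    parts = []
    for ch in html.escape(text, quote=False):  # escape &, <, > first
        cp = ord(ch)
        if cp <= 0xFF:
            parts.append(ch)                   # representable in Latin-1
        elif hexadecimal:
            parts.append(f"&#x{cp:X};")        # HTML 4 hexadecimal form
        else:
            parts.append(f"&#{cp};")           # decimal form
    return "".join(parts).encode("latin-1")

def to_utf8(text: str) -> bytes:
    """Encode text as UTF-8; the page should then declare charset=UTF-8."""
    return html.escape(text, quote=False).encode("utf-8")

if __name__ == "__main__":
    print(to_latin1_with_refs(SAMPLE))
    print(to_latin1_with_refs(SAMPLE, hexadecimal=True))
    print(to_utf8(SAMPLE))
```

For the sample, the decimal variant yields `b'Gru\xdf &#7936;&#947;&#945;&#952;&#972;&#962;'`. The Latin-1 route keeps German letters as single bytes but turns each Greek letter into six or seven ASCII characters. UTF-8 needs two bytes per basic Greek letter and three per Greek Extended character.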
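The equivalence claim in the diacritics question can be checked mechanically. The following minimal Python sketch uses the standard `unicodedata` module to compare precomposed, combining, and partially composed spellings. That module implements the normalization forms of its own, much later, Unicode version, not Unicode 2.0 itself. The sketch shows only what the standard treats as equivalent, not how any browser renders it. The constant names are illustrative.

```python
import unicodedata

# Alpha with rough breathing (dasia) and acute (oxia), spelled three ways.
ALPHA_PRECOMPOSED = "\u1f05"              # one Greek Extended code point
ALPHA_COMBINING = "\u03b1\u0314\u0301"    # base + U+0314 + U+0301
ALPHA_SWAPPED = "\u03b1\u0301\u0314"      # same marks, acute first

# Iota with diaeresis and acute: basic Greek block vs. partial composition.
IOTA_PRECOMPOSED = "\u0390"
IOTA_PARTIAL = "\u03ca\u0301"             # U+03CA + combining acute

def canonically_equivalent(a: str, b: str) -> bool:
    """True if a and b have the same canonical decomposition (NFD)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

if __name__ == "__main__":
    print(canonically_equivalent(ALPHA_PRECOMPOSED, ALPHA_COMBINING))  # True
    print(canonically_equivalent(ALPHA_PRECOMPOSED, ALPHA_SWAPPED))    # False
    print(canonically_equivalent(IOTA_PRECOMPOSED, IOTA_PARTIAL))      # True
```

Breathing marks and accents share the same combining class, so their order is significant. Only breathing-then-accent matches the precomposed Greek Extended characters.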
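For the question on breathing marks with upper-case vowels, a similar Python sketch lists the code points of the four spellings of "Ha!" after canonical (NFD) and compatibility (NFKD) decomposition. The dictionary and the helper `decompositions` are hypothetical names. The results come from Python's current Unicode data.

```python
import unicodedata

# Capital alpha with dasia and oxia ("Ha!"), in the four spellings listed.
VARIANTS = {
    "precomposed": "\u1f0d",
    "combining": "\u0391\u0314\u0301",
    "detached": "\u1fde\u0391",           # spacing DASIA AND OXIA + Alpha
    "detached, combining": "\u0020\u0314\u0301\u0391",
}

def decompositions(form: str) -> dict:
    """Map each variant name to its code points after normalization `form`."""
    return {
        name: " ".join(f"U+{ord(c):04X}" for c in unicodedata.normalize(form, text))
        for name, text in VARIANTS.items()
    }

if __name__ == "__main__":
    for form in ("NFD", "NFKD"):
        print(form, decompositions(form))
```

Only the precomposed and combining spellings are canonically equivalent (both become U+0391 U+0314 U+0301). The two detached spellings are merely compatibility-equivalent to each other (U+0020 U+0314 U+0301 U+0391). In both, the marks belong to a preceding character rather than to the Alpha.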
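For iota subscript with upper-case vowels, the same approach can compare the four spellings. The following Python sketch shows what the Unicode data (as bundled with Python, not Unicode 2.0) says about each one. The constant names are illustrative.

```python
import unicodedata

# Capital alpha with smooth breathing (psili) and iota, in four spellings.
PRECOMPOSED = "\u1f88"            # ALPHA WITH PSILI AND PROSGEGRAMMENI
COMBINING = "\u0391\u0313\u0345"  # combining ypogegrammeni U+0345
SPACING = "\u0391\u0313\u037a"    # spacing ypogegrammeni U+037A
ORDINARY = "\u0391\u0313\u03b9"   # plain small iota U+03B9

def code_points(text: str) -> str:
    return " ".join(f"U+{ord(c):04X}" for c in text)

if __name__ == "__main__":
    for name, text in [("precomposed", PRECOMPOSED), ("combining", COMBINING),
                       ("spacing", SPACING), ("ordinary", ORDINARY)]:
        print(name, code_points(unicodedata.normalize("NFKD", text)))
    # Full upper-casing (SpecialCasing data) turns the iota into a capital Iota.
    print("upper", code_points(PRECOMPOSED.upper()))
```

The precomposed and combining spellings are canonically equivalent. U+037A only decomposes, under compatibility mapping, to a space followed by U+0345, and U+03B9 is simply a different letter. Note also that full upper-casing of U+1F88 produces U+1F08 U+0399, i.e. an ordinary capital iota after the vowel.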
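For the font question, coverage can be checked against a font's character map rather than its documentation. The following minimal Python sketch assumes the font's supported code points are already available as a set of integers. For instance, the third-party fontTools package's `TTFont(path).getBestCmap()` returns a dict keyed by code point. The helpers `assigned` and `missing` are hypothetical. Python's Unicode database is newer than Unicode 2.0, so the basic Greek block may contain later additions.

```python
import unicodedata

GREEK = range(0x0370, 0x0400)            # Greek (and Coptic) block
GREEK_EXTENDED = range(0x1F00, 0x2000)   # Greek Extended block

def assigned(block: range) -> list:
    """Code points in `block` that have a character name in this
    Python's Unicode database, i.e. that are assigned."""
    return [cp for cp in block if unicodedata.name(chr(cp), None) is not None]

def missing(font_codepoints: set, block: range) -> list:
    """Assigned code points in `block` for which the font has no glyph."""
    return [cp for cp in assigned(block) if cp not in font_codepoints]
```

A font covering only the basic Greek block, for example, would be reported as missing every assigned Greek Extended character.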
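Whether browsers honour these equivalences, and whether they place marks side by side or to the left of capitals, can only be settled empirically. One possible Python sketch builds a single HTML 4.01 test page listing each spelling next to its code points, so the same page can be opened in each candidate browser and font. The variant list and `build_test_page` are hypothetical.

```python
import html

# Hypothetical selection of the spellings discussed in the questions.
VARIANTS = [
    ("alpha, dasia, oxia: precomposed", "\u1f05"),
    ("alpha, dasia, oxia: combining", "\u03b1\u0314\u0301"),
    ("Alpha, dasia, oxia: precomposed", "\u1f0d"),
    ("Alpha, dasia, oxia: combining", "\u0391\u0314\u0301"),
    ("Alpha, dasia, oxia: detached", "\u1fde\u0391"),
    ("Alpha, dasia, oxia: detached, combining", "\u0020\u0314\u0301\u0391"),
    ("Alpha, psili, iota: precomposed", "\u1f88"),
    ("Alpha, psili, iota: combining", "\u0391\u0313\u0345"),
    ("Alpha, psili, iota: spacing U+037A", "\u0391\u0313\u037a"),
]

def code_points(text: str) -> str:
    return " ".join(f"U+{ord(c):04X}" for c in text)

def build_test_page(variants) -> str:
    """Return an HTML 4.01 page (to be saved as UTF-8) with one table row
    per variant: label, code points, and the text itself."""
    rows = [
        f"<tr><td>{html.escape(label)}</td><td>{code_points(text)}</td>"
        f"<td>{html.escape(text)}</td></tr>"
        for label, text in variants
    ]
    return (
        '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"\n'
        '  "http://www.w3.org/TR/html4/loose.dtd">\n'
        "<html><head>\n"
        '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">\n'
        "<title>Greek rendering test</title>\n"
        '</head><body>\n<table border="1">\n'
        + "\n".join(rows)
        + "\n</table>\n</body></html>\n"
    )
```

The result can be saved with `open("greek-test.html", "w", encoding="utf-8")` (file name hypothetical). If the server sends its own charset in the HTTP Content-Type header, that header takes precedence over the meta element.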
Thank you, in advance, for any hints or insights you can give.