L2/02-237

From: "Nick Nicholas"
To: ;
Sent: Thursday, June 06, 2002 08:59
Subject: Greek: Haralambous proposals

Unicoders,

some of you may already be familiar with Yannis Haralambous' paper http://omega.enstb.org/yannis/pdf/amendments2.pdf, given at Dublin. The issues of precomposition have been hashed over again and again, so I won't add to them; but I'd like to register some concern (and some agreement) with other bits of it. In the following, of course, I speak for myself, dude with goatee in Australia --- not for any of my employers past or present, or for the Consortium.

The first two 'rules' Haralambous gives are peculiar, because they are already taken care of by normalisation. Furthermore, while users can and should be steered away from certain diacritic combinations, Unicode can't be the body saying what characters you can combine with what --- since you might not want to write coherent Greek at all (a point Rick has made to inquiries from the Greek Unicode list, and Ken (I think) to me on a previous occasion.) So it needs to be made clear that these are rules for users using the characters to produce cogent Greek --- rules which Unicode, as I understand it, doesn't want to police at a design level.

To reiterate stuff that's been said before:

1.1. The acute and the tonos are the same thing; this is known. They weren't necessarily the same thing from 1982-1986, whence the confusion at ELOT. But normalisation takes care of this well. (The snarking at 2.1.4 is unnecessary: people with lots of knowledge of Greek were using a tonos distinct from the acute in 1982 --- including the notorious Prof. Kriaras --- and the confusion is likelier to be heritage from ELOT than Unicode's fault. Admittedly, the vertical dash that used to feature on the charts is not familiar to me as having been used in 1982...)

1.2.
An uppercase letter can carry accents without breathings in older typographical traditions of Greek; but of course, those are instances in the middle of a word, because all-caps words used to be accented --- and their accents were above the letter, not to the left. The accents to the left occur only in the initials of title case words, and of course always require a breathing in polytonic. What on earth the function of U+1FBA is (A with initial grave) is a mystery for the ages; I assume ELOT just got confused, giving pseudo-polytonic equivalents of capitals plus tonos. But of course, those characters will now never go away. Hopefully anyone designing a font to deal with 17th century Greek will realise that those characters shouldn't be used for all-caps accents --- and the current glyphs are certainly going to discourage anyone from making that mistake.

1.6. The shame with 1.6 (avoid spacing diacritics to emulate capital letters with left diacritics) is that this is how every 8-bit Greek font on earth has done those capital diacritics, so people just blindly convert them across to Unicode like that. Rule 6 should be shouted from the rooftops; and anyone working with Unicode Greek should realistically expect that they will get texts with this misuse of spacing diacritics (which includes just about every polytonic Greek text online --- TLG texts excepted :-).)

1.7. Using pre-combined characters rather than combining diacritics is something I've been guilty of myself in designing a website using Unicode Greek; but pre-combined chars are deprecated for good reason, and this suggestion should be downgraded even further: it is emphatically only an interim solution, until smart fonts (like Minion Pro) are in widespread use, and should be avoided in any text to be further processed electronically. Believe me, you don't want to write a search engine that has to deal with pre-combined characters and still allow diacritic-insensitive searches...

1.8.
Unfortunately yes, some people do confuse psili and apostrophe; I've had to deal with this in legacy text myself.

1.9. Guillemets are standard typographical practice for quotations in Greece --- but not at all for Ancient Greek, the quotation marks for which tend to follow the practice of the publishing country. Though there is a special place in Hell for people using single quotes in Greek (as they are readily confused with psili and daseia), mandating guillemets for an audience including Western classicists is unwarranted.

Section 2 contains polemic against monotonic. Haralambous is entitled to his opinion; and you're entitled to mine, which is (a) good riddance (and the arguments made for the polytonic are anything but compelling); and (b) there is no conceivable way that polytonic Greek will make a comeback, when no one under 30 has learned it outside of Ancient Greek classes (and I say this as a 30 year old.) Polytonic may be holding on in *some* book publishing, but the majority of computer users will neither use it nor want to use it. To contend that monotonic is a local phenomenon, and that the needs of 10,000 classical scholars in the West outweigh those of 10,000,000 Greek and Cypriot nationals, is Canute-like. Unicode should indeed recognise continuing contemporary use of polytonic; but that monotonic is official and prevalent, and the priority for any implementers, is beyond dispute.

2.2. On the mute iota: a reminder that the subscript/mute iota is indeed mute now, but was not mute in Classical Greek (it started dropping out in the 3rd century BC.) And in the inscriptions of the time, of course, it wasn't subscript at all: scribes only started indicating its muteness by subscripting with the invention of lower case, which all subsequent Greek typography has followed. Mute iota is fine as a Modern Greek name (though so is ypogegrameni!) --- but I'm not convinced classicists will like it.
In any case, Unicode now conflates the subscript and the adscript in normalisation, so here too no dire results can come about. Furthermore, Haralambous is making the age-old glyph/character confusion: Unicode has an adscript glyph in its code chart for capital subscripts, but no one is forcing Haralambous to use that glyph if his typographical tradition wants capital subscripts instead. So to say that AiDHS (small cap subscript) and A|dhs (subscript instead of adscript) are "not conformant to Unicode v3.2" is misleading.

2.3. It is true that the Greek circumflex looks like the combining inverted breve rather than the combining (Roman) circumflex; but surely the sensible solution, as already occurs with the treatment of precombined characters, is to treat the tilde and the inverted macron as glyph variants of the perispomeni, and to deprecate the characters for tildes, inverted macrons, and Roman circumflexes as realisations of the perispomeni. (I've yet to see a dialectologist employ a Roman circumflex on a Greek vowel, but it's not beyond the bounds of reason. Plenty of use of inverted macron-perispomenis on consonants in dialectology, though, to indicate palatalisation.)

2.5. Haralambous is correct on the confusion: stigma is numeric, digamma alphabetic. (The stigma is originally the uncial version of the digamma --- but by the time people were using uncials, the digamma was only used as a number. So the bifurcation between stigma and digamma is exactly parallel to that between the Q-koppa and the S-koppa, the latter form also being mediaeval.)

3.1. I would want evidence of use of the uppercase Kai symbol. I've only ever seen it lowercase --- though I admit I've only ever seen it in old-style shopfronts. But I have only ever seen it lowercase even in all caps contexts.
If the capital kai symbol represents typographical practice of long ago, I'm inclined to think this is better handled as a straight ligature for cased "Kai": old Greek typography didn't exactly skimp on ligatures, and Unicode doesn't need to know about them. Unless I see compelling evidence to the contrary, I don't think a capital Kai warrants inclusion in Unicode any more than the "esti" ligature. Symmetry with the lowercase kai ligature is not enough of a rationale for its inclusion.

3.1. On the other hand, capital Lunate sigma is long overdue, and I've never understood why it was omitted: the papyrologists who use it use case as much as anyone.

3.2. The reversed iota and upsilon are also still in use in Greek dialectology. They are of course the homebrew version of the Jod, though maintaining the distinctions between their etymological origins (upsilon, iota.) It's an admirable quirk of Greek typography that people flipped iota circumflex and upsilon circumflex to get these jods; but I don't think Unicode should go down this path with discrete characters. You'll notice Haralambous' capital versions don't have tildes underneath, but breves. In fact, dialectologists in particular indicate jods for any combinations of characters pronounced as /i/ in the modern language; so often enough you will see eta with a combining breve underneath, or epsilon and iota with a combining tie (e.g. skol(ei)o = skoljo). Obviously eta breve should decompose the same way as iota breve, or the text becomes intractable; so I believe the correct solution is to encode these ersatz jods at the character level as letter + combining breve underneath (or tie), with the upside down tildes treated as glyph-variant ligatures (iota + breve underneath rendered through ligature as upside down iota-circumflex).

3.3, 3.4. This proposal has been rejected too often for me to repeat why.
:-) What software and OS designers choose to implement or not as available combinations need not be any concern of Unicode's. If you need circumflexes on epsilons, or smooth breathings on cap upsilons, talk to Adobe, not Unicode. The rest of the world does not want yet more precomposed forms to normalise; and I'm surprised at Haralambous' insistence on this old ground.

--
[][][][] [][][][][][][][][][] [][][][]
Dr Nick Nicholas. opoudjis@optushome.com.au http://www.opoudjis.net
University of Melbourne: nickn@unimelb.edu.au
Chiastaxo dhe to giegnissa, i dhedhato potemu,
ma ena chieri aftumeno ecratu, chisvissemu. (I Thisia tu Avraam)