Geejay and others

From: André Szabolcs Szelp (
Date: Sun Jan 06 2008 - 16:01:26 CST


    I have noticed that my previous email reached the list "too late" in some respects. I apologise, but I'm receiving the list in digest mode and do not wish to change that, for the sake of my mailbox. I'll also combine answers to several posts, all loosely belonging to the Geejay discussion.

    Jeroen Ruigrok van der Werven wrote:
    > > I was wondering, whether it's Unicode's task to encode every single
    > > character ever printed, even if it was created on the fly for a
    > > single project...
    > And what if you want to make available such historical documents using an
    > electronic medium? The only option you have would be a scanned image (with
    > all its pros and cons) or an incomplete text due to certain glyphs being
    > replaced with non-equivalent ones.

    I do believe that when such really one-off, nonsystematic* characters in historic texts are encoded digitally, PUA characters should be used, with accompanying metadata in the form of illustrations or descriptions. This is usually not a problem: if the text is a citation in a larger work, the main text of that work offers a chance to communicate the meta-information to the reader, and larger bodies of letter-faithful reprints usually come with some accompanying meta-information as well (e.g. a foreword, a README.txt, etc.).

    The most obvious form of meta-information is a custom font with a bespoke glyph at the given PUA position. However, if an accompanying text describes the glyph or illustrates it with images, the data can be interpreted even without the specialised font.
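
    A minimal sketch of the "PUA plus accompanying description" idea, assuming Python and its standard unicodedata module. The code point U+E000, the PUA_NOTES table and the sample string are hypothetical choices made for this illustration only; nothing here is an agreed assignment. It shows how a reader without the specialised font could still interpret the data from the accompanying notes:

```python
import unicodedata

# Hypothetical private agreement: U+E000 is an arbitrary PUA code point
# picked for this sketch, not a registered or standard assignment.
PUA_NOTES = {
    "\uE000": "capital G with superimposed small j",
}


def is_private_use(ch):
    """True if the character has general category Co (private use)."""
    return unicodedata.category(ch) == "Co"


def describe_pua(text, notes):
    """Replace each PUA character by its bracketed description.

    PUA characters missing from the notes are flagged with their code
    point, so that undocumented usage is at least visible.
    """
    out = []
    for ch in text:
        if is_private_use(ch):
            out.append("[" + notes.get(ch, "undocumented U+%04X" % ord(ch)) + "]")
        else:
            out.append(ch)
    return "".join(out)


# Made-up sample string containing the hypothetical PUA character.
sample = "a\uE000io"
print(describe_pua(sample, PUA_NOTES))
```

    The notes table plays the role of the foreword or README.txt: the PUA code point carries no meaning of its own, so the description has to travel together with the text.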

    Also note that 20th-century linguistic works, when resolving abbreviations and marks not used in the current Latin script, would provide the meta-information in something like this form (quotation made up, but you find it all the time when reading such editions, either like this or illustrated with images):

    The character sequence [us] stands for the scribal abbreviation mark no. 1234 of Cappelli's catalogue, [rum] is the expansion of R rotunda with crossed descender.

        Temp[us] edax re[rum]

    This sentence means 'Time is the devourer of things'. [...]

    Don't misunderstand me: I fully support the encoding of common scribal abbreviation marks (and if I understand correctly, they made it into 5.1?), but the discussion was about one-off characters, and I do champion that these should be PUA encoded.

    *) I do believe that in the case of systematic use a case can be made for including unattested characters, e.g. in the case of the SMALL CAPITAL LETTERs (not due to typographic use but due to historic and linguistic use) the two remaining missing characters should be added, possibly also the missing COMBINING LATIN SMALL LETTERs.

    Concerning the Geejay, I like the descriptive name proposal by Asmus Freytag (LATIN LETTER CAPITAL G WITH SUPERIMPOSED SMALL LETTER J), and I concur with most of his views and arguments on the topic.

    Andreas, I like the 1940 dictionary you presented, though I'm still missing the exact signatures of both works on your homepage.
    Here one can see that the [Gj] is an independent cut. The [nj] is an interesting ligature, though it's questionable whether it should not rather be encoded as n ZWJ j. Phonetic value alone is IMHO no argument for encoding, or else most digraphs (considered single letters in several languages) would have to be encoded on that basis.
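
    For illustration, the n ZWJ j alternative is just a three-code-point sequence in which U+200D ZERO WIDTH JOINER requests a ligated rendering from the font. A minimal Python sketch follows; the helper strip_joiners is a hypothetical name introduced here, and whether a ligature is actually displayed depends entirely on the font and the renderer:

```python
import unicodedata

ZWJ = "\u200D"  # ZERO WIDTH JOINER

# The sequence n ZWJ j: plain letters plus a ligature request.
nj_sequence = "n" + ZWJ + "j"

# List the code points that make up the sequence.
for ch in nj_sequence:
    print("U+%04X %s" % (ord(ch), unicodedata.name(ch)))


def strip_joiners(text):
    """Remove ZWJ so the text falls back to the plain letter sequence
    (hypothetical helper, e.g. for searching or sorting)."""
    return text.replace(ZWJ, "")


print(strip_joiners(nj_sequence) == "nj")
```

    The point of this design is that the underlying text stays an ordinary n followed by j, so nothing is lost if the font has no such ligature.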

    I believe it was Kent who noted an l-i ligature. From looking at the pictures, I believe it's actually a digraph of a fraktur l and an antiqua superscript i instead, though they might be ligated. Combining fraktur and antiqua glyphs to represent a single phoneme is not unprecedented; see the d[Gj] digraph for the dzh sound, which features a fraktur d. Andreas, could you please post some higher-resolution scans of the mentioned [li]? I'd be thankful.

    A last question: is that a Latin small r with ogonek on the scan saying (instance marked with <>): "_Lateinische_Buchstaben_ oder _Zeichen_ (o, [Gj], [nj], <r>) bezeichnen italienische Laute [...]" ("Latin letters or signs (o, [Gj], [nj], <r>) denote Italian sounds [...]")? It seems to be one on that scan, though not on the one below detailing the pronunciation. An argument for its not being a simple letter would be its position: not after the o ("Lateinische Buchstaben", i.e. "Latin letters"), but grouped with [Gj] and [nj] ("Zeichen", i.e. "signs").

    Could you please check the pronunciation guide part of the dictionary and possibly some phonetic transcriptions in the dictionary to see whether it's a special character or just some ink bleeding in the one noted instance?


    One more short note concerning the 1940 dictionary. Are there any Romanists among us? I suspect that nowadays, in teaching standard Italian, "agio" would be taught as [adzho] and "bacio" as [bacho]. This dictionary distinguishes two pronunciations of g{e,i}, zh versus dzh, and two of c{e,i}, sh versus ch. Is that still contemporary practice for Standard Italian? (I'm not speaking about dialects.)


    This archive was generated by hypermail 2.1.5 : Sun Jan 06 2008 - 16:04:07 CST