Re: Rare extinct latin letters

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Tue Jun 03 2003 - 08:25:46 EDT

    From: <Peter_Constable@sil.org>
    > Philippe Verdy wrote on 05/30/2003 09:42:53 AM:
    >
    > > If this is not enough, may be we could create only a new diacritic
    > > for the long leg attached on right
    >
    > I think it's a bad idea to encode combining marks that do not combine
    > productively but are only used with a small set of base characters, and
    > that attach, meaning that special-case outlines are likely to be needed.

    How do you regard the existing "hook" diacritic? Attached diacritics are already encoded. We can use them as a good fallback system in their context, so that the text will be mostly readable by users who are not aware of that specific usage.

    The exact glyph can be described in a definition or appendix, to replace the default glyph that would be generated by (for example) L+HOOK. This allows a font to be created that correctly displays L+HOOK as an L-MOLL if needed, while still facilitating interchange, without making the text completely unreadable for those who don't have this font.

    I don't think it creates a semantic issue, because text using the special alphabet does not attach any other semantics to L+HOOK, so it can be safely interpreted (in context) as meaning L-MOLL.

    For me, the displayed documents describe new glyphs, but not really new characters, as they clearly reuse an existing script, with a very strong relation to the "normal" basic Latin script used in traditional French.

    The "special-case outlines" fit into the category of glyphs, i.e. specific fonts, and the original abstraction of the author is preserved, because a base letter plus a diacritic is considered in all French texts (and in the variants shown here) as a unique abstract character (or grapheme cluster?). Using such a diacritic will still work correctly with all other Unicode algorithms, and the reader will not make false interpretations, as this method also provides easy fallbacks if the combination L+HOOK cannot be rendered: in the documents shown, HOOK by itself has no meaning; only the composed sequence does.
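
    As a rough illustration of the grapheme-cluster point above, here is a minimal Python sketch. It assumes U+0321 COMBINING PALATALIZED HOOK BELOW stands in for "HOOK" (the post does not say which encoded hook is meant), the sample word is hypothetical, and the grouping rule is a simplification of the real segmentation rules in UAX #29.

```python
import unicodedata

# Assumption: U+0321 stands in for the "HOOK" of the discussion.
HOOK = "\u0321"

def is_mark(ch):
    """True for combining marks (general categories Mn, Mc, Me)."""
    return unicodedata.category(ch).startswith("M")

def clusters(text):
    """Group each base character with the combining marks that follow it.

    Simplified approximation of grapheme clusters, not full UAX #29.
    """
    out = []
    for ch in text:
        if out and is_mark(ch):
            out[-1] += ch      # attach the mark to the preceding base
        else:
            out.append(ch)     # start a new cluster
    return out

def base_only(text):
    """Crudest fallback: keep base letters, drop all combining marks."""
    return "".join(ch for ch in text if not is_mark(ch))

word = "fi" + "l" + HOOK + "e"   # hypothetical sample, L+HOOK for L-MOLL
print(clusters(word))            # four clusters; the third is l + HOOK
print(base_only(word))           # the plain base letters
```

    The cluster boundaries are the same whether or not a font has the special L-MOLL glyph, which is what keeps the text usable in applications that know nothing about this notation.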

    This also gives good collation orders for this special-case script (which is an invented notation but not really a new abstract script), and supports other possible transforms (such as case-folding). In this context, L+HOOK would clearly mean the L-MOLL semi-consonant, A+HOOK would clearly mean the AU vowel, N+HOOK would clearly mean N-MOLL, and the default rendering in non-aware applications would still not break the interpretation of the text, as it preserves the grapheme-cluster boundaries of the original text.
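
    The collation and case-folding remark can be sketched as follows. This is only an approximation under stated assumptions: U+0321 COMBINING PALATALIZED HOOK BELOW is assumed as the "HOOK", the word list is made up, and the two-level key below is a stand-in for a real implementation of the Unicode Collation Algorithm.

```python
import unicodedata

# Assumption: U+0321 stands in for the "HOOK" of the discussion.
HOOK = "\u0321"

def sort_key(word):
    """Two-level key: base letters first, full sequence as tie-breaker.

    A rough stand-in for multi-level collation, not the UCA itself.
    """
    folded = unicodedata.normalize("NFD", word.casefold())
    base = "".join(
        ch for ch in folded
        if not unicodedata.category(ch).startswith("M")
    )
    return (base, folded)

# Hypothetical word list mixing plain L and L+HOOK.
words = ["lo", "l" + HOOK + "a", "la", "L" + HOOK + "a"]
ordered = sorted(words, key=sort_key)
print(ordered)

# Case mapping touches only the base letter; the mark is carried along.
print(("L" + HOOK).lower() == "l" + HOOK)
```

    With this key the L+HOOK words sort next to their plain-L counterparts rather than at the end of the alphabet, which is the behaviour a new, unsupported codepoint would not get for free.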

    All that is required is an agreement between the linguists who study such Old French texts and want to exchange their work. They can then build a common font that includes the proper L-MOLL glyph for the L+HOOK abstraction.

    The way I see abstract characters defined in Unicode is that they designate an agreement between users of Unicode to use these standardized codepoint sequences to correctly encode text sharing common semantics in a given language and script pair. The two Old French texts are rare enough that such an agreement between the most influential linguists who study them can be reached. It's up to the linguists to consider whether a mapping to an existing sequence of standardized codepoints would be more beneficial than creating a new codepoint that may simply be too rarely supported.

    Personally, I would prefer to be able to study such text by seeing an L-HOOK glyph instead of an L-MOLL glyph if I don't have a font that precisely matches the glyph design of the original text. The "learning curve" would be extremely short, as this default glyph is still very similar to what the original text displayed, and because there do not seem to be conflicts between this "special-use" script variant and the traditional one (so mixing the traditional Old French script, or the current French script, with this special-case script would not cause interpretation or semantic problems).

    On the contrary, encoding with variant selectors or new special codepoints would cause many more problems and would not ease interchange (I would hate to see a default square glyph for all these new codepoints, and I would not appreciate it either if the default rendering of a variant selector were the base letter with the variant not represented at all).

    With some searching, you could easily find related publications of this old text that use such diacritics, because of reproduction costs at a time when designing new metal fonts was too expensive, and the author may even have authorized such a compromise to keep the text "intact". (Such variation would be no worse than what could be found in private handwritten mail exchanged between the author and the publisher, or other scholars, in the period when this script was created.)

    In fact, the facsimile text shown here exhibits considerable variation in the glyphs, and this really demonstrates that these characters were quite hard to reproduce exactly, whether because of technical constraints or because of the ability of the font workers employed to create the reproduction plates for that publication. In those days, reproducing text was quite expensive for any author, and many compromises were needed when publishing books. I think you'll find many differences between what the publisher did and what the author intended (it would require studying their handwritten mail or commercial contracts to see what was initially intended, how other scholars regarded this creation, and how they interpreted it for their own studies or use).

    So when I suggested that a new diacritic could be encoded for a "long leg", that was not my preferred choice. Using other existing diacritics (such as a combining hook already encoded in Unicode) seems a more reasonable choice, one that falls within the limits of what authors and linguists already accepted in their time because of money and technical constraints.


