old Latin chars (was RE: Acceptable alembic…)

From: Kent Karlsson (kent.karlsson14@comhem.se)
Date: Sun Jan 06 2008 - 06:06:24 CST


     
    Andreas Stötzner wrote:
    ...
    > medieval/early-printing ligatures and abbreviations. Extensive research
    > has been done on it in the past few years, see e.g.
    > http://gandalf.aksis.uib.no/mufi/ ; see the Latin-Ext.-D block of the
    > UCS.

    The document you referred to there in turn refers to
    http://www.mufi.info/specs/MUFI-Alphabetic-2-0.pdf.
    That document unfortunately does not seem up to speed with Unicode,
    even though it (inaccurately) states "Compliant with the Unicode Standard
    version 5.0".

    For instance, it allocates to the PUA characters that are already encoded
    in non-PUA, albeit as combining sequences rather than single characters.
    Just to mention a few (there are MANY more in that document):

    LATIN SMALL LETTER A WITH WITH OGONEK AND ACUTE (ignoring the double WITH...)
    can and should be represented by
    0105;LATIN SMALL LETTER A WITH OGONEK followed by 0301;COMBINING ACUTE ACCENT
    (or a sequence canonically equivalent to that).
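
    A minimal sketch of what "canonically equivalent" means here, assuming
    Python's standard unicodedata module; the variable and function names
    are illustrative only. Two strings are canonically equivalent when
    their NFD normalizations are identical.

```python
import unicodedata

# Three spellings of "a with ogonek and acute", all using encoded (non-PUA) characters.
ogonek_then_acute = "\u0105\u0301"   # LATIN SMALL LETTER A WITH OGONEK + COMBINING ACUTE ACCENT
fully_decomposed = "a\u0328\u0301"   # a + COMBINING OGONEK + COMBINING ACUTE ACCENT
marks_swapped = "a\u0301\u0328"      # a + COMBINING ACUTE ACCENT + COMBINING OGONEK


def canonically_equivalent(s, t):
    """Return True if s and t have the same NFD normalization."""
    return unicodedata.normalize("NFD", s) == unicodedata.normalize("NFD", t)


print(canonically_equivalent(ogonek_then_acute, fully_decomposed))
print(canonically_equivalent(ogonek_then_acute, marks_swapped))

# NFC folds every spelling back to the two-code-point sequence named above.
nfc = unicodedata.normalize("NFC", marks_swapped)
print([unicodedata.name(c) for c in nfc])
```

    Both comparisons should report equivalence, since the ogonek and the
    acute have different combining classes and are reordered during
    normalization.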

    LATIN SMALL LETTER A WITH DOUBLE ACUTE can and should be represented by
    0061;LATIN SMALL LETTER A followed by 030B;COMBINING DOUBLE ACUTE ACCENT.

    (etc. for quite a few more accented letters).
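
    As a rough illustration (again assuming Python's unicodedata module),
    the sequence for a with double acute stays two code points under NFC,
    and neither of them is private-use. U+E000 below is just an arbitrary
    PUA code point chosen for contrast, not one taken from the MUFI
    document.

```python
import unicodedata

a_double_acute = "a\u030B"  # LATIN SMALL LETTER A + COMBINING DOUBLE ACUTE ACCENT
nfc = unicodedata.normalize("NFC", a_double_acute)


def is_private_use(ch):
    """Return True if ch has general category Co (private use)."""
    return unicodedata.category(ch) == "Co"


# There is no precomposed character, so NFC leaves the sequence as it is.
print(len(nfc), [unicodedata.name(c) for c in nfc])

# The combining sequence needs no PUA code points at all.
print(any(is_private_use(c) for c in nfc))

# An arbitrary PUA code point, for contrast (assumption: chosen only as an example).
print(is_private_use("\uE000"))
```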

    In some cases there aren't even combining characters involved: e.g.
    LATIN SMALL LIGATURE F O WITH DIAERESIS
    This should be represented simply by LATIN SMALL LETTER F followed
    by LATIN SMALL LETTER O WITH DIAERESIS (or a sequence canonically
    equivalent to that). The ligature should be formed by the font from that
    sequence. The font should form it by default if it is suitable for
    texts with "fö" in them and, typographically, there would be an overlap
    (or extra spacing) without the ligature. (Note that other ligatures
    aren't like that, but are akin to (e.g.) the oe ligature, like for
    instance the aa ligature. These should be represented by characters
    of their own.)
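
    The distinction can be sketched as follows, assuming Python's
    unicodedata module. The "fö" text is an ordinary character sequence
    with two canonically equivalent spellings, whereas the oe ligature
    (U+0153) has a code point of its own with no decomposition. U+A733
    LATIN SMALL LETTER AA is included on the assumption that the Unicode
    data shipped with your Python is version 5.1 or later.

```python
import unicodedata

fo_precomposed = "f\u00F6"   # f + LATIN SMALL LETTER O WITH DIAERESIS
fo_decomposed = "fo\u0308"   # f + o + COMBINING DIAERESIS

# Both spellings normalize to the same text; any ligature is the font's job.
same_text = unicodedata.normalize("NFC", fo_decomposed) == fo_precomposed
print(same_text)

# Letters of their own: encoded as single characters with no decomposition.
for ch in ("\u0153", "\uA733"):
    print(unicodedata.name(ch), repr(unicodedata.decomposition(ch)))
```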

    They also change the formal name of some characters (for various reasons),
    e.g. LATIN ABBREVIATION SIGN SMALL ET for TIRONIAN SIGN ET, which is not
    helpful. Even if the name is "wrong" (which does not seem to be the case
    here), the formal name is used for reference and is not to be changed.
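
    A small sketch of why the formal name matters as a reference, assuming
    Python's unicodedata module (the helper name resolves is hypothetical).
    Lookup by name works only with the name from the standard, so a renamed
    character cannot be found.

```python
import unicodedata

# The formal name of U+204A as given in the Unicode Character Database.
formal_name = unicodedata.name("\u204A")
print(formal_name)


def resolves(name):
    """Return True if name identifies a character in the Unicode name table."""
    try:
        unicodedata.lookup(name)
        return True
    except KeyError:
        return False


print(resolves("TIRONIAN SIGN ET"))
print(resolves("LATIN ABBREVIATION SIGN SMALL ET"))
```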

                    /kent k


