Re: About the European MES-2 subset (was: PUA Audio Description, Subtitle, Signing)

From: Philippe Verdy
Date: Fri Jul 18 2003 - 06:16:42 EDT

    On Friday, July 18, 2003 7:36 AM, Michael Everson wrote:

    > At 00:57 +0200 2003-07-18, Philippe Verdy wrote:
    > > Why is row 03 so restricted? Shouldn't it include those accents and
    > > diacritics that are used by other characters once canonically
    > > decomposed? Or does it imply that MES-2 is only supposed to use
    > > strings in NFC form?
    > >
    > > Also, is this list under full closure with existing character
    > > properties, like NFKD decompositions, and case mappings?
    > The MES-2 is what it is, and was developed at the time when it was.
    > It is thought to be a minimum requirement for European requirements,
    > and is certainly a lot better than that old Adobe glyph list that was
    > supported earlier on. It doesn't depend on very smart fonts.
    > Personally I prefer the Multilingual European Subset.

    Is there any work under way at CEN to revise its MES-2 subset (as a
    MES-2.1, perhaps?) so that it takes into account not only the
    ISO 10646 reference but also the Unicode character properties, making
    the set self-closed and actually implementable, at least under NFC
    closure and case-mapping closure?
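
    To illustrate what I mean by closure, here is a minimal sketch in
    Python, assuming the standard unicodedata module and a made-up toy
    repertoire (not the real MES-2 list). The name closure_gaps is
    hypothetical. It reports the characters that case mapping or NFC
    composition can produce from members of the set but that the set
    itself lacks. It only tests single characters and base+mark pairs,
    so it is a rough check rather than a full proof of closure.

```python
import unicodedata


def closure_gaps(repertoire):
    """Return characters reachable from `repertoire` by case mapping or
    NFC normalization that are missing from `repertoire`.

    Only single characters and (character, combining mark) pairs are
    tested, so this is an approximation of full NFC closure.
    """
    missing = set()

    def check(text):
        # Record every output character that falls outside the set.
        for out in text:
            if out not in repertoire:
                missing.add(out)

    for ch in repertoire:
        check(ch.upper())  # full case mapping, e.g. U+00DF -> "SS"
        check(ch.lower())
        check(unicodedata.normalize("NFC", ch))

    # Combining marks have a non-zero canonical combining class.
    marks = [c for c in repertoire if unicodedata.combining(c)]
    for base in repertoire:
        for mark in marks:
            check(unicodedata.normalize("NFC", base + mark))

    return missing


if __name__ == "__main__":
    # Hypothetical toy repertoire: e, E, COMBINING ACUTE ACCENT, sharp s.
    toy = {"e", "E", "\u0301", "\u00df"}
    print([f"U+{ord(c):04X}" for c in sorted(closure_gaps(toy))])
    # prints ['U+0053', 'U+00C9', 'U+00E9']
```

    With this toy set the gaps are S (from the uppercase of sharp s) and
    the two precomposed letters with acute, which is exactly the kind of
    hole a self-closed subset should not have.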

    Support for NFKC closure should then be added in a later step, which
    could optionally specify support for the corresponding decompositions
    (but this would bring in combining characters, and would increase the
    number of precomposed characters in NFC form to include in the subset).
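
    As a rough sketch of why decomposition support enlarges the set,
    again assuming Python's unicodedata module and a made-up toy
    repertoire, the hypothetical helper nfkd_additions below lists the
    characters that NFKD decomposition of the members would require in
    addition to the set itself.

```python
import unicodedata


def nfkd_additions(repertoire):
    """Return characters that appear in the NFKD decomposition of some
    member of `repertoire` but are not themselves in `repertoire`."""
    needed = set()
    for ch in repertoire:
        for out in unicodedata.normalize("NFKD", ch):
            if out not in repertoire:
                needed.add(out)
    return needed


if __name__ == "__main__":
    # Hypothetical toy repertoire: e with acute, vulgar fraction one half, e.
    toy = {"\u00e9", "\u00bd", "e"}
    for c in sorted(nfkd_additions(toy)):
        print(f"U+{ord(c):04X} {unicodedata.name(c)}")
```

    Here the canonical decomposition of U+00E9 pulls in the combining
    acute accent U+0301, and the compatibility decomposition of U+00BD
    pulls in the digits 1 and 2 plus U+2044 FRACTION SLASH.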

    I don't think it is up to Unicode to do this work; CEN should be
    contacted to take it on, unless some vendor or open-source developers
    have already done it and published the result.

    I also note that modern Hebrew and Arabic are excluded from MES-2,
    as they are not used for any official language of the European Union,
    EFTA, or the future EU candidates. But they are certainly of great
    interest for countries with which the EU is a major partner and which
    use these scripts. In the future, it will be necessary to include
    support for modern Georgian (a subset of U+10A0..U+10FF) and modern
    Armenian (a subset of U+0530..U+058F), as well as some characters
    from Cyrillic Supplementary (U+0500..U+052F).

    Conversely, I don't understand why MES-2 includes characters
    from row U+25xx (Box Drawing, Block Elements, Geometric Shapes),
    which are not strictly needed for text (notably the legal publications
    of the EU, which would be better served by markup systems), nor the two
    Alphabetic Presentation Forms U+FB01..U+FB02 (the <fi> and <fl>
    ligatures), which are really unneeded, even for legal purposes; to be
    consistent, the set should then also have included the <ff>, <ffi>
    and <ffl> ligatures...
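
    For reference, a small sketch (assuming Python's unicodedata module;
    the name ligature_table is hypothetical) that lists the five Latin
    ligatures U+FB00..U+FB04 together with their NFKC forms. It shows
    that each of them is only a compatibility equivalent of a plain
    letter sequence, so text normalized to NFKC never contains them.

```python
import unicodedata


def ligature_table():
    """Return (code point, character name, NFKC form) for U+FB00..U+FB04."""
    rows = []
    for cp in range(0xFB00, 0xFB05):
        ch = chr(cp)
        rows.append((cp, unicodedata.name(ch), unicodedata.normalize("NFKC", ch)))
    return rows


if __name__ == "__main__":
    for cp, name, plain in ligature_table():
        print(f"U+{cp:04X} {name} -> {plain!r}")
```

    Only the second and third rows (fi and fl) are in MES-2; the other
    three are the ones I would have expected alongside them.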

    I suppose that these come from widely used legacy encodings in
    some EU, EFTA or European Council countries, but CEN should have
    left them out (font renderers could still select them, where the
    fonts provide them).

    Spam not tolerated: any unsolicited message will be
    reported to your Internet service providers.
