Re: About the European MES-2 subset

From: Philippe Verdy
Date: Fri Jul 18 2003 - 09:21:10 EDT

    On Friday, July 18, 2003 1:13 PM, Peter Kirk wrote:

    > On 18/07/2003 03:16, Philippe Verdy wrote:
    > > I still note that modern Hebrew and Arabic are excluded from MES-2,
    > > as they are not used in any official language in the European Union
    > > or EFTA, or future EU candidates. ...
    > >
    > But they are used in official publications within the EU, those
    > targeted at minority communities. But then so are south Asian and
    > east Asian scripts.

    But for these Asian languages, I think it is best to have fonts designed to
    handle their corresponding scripts correctly, instead of a giant font poorly
    hinted for readability at small sizes, and without support of common

    Arabic, Hebrew and Brahmic scripts would be better supported by their
    own fonts than by partial coverage in a general-purpose font. For example,
    the inclusion of Brahmic digits only in Arial Unicode MS was an error, in
    my opinion: Microsoft should have provided separate fonts for these Brahmic
    scripts rather than specifying that its fonts support these scripts.

    > > ... But they are certainly of great
    > > interest for countries with which the EU is a major partner, and
    > > which are using these scripts. In the future, it will be necessary
    > > to include support for modern Georgian (a subset of U+10A0..U+10FF),
    > > and modern Armenian (a subset of U+0530..U+058F), as well as some
    > > characters from Cyrillic Supplementary (in U+0500..U+052F).

    As for Armenian and Georgian Mkhedruli, they do not seem complex
    to add to a font.
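
    A minimal Python sketch of the code point ranges quoted above, assuming
    only the three ranges cited in this thread. The table BLOCKS and the
    helpers block_of and assigned_in are hypothetical names chosen for
    illustration, and which code points count as assigned depends on the
    Unicode version shipped with the Python interpreter.

```python
import unicodedata

# Ranges exactly as quoted in the message (inclusive bounds).
BLOCKS = {
    "Georgian": (0x10A0, 0x10FF),
    "Armenian": (0x0530, 0x058F),
    "Cyrillic Supplementary": (0x0500, 0x052F),
}


def block_of(ch):
    """Return the name of the quoted range containing ch, or None."""
    cp = ord(ch)
    for name, (lo, hi) in BLOCKS.items():
        if lo <= cp <= hi:
            return name
    return None


def assigned_in(block):
    """List the characters of a range that have a name in this Python's Unicode data."""
    lo, hi = BLOCKS[block]
    return [chr(cp) for cp in range(lo, hi + 1) if unicodedata.name(chr(cp), "")]


if __name__ == "__main__":
    for name in BLOCKS:
        print(name, len(assigned_in(name)), "named characters")
```

    Choosing the "modern" subset of each range (for example Mkhedruli only)
    would still need a per-language character list on top of this.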

    > If this subset is to be enlarged very much, and to require complex
    > script rendering etc for its implementation, surely there is little
    > point in specifying anything less than the improper (in the
    > mathematical sense!) subset which Ken mentioned, i.e. the whole of
    > Unicode.

    I agree with this point. But this is no excuse for failing to implement and
    support at least the NFC and case-mapping closures in a decent font
    for any script, even if the script is reduced to letters used in the modern
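
    One possible reading of "NFC and case mapping closures", sketched in
    Python: starting from a character repertoire, keep adding every character
    produced by upper/lower case mapping and every single character that NFC
    composes from a pair already in the set, until nothing new appears. The
    function name closure is hypothetical, and composing pairs only is a
    simplifying assumption, not a complete treatment of normalization.

```python
import unicodedata
from itertools import product


def closure(repertoire):
    """Close a set of characters under case mapping and pairwise NFC composition."""
    closed = set(repertoire)
    while True:
        new = set()
        for ch in closed:
            # Full case mappings may yield several characters (e.g. one letter -> two).
            for mapped in (ch.upper(), ch.lower(), unicodedata.normalize("NFC", ch)):
                new.update(mapped)
        for a, b in product(closed, repeat=2):
            composed = unicodedata.normalize("NFC", a + b)
            if len(composed) == 1:  # the pair has a precomposed form
                new.add(composed)
        if new <= closed:
            return closed
        closed |= new


if __name__ == "__main__":
    # A font with "e" and COMBINING ACUTE ACCENT should also cover the
    # precomposed and uppercase forms.
    print(sorted(closure({"e", "\u0301"})))
```

    The pairwise loop is quadratic in the size of the set, which is acceptable
    for a per-language repertoire but not for the whole of Unicode.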

    But some optional ligatures, not strictly required by a given set of
    modern written languages, may be left out if the font or renderer
    supports correct fallback decompositions (for example for <fi>, <fl>,
    <ffi>, <ffl>). What is important here is the legibility of the printed text,
    so that no confusion is possible for a text written in any language.
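
    A small Python sketch of such a fallback, assuming the renderer knows
    which characters the font supports: an unsupported character is replaced
    by its compatibility decomposition (NFKD) when every resulting character
    is supported. The function name fallback and the supported set are
    hypothetical.

```python
import unicodedata


def fallback(text, supported):
    """Replace unsupported characters by their NFKD decomposition when possible."""
    out = []
    for ch in text:
        if ch in supported:
            out.append(ch)
            continue
        decomposed = unicodedata.normalize("NFKD", ch)
        # Use the decomposition only if the font can show all of it;
        # otherwise keep the original character untouched.
        if all(c in supported for c in decomposed):
            out.append(decomposed)
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    basic_latin = set("abcdefghijklmnopqrstuvwxyz")
    # U+FB03 is LATIN SMALL LIGATURE FFI.
    print(fallback("e\ufb03cient", basic_latin))
```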

    One good source for the characters needed by each language is
    the LDML database (notably the ICU section,
    which is the most complete collection), which contains definitions of
    <exemplarCharacters> for each supported language (though there may
    be some omissions). One regret: some characters are listed as
    exemplars although they are not mandatory to support a language; they
    should be listed separately, as should rare characters that are used only
    in proper names, geographical names, or transliterated foreign
    words, which can often be written with the common letters using a
    phonetic approach.
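
    The separation suggested here can be sketched in Python as a coverage
    check that reports mandatory and optional characters separately. The
    function coverage_report and the toy character lists are hypothetical
    and are not taken from any real locale data.

```python
def coverage_report(repertoire, mandatory, optional):
    """Report which mandatory and optional characters a repertoire lacks."""
    have = set(repertoire)
    return {
        "missing_mandatory": sorted(set(mandatory) - have),
        "missing_optional": sorted(set(optional) - have),
    }


if __name__ == "__main__":
    # Toy data for illustration only.
    font = "abc"
    print(coverage_report(font, mandatory="abcd", optional="\u00e9"))
```

    A font would then be considered adequate for a language when
    "missing_mandatory" is empty, whatever remains in "missing_optional".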

    An example: Norsk "Bokmål" is most often transcribed in French as
    norvégien "bokmal" or "bokmâl" (where the circumflex is used to
    mark an open and/or lengthened vowel), or translated as
    norvégien "classique" (as opposed to norvégien "réformé" or
    "nouveau" norvégien).

    So the <exemplarCharacters> of a language are a good indication
    of the characters needed for that language, even if an "official"
    transliteration rule is used to translate imported foreign words with more

    Spam not tolerated: any unsolicited message will be
    reported to your Internet service providers.
