Re: Looking for code ranges on specific languages.

From: Jonathan Woodburn (jonathan@woodburn.cc)
Date: Thu Jul 17 2008 - 16:05:59 CDT

    > David Starner wrote: First, why are you creating a new font for this
    > internal project that covers Chinese? That can't be economically
    > feasible. A quick and dirty font for English is possible, but
    > Chinese is a bit larger. [...] you shouldn't need to worry about
    > individual Latin-using languages; just toss in MES-2, which will
    > cover every major Latin/Greek/Cyrillic using language in the world,
    > at the cost of a measly 1000 characters.

    Admittedly, Chinese is a huge character set, but the font is still
    aimed at a low memory footprint. That said, I'm getting the
    impression that my understanding of Unicode is misinformed (or
    simply uninformed). Isn't every character for every language found in
    one common table (i.e. Latin characters + foreign-language accents +
    Cyrillic + Chinese, etc.)? If so, one font in one format should
    suffice for composing any document in any of the intended
    languages, no?

    > Mark Davis wrote: [...] take a look at http://www.unicode.org/cldr/data/charts/by_type/misc.exemplarCharacters.html
    > (for Latin a wide screen helps ;-)

    Thank you. I've been analyzing the Latin table, and as I understand
    it, the language codes are at the top of the table in ISO 639-1
    format, with the characters found in each language on the left,
    correct? If this is an exhaustive list, it will be a little tedious
    to read the HTML source, but it will certainly work. :)

    > Stephane Bortzmeyer wrote: You can find a list for the French
    > language here (the article is in French but the table - in the RFC
    > 4290 format - is in English): http://www.bortzmeyer.org/4290.html

    Many thanks. I believe French now reads as follows: (002D,
    0030-0039, 0061-007A, 0153, 00E0, 00E2, 00E6-00E9, 00EA, 00EB, 00EE,
    00EF, 00F4, 00F9, 00FB, 00FC, 00FF)
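A list like the one above can be checked mechanically. The following is only a minimal sketch in Python, assuming the ranges are written as comma-separated hex values; the names `FRENCH_RANGES`, `expand_ranges`, and `missing_characters` are hypothetical helpers made up for this illustration and do not come from RFC 4290 or any library.

```python
# Hypothetical helpers: expand a hex range list (as quoted in the message)
# into a set of code points, then report characters a text needs that the
# set does not contain.
FRENCH_RANGES = (
    "002D, 0030-0039, 0061-007A, 0153, 00E0, 00E2, 00E6-00E9, 00EA, 00EB, "
    "00EE, 00EF, 00F4, 00F9, 00FB, 00FC, 00FF"
)

def expand_ranges(spec):
    """Turn 'XXXX' and 'XXXX-YYYY' items into a set of integer code points."""
    code_points = set()
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        first, _, last = item.partition("-")
        start = int(first.strip(), 16)
        # A single value is treated as a one-element range.
        end = int(last.strip(), 16) if last.strip() else start
        code_points.update(range(start, end + 1))
    return code_points

def missing_characters(text, code_points):
    """Return the sorted characters of `text` that fall outside the set."""
    return sorted({ch for ch in text if ord(ch) not in code_points})

french = expand_ranges(FRENCH_RANGES)
print(len(french))                                   # 53
print(missing_characters("c\u0153ur", french))       # [] ("coeur" with the oe ligature)
print(missing_characters("\u00c9t\u00e9", french))   # ['É'] (capital E with acute, U+00C9)
```

The last line shows that the list as written covers lowercase letters only, so any uppercase letter would be reported as missing.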

    > Erkki I. Kolehmainen wrote: MES-2 is part of a CEN Workshop
    > Agreement (CWA 13873, IT - Multilingual European Subsets in ISO/IEC
    > 10646-1), never meant to become a full blown standard per se,
    > available at http://www.cen.eu/cenorm/sectors/sectors/isss/cen+workshop+agreements/multilingual+eur+subsets.asp
    > . In the UCS standard ISO/IEC 10646, it is defined as Collection
    > 282. IMHO, MES-2 is imperfect and somewhat outdated, but WGL4 is
    > much more so.

    These standards are simply smaller collections of code ranges from
    Unicode, yes? If not, does that imply that a custom multilingual font
    that uses selected characters for specific languages is not possible?
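If a subset is read as a collection of code ranges, a custom font repertoire could be sketched as the merged union of per-language ranges. The Python below is an illustration under that assumption; `merge_ranges` is a hypothetical helper, and the three small range lists are made-up samples, not the actual definition of MES-2, WGL4, or any other collection.

```python
# Hypothetical helper: merge overlapping or adjacent (start, end) ranges
# into a compact, sorted list.
def merge_ranges(ranges):
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or touches the previous range: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

# Illustrative samples only (lowercase letters, incomplete on purpose).
basic_latin_lower = [(0x0061, 0x007A)]
french_extras = [(0x00E6, 0x00E9), (0x00EA, 0x00EB), (0x00E0, 0x00E0)]
cyrillic_lower = [(0x0430, 0x044F)]

font_repertoire = merge_ranges(basic_latin_lower + french_extras + cyrillic_lower)
for start, end in font_repertoire:
    print(f"{start:04X}-{end:04X}")
```

Keeping the repertoire as merged ranges rather than one entry per character keeps the list short, which may matter when the goal is a low memory footprint.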

    On a contributory note, I've found this site
    (http://www.eki.ee/letter/), which lists the special characters needed
    in addition to the basic Latin script to display a given language.
    This seems to address the issue to a great degree, but a couple of
    questions remain (prompted by the earlier feedback):

    1. Are all characters for every language found in a single Unicode
    definition so that U+XXXX can express any character?
    2. Would it be necessary to create individual fonts for particular
    (non-coexisting) languages?
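For what it is worth, the first question can be explored with Python's standard `unicodedata` module, which looks characters up in a single code space regardless of script. This is only a small sketch; `describe` is a hypothetical helper, and the four sample characters were chosen arbitrarily for illustration. It says nothing about fonts: whether one font can display all of them depends on which glyphs that font contains.

```python
import unicodedata

def describe(ch):
    """Format a character as 'U+XXXX NAME' using the Unicode database."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

# Latin, accented Latin, Cyrillic, and Chinese samples.
samples = ["e", "\u00e9", "\u0436", "\u4e2d"]
for ch in samples:
    print(describe(ch))
# U+0065 LATIN SMALL LETTER E
# U+00E9 LATIN SMALL LETTER E WITH ACUTE
# U+0436 CYRILLIC SMALL LETTER ZHE
# U+4E2D CJK UNIFIED IDEOGRAPH-4E2D
```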

    I hope my questions don't confuse the issue, as I do appreciate the
    feedback.

    Cheers,
    Jonathan


