RE: Why people still want to encode precomposed letters

From: philip chastney (philip_chastney@yahoo.com)
Date: Sun Nov 23 2008 - 15:01:44 CST


    --- On Sun, 23/11/08, Don Osborn <dzo@bisharat.net> wrote:

    From: Don Osborn <dzo@bisharat.net>
    Subject: RE: Why people still want to encode precomposed letters
    To: "'Peter Constable'" <petercon@microsoft.com>, unicode@unicode.org, philip_chastney@yahoo.com
    Date: Sunday, 23 November, 2008, 1:01 PM

    A couple of quick questions. First, about how long would the list of
    combinations be?

     
     
     
    if we take 32-ish Latin characters, 24 Greek and 36-ish Cyrillic characters, and double that for upper and lower case, we have 184 potential base characters
     
    Combining Diacritical Marks (U+0300..U+036F) lists 112 characters
     
    the number of combinations never yet seen (false positives) will far outweigh the number of combinations requiring more than one mark, or a mark from another block (the false negatives), so our first Wild-Assed Guess (WAG) is a maximum of about 20,000 combinations (184 × 112 = 20,608)
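The first WAG is just base letters times marks. As a minimal Python sketch, taking the approximate per-case letter counts given above and the size of the U+0300..U+036F block (the dictionary and function names are illustrative only):

```python
# Crude upper bound on base+mark combinations (the first "WAG").
# Letter counts are the approximate per-case figures quoted in the post.
LETTERS_PER_CASE = {"Latin": 32, "Greek": 24, "Cyrillic": 36}

# Combining Diacritical Marks block: U+0300..U+036F inclusive.
BLOCK_START, BLOCK_END = 0x0300, 0x036F


def first_wag() -> tuple[int, int, int]:
    """Return (base characters, marks, combinations) for the crude bound."""
    bases = 2 * sum(LETTERS_PER_CASE.values())  # upper + lower case
    marks = BLOCK_END - BLOCK_START + 1         # code points in the block
    return bases, marks, bases * marks


if __name__ == "__main__":
    bases, marks, combos = first_wag()
    print(f"{bases} bases x {marks} marks = {combos:,} combinations")
```

Running it prints `184 bases x 112 marks = 20,608 combinations`.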
     
    we can refine that figure
     
    Latin characters use about 40 marks, Greek perhaps half a dozen (if we count the cases where 2 marks are used), and Cyrillic about 12
     
    (32 × 40) + (24 × 6) + (32 × 12) = 1,808 potential combinations per case, which, doubled for upper and lower case, gives us a tighter limit of about 3,600 combinations
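The refinement weights each script by the marks it actually uses. The same arithmetic as a small sketch, with the (letters, marks) pairs taken straight from the formula above, including 32 rather than 36 for Cyrillic (names are illustrative):

```python
# Refined bound: each script is paired only with the marks it actually uses.
# (letters per case, marks in use), as estimated in the post.
SCRIPTS = {
    "Latin": (32, 40),
    "Greek": (24, 6),
    "Cyrillic": (32, 12),
}


def refined_bound(scripts: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Return (combinations per case, combinations for both cases)."""
    per_case = sum(letters * marks for letters, marks in scripts.values())
    return per_case, 2 * per_case


per_case, both_cases = refined_bound(SCRIPTS)
print(per_case, both_cases)  # 1808 3616
```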
     
    how useful is that figure? well, a rough count of the preformed composites already defined in six Latin blocks is a little over 500, and certainly fewer than 600
     
    Greek and Coptic, and Greek Extended, contribute another 250
     
    Cyrillic, Cyrillic Supplement, Cyrillic Extended-A and Cyrillic Extended-B contribute a further 90
     
    call that 900 preformed composites already specified (500 + 250 + 90 = 840, rounded up)
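A count like this can be roughed out mechanically by treating a character as a preformed composite when its canonical (NFD) decomposition contains a nonspacing mark. The sketch below uses Python's `unicodedata`; the selection of blocks is illustrative and not necessarily the set counted above, and the results depend on the Unicode version bundled with Python, so they will not necessarily reproduce the hand counts:

```python
import unicodedata

# Inclusive block ranges from the Unicode block list; this selection is
# illustrative, not necessarily the blocks counted in the post.
BLOCKS = {
    "Latin-1 Supplement": (0x0080, 0x00FF),
    "Latin Extended-A": (0x0100, 0x017F),
    "Latin Extended-B": (0x0180, 0x024F),
    "Latin Extended Additional": (0x1E00, 0x1EFF),
    "Greek and Coptic": (0x0370, 0x03FF),
    "Greek Extended": (0x1F00, 0x1FFF),
    "Cyrillic": (0x0400, 0x04FF),
    "Cyrillic Supplement": (0x0500, 0x052F),
}


def is_precomposed(ch: str) -> bool:
    """True if ch canonically decomposes into a sequence containing a mark."""
    nfd = unicodedata.normalize("NFD", ch)
    return nfd != ch and any(unicodedata.category(c) == "Mn" for c in nfd)


def count_precomposed(start: int, end: int) -> int:
    """Number of code points in [start, end] that are preformed composites."""
    return sum(is_precomposed(chr(cp)) for cp in range(start, end + 1))


if __name__ == "__main__":
    for name, (start, end) in BLOCKS.items():
        print(f"{name:28} {count_precomposed(start, end):4}")
```

Compatibility-only decompositions (such as U+00A8 DIAERESIS) are deliberately not counted, since NFD leaves them unchanged.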
     
    if Unicode is a job half done, then the total list of known composites could grow to 1,800 combinations
     
    add a little for Armenian, Georgian and Yiddish, and we can see the table is unlikely to require more than 2,000 entries, with 3,600 entries as a worst-case scenario
     
    my guess is that 1,200 would be enough, once Yoruba, Orok, &c., are included, but whatever: at least we’ve got a Rough Order of Magnitude on the size of the problem
     
    once a table like that becomes available, your average font designer will stick anchors on all possible base characters, and matching anchors on all likely marks, and import the table into his or her font as OpenType mark-positioning data
     
    the resultant display may be a little less than perfect, but the reader will see a recognisable mark+base form
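The anchor mechanism itself is simple arithmetic: the mark is shifted so that its anchor point lands on the base glyph's anchor, which is the idea behind OpenType mark-to-base attachment. A minimal sketch; the `Point` type, the function name and the coordinates are made up for illustration and do not come from any real font:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    x: int  # font units
    y: int


def mark_offset(base_anchor: Point, mark_anchor: Point) -> Point:
    """Shift to apply to a mark so its anchor coincides with the base's anchor.

    Each anchor is given in its own glyph's coordinate space; the result is
    relative to the base glyph's origin.
    """
    return Point(base_anchor.x - mark_anchor.x, base_anchor.y - mark_anchor.y)


# Hypothetical values: top anchor of a base letter, attaching anchor of an acute.
print(mark_offset(Point(250, 520), Point(-100, 500)))  # Point(x=350, y=20)
```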
     
    your average FontLab user is quite likely to have a similar table already set up, so that preformed composites can be generated automatically; what your average FontLab user needs now is a table that is complete, so far as present knowledge allows
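Generating preformed composites from such a table can be sketched generically: decompose each precomposed letter with NFD, then place the mark component using the anchors. This is not FontLab's own mechanism or API; the anchor dictionaries, their values and `build_composite` are hypothetical, and only the single base plus single mark case is handled:

```python
import unicodedata

# Hypothetical anchor data in font units: the top anchor on each base glyph
# and the attaching anchor on each combining mark.
BASE_TOP = {"a": (250, 520), "e": (260, 520), "o": (270, 520)}
MARK_TOP = {"\u0301": (-100, 500), "\u0300": (-110, 500), "\u0308": (-120, 500)}


def build_composite(ch: str) -> list[tuple[str, tuple[int, int]]]:
    """Return [(component, (dx, dy)), ...] for a precomposed base+mark letter.

    Anything other than one base followed by one mark, or a pair without
    anchor data, raises ValueError so it can be flagged for manual work.
    """
    nfd = unicodedata.normalize("NFD", ch)
    if len(nfd) != 2:
        raise ValueError(f"U+{ord(ch[0]):04X} is not a single base+mark pair")
    base, mark = nfd
    if base not in BASE_TOP or mark not in MARK_TOP:
        raise ValueError(f"no anchors for U+{ord(ch):04X}")
    (bx, by), (mx, my) = BASE_TOP[base], MARK_TOP[mark]
    # Shift the mark so its anchor coincides with the base's top anchor.
    return [(base, (0, 0)), (mark, (bx - mx, by - my))]


if __name__ == "__main__":
    for ch in "\u00e1\u00e8\u00f6":  # a-acute, e-grave, o-diaeresis
        print(ch, build_composite(ch))
```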
     
    as to what columns this table should have, I would like to see an H-flag (for Historical), meaning something like “no known usage detected in the wild since the end of the nineteenth century”, but that shouldn’t be allowed to obscure the fact that a table of known combinations would be an asset to font designers and users of minority languages, even without any ancillary information
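One possible shape for such a table, sketched under assumed conventions: the semicolon-separated line format (loosely modelled on the UCD data files), the field order and the `Combination` record are all hypothetical, and the H-flag is simply a token in the third field. The sample rows are illustrative, not a curated list:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical format: sequence ; precomposed code point (if any) ; flags
SAMPLE = """\
# sequence ; precomposed ; flags
0061 0301 ; 00E1 ;
0065 0323 ; 1EB9 ;
006F 0323 0301 ; ;
"""


@dataclass(frozen=True)
class Combination:
    sequence: str               # base letter followed by its mark(s)
    precomposed: Optional[str]  # encoded precomposed character, if one exists
    historical: bool            # the proposed H-flag


def parse_table(text: str) -> list[Combination]:
    rows = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        seq_field, pre_field, flag_field = (f.strip() for f in line.split(";"))
        sequence = "".join(chr(int(cp, 16)) for cp in seq_field.split())
        precomposed = chr(int(pre_field, 16)) if pre_field else None
        rows.append(Combination(sequence, precomposed, "H" in flag_field.split()))
    return rows


for row in parse_table(SAMPLE):
    print(row)
```

Rows with an empty second field are exactly the combinations a font has to build from base plus mark(s).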
     
    /phil

     



    This archive was generated by hypermail 2.1.5 : Sun Nov 23 2008 - 15:04:22 CST