RE: Why people still want to encode precomposed letters

From: philip chastney (philip_chastney@yahoo.com)
Date: Sun Nov 23 2008 - 15:01:44 CST

Next message: Asmus Freytag: "Re: Why people still want to encode precomposed letters"

Previous message: Hans Aberg: "Re: Why people still want to encode precomposed letters"
In reply to: Don Osborn: "RE: Why people still want to encode precomposed letters"
Next in thread: Karl Pentzlin: "Re: Why people still want to encode precomposed letters"
Reply: Karl Pentzlin: "Re: Why people still want to encode precomposed letters"
Reply: John Hudson: "Re: Why people still want to encode precomposed letters"

--- On Sun, 23/11/08, Don Osborn <dzo@bisharat.net> wrote:

From: Don Osborn <dzo@bisharat.net>
Subject: RE: Why people still want to encode precomposed letters
To: "'Peter Constable'" <petercon@microsoft.com>, unicode@unicode.org, philip_chastney@yahoo.com
Date: Sunday, 23 November, 2008, 1:01 PM

> A couple of quick questions. First, about how long would the list of
> combinations be?

if we take 32-ish Latin characters, 24 Greek and 36-ish Cyrillic characters, and double that for upper and lower case, we have about 184 potential base characters

Combining Diacritical Marks (U+0300–U+036F) lists 112 characters

the number of combinations never yet seen (false positives) will far outweigh the number of combinations requiring more than one mark, or a mark from another block (the false negatives), so our first Wild-Assed Guess (WAG) is a maximum of about 20,000 combinations (184 × 112)
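
for anyone who wants to vary the inputs, the arithmetic behind this first guess can be sketched in a few lines of Python. The per-script letter counts are the rough "-ish" figures quoted above, and the function name is only illustrative:

```python
# Rough letter counts per case, as quoted above (approximate figures).
LETTERS_PER_CASE = {"Latin": 32, "Greek": 24, "Cyrillic": 36}

# Combining Diacritical Marks block, U+0300..U+036F inclusive.
MARKS_IN_BLOCK = 0x036F - 0x0300 + 1


def crude_upper_bound(letters_per_case, marks, cases=2):
    """Pair every base letter (in both cases) with every single mark."""
    base_characters = cases * sum(letters_per_case.values())
    return base_characters * marks


if __name__ == "__main__":
    print(crude_upper_bound(LETTERS_PER_CASE, MARKS_IN_BLOCK))
```

this counts only one mark per base, so it ignores stacked marks, which is the same simplification made in the estimate itself.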

we can refine that figure

Latin characters use about 40 marks, Greek perhaps half-a-dozen (if we count the cases where 2 marks are used) and Cyrillic about 12

( 32 × 40 ) + ( 24 × 6 ) + ( 32 × 12 ) = 1808 potential combinations per case, which gives us a tighter limit of 3,600 combinations
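
the same sum as a small Python sketch, so the per-script guesses can be adjusted and the total recomputed. The numbers are the ones used in the line above; the dictionary layout and function name are assumptions made for illustration:

```python
# script -> (base letters per case, marks in use), as used in the sum above
SCRIPTS = {
    "Latin": (32, 40),
    "Greek": (24, 6),
    "Cyrillic": (32, 12),
}


def refined_estimate(scripts, cases=2):
    """Return (combinations per case, combinations across all cases)."""
    per_case = sum(letters * marks for letters, marks in scripts.values())
    return per_case, per_case * cases


if __name__ == "__main__":
    per_case, total = refined_estimate(SCRIPTS)
    print(per_case, total)
```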

how useful is that figure? well, a rough count of the preformed composites already defined in 6 Latin blocks is a little over 500, and certainly less than 600

Greek and Coptic, and Greek Extended, contribute another 250

Cyrillic, Supplement, Extended-A and Extended-B, contribute a further 90

call that 900 preformed composites already specified
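
a hand count like this can be cross-checked mechanically. The sketch below uses Python's standard `unicodedata` module to count characters whose canonical decomposition is a base followed by a combining mark, grouped by the first word of the character name. It is only an approximation of the count above: the result depends on the Unicode version bundled with the interpreter, and letters that have no canonical decomposition (a stroke or descender built into the letter, say) are not counted at all. The function names are illustrative:

```python
import sys
import unicodedata
from collections import Counter

SCRIPT_PREFIXES = ("LATIN", "GREEK", "CYRILLIC")


def precomposed_letters():
    """Yield (char, base, mark) for canonical base + combining-mark pairs."""
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        decomp = unicodedata.decomposition(ch)
        if not decomp or decomp.startswith("<"):
            continue  # no decomposition, or a compatibility mapping
        parts = [chr(int(p, 16)) for p in decomp.split()]
        # Canonical decompositions have at most two parts; the base may
        # itself be precomposed, so a multi-mark letter is counted once.
        if len(parts) == 2 and unicodedata.combining(parts[1]):
            yield ch, parts[0], parts[1]


def count_by_script():
    """Count precomposed letters whose name starts with a script prefix."""
    counts = Counter()
    for ch, _base, _mark in precomposed_letters():
        name = unicodedata.name(ch, "")
        for prefix in SCRIPT_PREFIXES:
            if name.startswith(prefix):
                counts[prefix] += 1
    return counts


if __name__ == "__main__":
    for script, n in sorted(count_by_script().items()):
        print(script, n)
```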

if Unicode is a job half done, then the total list of known composites could grow to 1800 combinations

add a little for Armenian, Georgian and Yiddish, and we can see the table is unlikely to require more than 2000 entries, with 3,600 entries as a worst case scenario

my guess is that 1,200 would be enough, once Yoruba, Orok, &c, are included, but whatever -- at least we’ve got a Rough Order of Magnitude on the size of the problem

once a table like that becomes available, your average font designer will stick anchors on all possible base characters, and matching anchors on all likely markings, and import the table into his or her font, as an OpenType table

the resultant display may be a little less than perfect, but the reader will see a recognisable mark+base form
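
to see which entries in such a table would actually depend on anchors, one can test whether a sequence collapses to a single precomposed code point under NFC normalisation. This is a minimal sketch using Python's standard `unicodedata` module; the helper name is illustrative, and the two sequences are just examples (the second is the kind of base + dot below + acute form that Yoruba uses):

```python
import unicodedata


def has_precomposed_form(sequence):
    """True if NFC normalisation reduces the sequence to one code point."""
    return len(unicodedata.normalize("NFC", sequence)) == 1


EXAMPLES = {
    "e + acute": "e\u0301",
    "e + dot below + acute": "e\u0323\u0301",
}

if __name__ == "__main__":
    for label, sequence in EXAMPLES.items():
        print(label, has_precomposed_form(sequence))
```

sequences for which this returns False have no precomposed fallback, so they can only be displayed through mark positioning.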

your average FontLab user is quite likely to have a similar table already set up, so that preformed composites can be generated automatically -- what your average FontLab user needs now is a table that is complete, so far as present knowledge allows

as to what columns this table should have, I would like to see an H-flag (for Historical) meaning something like “no known usage detected in the wild since the end of the nineteenth century”, but that shouldn’t be allowed to cloud the fact that a table of known combinations would be an asset to font designers and users of minority languages, even without any ancillary information
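
one possible shape for such a table, sketched in Python. Everything here is hypothetical: the column names, the CSV layout and the `Combination` class are assumptions for illustration, not an existing format, and the single row is only a sample entry:

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class Combination:
    base: str                 # a single base character
    marks: str                # one or more combining marks, in order
    language: str             # where the combination is attested
    historical: bool = False  # the proposed H-flag


def code_points(text):
    """Format a string as space-separated U+XXXX values."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


def to_csv(rows):
    """Serialise the table, one combination per line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["base", "marks", "language", "H"])
    for row in rows:
        writer.writerow([
            code_points(row.base),
            code_points(row.marks),
            row.language,
            "H" if row.historical else "",
        ])
    return buf.getvalue()


if __name__ == "__main__":
    table = [Combination("e", "\u0323\u0301", "Yoruba")]
    print(to_csv(table), end="")
```

listing code points rather than the characters themselves keeps the file readable even where no font can render the combination yet.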

/phil


This archive was generated by hypermail 2.1.5 : Sun Nov 23 2008 - 15:04:22 CST