Date: Mon, 24 May 2010 18:21:22 -0700
Subject: UTC Agenda Item: script metadata
From: Mark Davis


We have some useful data associated with scripts, such as the group (corresponding to the Chart organization on http://unicode.org/charts/#scripts). I was adding the 3 new scripts to that list, and had a couple of questions/comments. Here is an overview of the data: http://goo.gl/ofsA. The first item is for discussion by the UTC. The other two are just requests for information from this mailing list, unconnected to any UTC issue or proposal.

1. For comparison, I added a WR (web rank) column to the data, which is the rank of each script by the number of characters of that script found on the web (in our sampling). In that column, ? means that the sample is much too small to be statistically significant, while ?? means it is on the edge, but still very small. On that basis, I'd like to discuss in the UTC whether, for the UAX31 identifier recommendations, we move Syriac, Bopomofo, Canadian Aboriginal, Mongolian, Tifinagh, and Yi to Limited_Use. Note that "Limited_Use" doesn't mean that people should exclude these scripts; it just means that the scripts are in uncommon enough use that implementers should think carefully about whether to allow them in identifiers where security is at issue. Note also that, based on the data, this change would actually bump Syriac up from the "Candidate for Exclusion" category.
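
As a rough illustration of how a column like WR could be derived (this is a sketch, not the process actually used for the spreadsheet), the following counts characters per script in a text sample, ranks the scripts by count, and substitutes ? or ?? when the count is very small. Everything named here is hypothetical: SCRIPT_RANGES is a simplified stand-in based on a few block ranges rather than the real Script property, and the tiny/small thresholds are arbitrary placeholders, since the actual significance cutoffs are not stated above.

```python
from collections import Counter

# Simplified block ranges standing in for the real Unicode Script property.
# This table is an assumption for illustration and covers only a few scripts.
SCRIPT_RANGES = {
    "Latin": [(0x0041, 0x005A), (0x0061, 0x007A)],
    "Syriac": [(0x0700, 0x074F)],
    "Mongolian": [(0x1800, 0x18AF)],
    "Tifinagh": [(0x2D30, 0x2D7F)],
}


def script_of(ch):
    """Return the script name for a character, or None if it is not in the table."""
    cp = ord(ch)
    for script, ranges in SCRIPT_RANGES.items():
        if any(lo <= cp <= hi for lo, hi in ranges):
            return script
    return None


def web_rank(sample, tiny=2, small=5):
    """Rank scripts by character count in the sample.

    The thresholds are hypothetical: counts below `tiny` are labeled "?",
    counts below `small` are labeled "??", and everything else gets its rank.
    """
    counts = Counter(s for s in map(script_of, sample) if s is not None)
    # Sort by descending count, breaking ties by script name.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    rows = []
    for rank, (script, n) in enumerate(ranked, start=1):
        if n < tiny:
            label = "?"
        elif n < small:
            label = "??"
        else:
            label = str(rank)
        rows.append((script, n, label))
    return rows


if __name__ == "__main__":
    sample = "hello world" + "\u0710\u0712\u0713" + "\u2D30"
    for script, n, label in web_rank(sample):
        print(f"{script:10} {n:5} {label}")
```

A real computation would use the Script property from the UCD (Scripts.txt) rather than block ranges, and a far larger sample.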

2. It would be useful to have a sample character for each script, one that matches one of the characters used in Apple's Last Resort font (which is also available on the Unicode site). Is such a list available anywhere? I threw a quick generated list into the spreadsheet, but it would be nice to match the LRF.

3. Knowing whether a script requires shaping for minimal usage is useful. I put in some draft values in that chart, but feedback on which other scripts require shaping would be appreciated.