Re: Searching data: map countries to scripts

From: Manuel Strehl
Date: Mon, 20 Aug 2012 09:04:51 +0200

Thanks for the answer.

It's clear to me that I could map "Hana" and "Kata" to "US" just
because there is a Japanese minority in the States. Of course, the
mapping must be sensible in some way, that is, it should explain how
the mapping was done. I'd be fine, I guess, with covering all official
languages and important historic ones (disputable cases, where larger
minority languages are suppressed, may of course exist).

Basically I'm looking for an n:m chart with ISO 639 on the left and
ISO 15924 on the right. If the data itself is annotated with "used
by 0.2% of population" or "historic", all the better, because then I
could define my own cut-off limit. If there is only a prose
explanation of how the data was gathered, I could at least judge
whether the set suits the task.
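
As a rough sketch of what such an annotated data set and a cut-off
filter might look like (not an existing data source): the record
layout, the function name and all sample shares below are invented
for illustration. The rows are keyed by country code (ISO 3166-1),
which is an assumption based on the country-to-script goal of this
thread.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ScriptUse:
    """One row of a hypothetical n:m region-to-script table."""
    region: str             # ISO 3166-1 alpha-2 code (assumed key)
    script: str             # ISO 15924 four-letter code
    share: Optional[float]  # fraction of population, None if unknown
    historic: bool = False  # True if the script is no longer in use


# Placeholder rows: the shares are made up, not measured data.
SAMPLE = [
    ScriptUse("RU", "Cyrl", 0.95),
    ScriptUse("RU", "Latn", 0.02),
    ScriptUse("RU", "Arab", 0.001),
    ScriptUse("DE", "Latn", 0.99),
    ScriptUse("DE", "Latf", None, historic=True),
]


def scripts_for(rows: List[ScriptUse], region: str,
                min_share: float = 0.0,
                include_historic: bool = False) -> List[str]:
    """Return the scripts for a region, applying a caller-chosen cut-off."""
    result = []
    for row in rows:
        if row.region != region:
            continue
        if row.historic:
            # Historic scripts carry no share; include them only on request.
            if include_historic:
                result.append(row.script)
        elif row.share is not None and row.share >= min_share:
            result.append(row.script)
    return result


print(scripts_for(SAMPLE, "RU", min_share=0.01))
print(scripts_for(SAMPLE, "DE", include_historic=True))
```

Keeping the share and the historic flag on every row leaves the
choice of cut-off to the consumer of the data instead of baking it
into the data set.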

If there is no such data set at all, I'll be off to scrape Wikipedia
again, but that is, as I've written, neither an effective nor a
particularly error-free approach.


2012/8/20 Asmus Freytag:
> On 8/19/2012 4:05 PM, Manuel Strehl wrote:
> > Hello,
> >
> > I'm looking for a data source that maps countries to scripts used in
> > them. The target application is a visualization in the context of my
> > site, namely
> >
> > At the moment I've extracted the preferred scripts from CLDR (e.g., Cyrl
> > for Russia, Latn for Germany and so on). Then I've added some historic
> > scripts by looking at corresponding Wikipedia articles and did some
> > manual updating. However, this yields a not really satisfactory result.
> > For example, Russia has only Cyrl associated, while, as far as I can
> > tell, at least Latn and Arab should also be mentioned, also perhaps some
> > historic scripts.
> >
> > I'd appreciate any pointers if and where I could find data sets that aid
> > me in completing and error-proofing this mapping.
> >
> > Cheers,
> > Manuel
>
> Heck, my utility bill in the US has Thai and Chinese characters (for the
> fine print, not the statement itself). There's one more script, could be
> Cyrillic, don't have one in front of me right now. In some areas of town
> you'll find a mixture of scripts on shop signs as well.
>
> The point is, it's easy to identify a majority script, but to get an
> accurate handle on "other" scripts is going to be tricky, if not
> impossible. And it all depends on your arbitrary decision of what other
> scripts to include and on what basis.
>
> A./
Received on Mon Aug 20 2012 - 02:09:35 CDT
