RE: Dataset for all ISO639 code sorted by country/territory?

From: Doug Ewell <doug_at_ewellic.org>
Date: Thu, 10 Nov 2016 10:56:58 -0700

Mats Blakstad wrote:

> For myself, I was not actually considering the number of speakers in
> each country, but rather mapping languages to the countries/territories
> where the language originated or has been spoken traditionally.

And that is where I think you'll have disagreement on the details.

> So I guess what matters is which language people mostly expect to find
> under the country/territory.

Yep, that's the challenge.

> Would it be possible to extend this dataset to all languages and start
> building an open source data set for language-territory mapping?
> http://www.unicode.org/cldr/charts/latest/supplemental/language_territory_information.html

That's a good question for the CLDR folks, who have their own mailing
list.

Keep in mind that the CLDR table documents 675 of the world's best-known
languages, counting variants such as three different orthographies of
Uzbek. While anything is possible, extending this to "all languages,"
e.g. the other 6,300 lesser-known living languages, might require a bit
of time and money.
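
For a rough idea of the mechanics, the data behind that chart is keyed by
territory, so a language-to-territory view means inverting it. Below is a
minimal Python sketch, assuming an XML layout like the <territoryInfo>
section of CLDR's supplementalData.xml. The sample entries and percentages
are made-up placeholders, not real CLDR figures, and the function name
languages_to_territories is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical excerpt shaped like the <territoryInfo> section of CLDR's
# supplementalData.xml. The percentages are placeholders, not CLDR data.
SAMPLE = """
<supplementalData>
  <territoryInfo>
    <territory type="NO">
      <languagePopulation type="nb" populationPercent="90" officialStatus="official"/>
      <languagePopulation type="se" populationPercent="1"/>
    </territory>
    <territory type="SE">
      <languagePopulation type="sv" populationPercent="90" officialStatus="official"/>
      <languagePopulation type="se" populationPercent="1"/>
    </territory>
  </territoryInfo>
</supplementalData>
"""


def languages_to_territories(xml_text):
    """Invert territory -> languages into language -> sorted territory codes."""
    root = ET.fromstring(xml_text)
    mapping = defaultdict(list)
    for territory in root.iter("territory"):
        for lang in territory.iter("languagePopulation"):
            mapping[lang.get("type")].append(territory.get("type"))
    return {lang: sorted(codes) for lang, codes in mapping.items()}


if __name__ == "__main__":
    for lang, codes in sorted(languages_to_territories(SAMPLE).items()):
        print(lang, codes)
```

Note that this only regroups whatever entries already exist. It says
nothing about where a language "originated," which is the contested part.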

There is also a resource in the "UDHR in Unicode" project that might be
worth investigating, though it too is an imperfect match with what you
seem to be looking for.

--
Doug Ewell | Thornton, CO, US | ewellic.org
Received on Thu Nov 10 2016 - 11:58:45 CST
