Re: Looking for a standard on historical countries

From: Steven R. Loomis <>
Date: Sat, 01 Nov 2014 21:57:06 -0700

On 11/1/2014 8:10 AM, Richard Wordingham wrote:
> On Fri, 31 Oct 2014 20:43:19 +0100
> Philippe Verdy <> wrote:
>> How is this related to Unicode?
> One possibility is through the Regional Indicators, but they are defined
> by the unstable ISO 3166-1 alpha-2 codes.
It was noted as "off topic". It's relevant because CLDR is relevant.
>> Maybe it's associated with CLDR for former regional classification of
>> languages, but I doubt this will ever create any standardization for
>> historic data that should remain as is without changes in their old
>> sources for which there are no more any active maintainers, just
>> interested people (basically historians that may comment about them
>> the way they want or could invent their new terminology for analysts
>> and archivists).
> A lot of useful historic information is missing from CLDR. For example,
> I believe line-breaking and word-boundary rules are completely missing
> for 'Sumero-Akkadian' Cuneiform writing systems. The rules were not
> uniform. On the other hand, an entry for the Assyrian for 'English' as
> used in the Assyrian homeland would be meaningless.
A lot of speculation happened some time back with the assumption that
CLDR would a priori reject historic language contributions such as Latin
(it wouldn't). Zero bugs were even filed, let alone any data submitted
for Latin. Besides Sumero-Akkadian, we could probably add break rules
for, say, Oromo, Slovak, Spanish, and Dutch ( ).

> The precise territory covered by a country is not useful within the
> Unicode domains, nor are debates about independence, nor whether tribute
> was paid regularly. In general, a more useful division may be by date,
> but that is barely covered by a system designed for present-day
> languages.
Sure. It would need to be a different namespace from ISO 3166 and
probably IETF BCP 47.

I wonder if you could use Linked Open Data sets (come hear about it
Monday at IUC38!) to look for ontology/Country entries that don't have
a 3166 code, something like the following. You could extract start/end
date, successor country, etc.
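
A minimal sketch of that kind of lookup, not the query from the original
message. It assumes a DBpedia-style data set; the class dbo:Country and the
property names (dbo:foundingDate, dbo:dissolutionDate, dbo:successor,
dbo:iso31661Code) are assumptions to check against whichever data set is
used. The helper names build_query and flatten_bindings are hypothetical.
The code only builds the query text and flattens a response in the standard
SPARQL JSON results layout; sending it to an endpoint is left out.

```python
# Assumption: DBpedia-style vocabulary. Verify every property name below
# against the data set you actually query.
PREFIXES = "PREFIX dbo: <http://dbpedia.org/ontology/>\n"

QUERY_BODY = """SELECT ?country ?start ?end ?successor WHERE {
  ?country a dbo:Country .
  OPTIONAL { ?country dbo:foundingDate ?start }
  OPTIONAL { ?country dbo:dissolutionDate ?end }
  OPTIONAL { ?country dbo:successor ?successor }
  FILTER NOT EXISTS { ?country dbo:iso31661Code ?code }
} LIMIT """


def build_query(limit=100):
    """Return SPARQL text selecting Country resources with no ISO 3166-1 code."""
    return PREFIXES + QUERY_BODY + str(int(limit))


def flatten_bindings(results):
    """Turn a SPARQL JSON results document into a list of plain dicts.

    Each binding maps a variable name to {"type": ..., "value": ...};
    only the "value" is kept. Unbound OPTIONAL variables are simply absent.
    """
    rows = []
    for binding in results["results"]["bindings"]:
        rows.append({name: cell["value"] for name, cell in binding.items()})
    return rows
```

Keeping the dates and successor OPTIONAL means an entity with no recorded
end date still shows up, which matters for sparse historical records.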


> If this thread is to be of any immediate use, what is the intended
> use of the information?
The original post made it sound like it was related to book publishing.
"all countries where there was a printing press would be optimal coverage".


IBMer but all opinions are mine.      // GPG: 9731166CD8E23A83BEE7C6D3ACA5DBE1FD8FABF1 // 

Unicode mailing list

Received on Sun Nov 02 2014 - 00:00:37 CDT