Re: Looking for a standard on historical countries from Steven R. Loomis on 2014-11-01 (Unicode Mail List Archive)

From: Steven R. Loomis <srl_at_icu-project.org>
Date: Sat, 01 Nov 2014 21:57:06 -0700

On 11/1/2014 8:10 AM, Richard Wordingham wrote:
> On Fri, 31 Oct 2014 20:43:19 +0100
> Philippe Verdy <verdy_p_at_wanadoo.fr> wrote:
>
>> How is this related to Unicode?
> One possibility is through the Regional Indicators, but they are defined
> by the unstable ISO 3166-1 alpha-2 codes.
It was noted as "off topic". It's relevant because CLDR is relevant.
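
To make the Regional Indicator connection concrete, here is a minimal sketch in Python of how a two-letter region code maps onto a pair of REGIONAL INDICATOR SYMBOL LETTER code points (U+1F1E6..U+1F1FF). The function name is made up for illustration, and it does not check whether the code is actually assigned in ISO 3166-1; it only does the arithmetic.

```python
# Offset between 'A' (U+0041) and REGIONAL INDICATOR SYMBOL LETTER A (U+1F1E6).
RI_OFFSET = 0x1F1E6 - ord("A")


def regional_indicators(alpha2: str) -> str:
    """Map a two-letter code such as 'FR' to its regional indicator pair.

    Hypothetical helper: it only checks that the input is two ASCII letters,
    not that the code is currently (or was ever) assigned in ISO 3166-1.
    """
    code = alpha2.upper()
    if len(code) != 2 or not all("A" <= c <= "Z" for c in code):
        raise ValueError(f"expected two ASCII letters, got {alpha2!r}")
    return "".join(chr(ord(c) + RI_OFFSET) for c in code)


if __name__ == "__main__":
    # Print the code points rather than the glyphs, since rendering
    # depends on the font.
    print([f"U+{ord(c):X}" for c in regional_indicators("FR")])
```

Because the mapping is purely mechanical, any change in the meaning of an alpha-2 code carries straight through to the symbol pair, which is the instability Richard is pointing at.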
>> Maybe it's associated with CLDR for former regional classification of
>> languages, but I doubt this will ever create any standardization for
>> historic data that should remain as is without changes in their old
>> sources for which there are no more any active maintainers, just
>> interested people (basically historians that may comment about them
>> the way they want or could invent their new terminology for analysts
>> and archivists).
> A lot of useful historic information is missing from CLDR. For example,
> I believe line-breaking and word-boundary rules are completely missing
> for 'Sumero-Akkadian' Cuneiform writing systems. The rules were not
> uniform. On the other hand, an entry for the Assyrian for 'English' as
> used in the Assyrian homeland would be meaningless.
A lot of speculation happened some time back with the assumption that
CLDR would a priori reject historic language contributions such as Latin
(it wouldn't). Zero bugs were even filed, let alone any data submitted
for Latin. Besides Sumero-Akkadian, we could probably add break rules
for, say, Oromo, Slovak, Spanish, and Dutch (
http://unicode.org/cldr/trac/ticket/2992 ).

> The precise territory covered by a country is not useful within the
> Unicode domains, nor are debates about independence, nor whether tribute
> was paid regularly. In general, a more useful division may be by date,
> but that is barely covered by a system designed for present-day
> languages.
Sure. It would need to be a different namespace from ISO 3166 and
probably IETF BCP 47.

I wonder if you could use Linked Open Data sets (come hear about it
Monday at IUC38!) to look for ontology/Country that doesn't have a 3166
code, something like the following. You could extract start/end date,
successor country, etc.

<http://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&query=PREFIX+rdf%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F1999%2F02%2F22-rdf-syntax-ns%23%3E+%0D%0APREFIX+ontology%3A+%3Chttp%3A%2F%2Fdbpedia.org%2Fontology%2F%3E+%0D%0Aselect+distinct+%3FcountryUri++%0D%0Awhere+{+%3FcountryUri++rdf%3Atype+ontology%3ACountry+.+}+&format=text%2Fhtml&timeout=30000&debug=on>
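
A rough sketch of how that query might be narrowed, in Python. It only builds the request URL and does not contact the endpoint. The property names ontology:iso31661Code, ontology:foundingDate and ontology:dissolutionDate are assumptions about what DBpedia exposes and should be checked against the live data; the function name and the LIMIT are likewise just for illustration.

```python
from urllib.parse import urlencode

# Public DBpedia endpoint, as in the link above.
ENDPOINT = "http://dbpedia.org/sparql"

# Assumption: entities typed ontology:Country that carry no ISO 3166-1 code
# are candidates for "historical" countries. The three property names below
# are guesses to verify against DBpedia, not confirmed by this thread.
QUERY = """\
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ontology: <http://dbpedia.org/ontology/>
SELECT DISTINCT ?countryUri ?start ?end
WHERE {
  ?countryUri rdf:type ontology:Country .
  OPTIONAL { ?countryUri ontology:foundingDate ?start . }
  OPTIONAL { ?countryUri ontology:dissolutionDate ?end . }
  FILTER NOT EXISTS { ?countryUri ontology:iso31661Code ?code . }
}
LIMIT 100
"""


def build_url(query: str = QUERY, endpoint: str = ENDPOINT) -> str:
    """Return a GET URL for the query, using the same parameters as the link."""
    params = {
        "default-graph-uri": "http://dbpedia.org",
        "query": query,
        "format": "application/sparql-results+json",
        "timeout": "30000",
    }
    return endpoint + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_url())
```

Fetching the printed URL should return JSON bindings if the endpoint accepts the query. Successor countries would need another OPTIONAL clause once the right property is identified.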

> If this thread is to be of any immediate use, what is the intended
> use of the information?
The original post made it sound like it was related to book publishing.
"all countries where there was a printing press would be optimal coverage".

-s

-- 
IBMer but all opinions are mine.      // GPG: 9731166CD8E23A83BEE7C6D3ACA5DBE1FD8FABF1
https://www.ohloh.net/accounts/srl295 // https://ssl.icu-project.org/trac/wiki/Srl

Received on Sun Nov 02 2014 - 00:00:37 CDT