Re: Language Tag Registrations

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Sat May 31 2003 - 18:42:39 EDT


From: "Marion Gunn" <mgunn@egt.ie>
> >What, then, is the code for the English of 'Northern Ireland'?
> >(GB+NI=UK.)
>
> Since Ulster, as "IANA" <iana@iana.org> knows, is divided by an
> international border, is the logical reply 'encode Ulster English
> separately for each side of the border'? Is Basque separately 'lang-tagged'
> for ES and FR?

Don't forget Catalan as well, spoken in Spain, Andorra, France and up to the North of Italy.
It's hard to draw a national border around a language (look at English, French and Spanish/Castellano).

So I have real doubts that the English spoken in the Irish part of Ulster is specific and distinct from both the English spoken in other areas of Ireland and the English spoken in Britain (England, Wales, Scotland). Language variants are not distinct because of a national border, but because of a long history of separation of peoples and of attachment of peoples to a culture of origin in times of political conflict or repression.

If you wanted to designate the British part of Ulster (Northern Ireland), it should be coded as a region within the country, by appending the region code to the country code, i.e. GBNI (if NI is a region code). The Irish part of Ulster would be IEUL (if UL is a region code).

Then English in each area can be correctly labelled: "en-IE" is general English as spoken in the whole of Ireland. "en-IEUL" is for the Irish part of Ulster only. "en-GB" is for the whole United Kingdom of Great Britain and Northern Ireland, "en-GBNI" would be the specific variant for Northern Ireland, "en-GBEN" for England only, "en-GBSC" for Scotland, "en-GBWL" for Wales.
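To make the labelling scheme above concrete, here is a minimal Python sketch. It assumes 2-letter language, country and region codes throughout; the function names make_tag and fallback_chain are invented for this illustration, and the region codes (NI, UL, ...) are the same unverified examples used above, not registered values.

```python
def make_tag(language, country, region=""):
    """Compose a tag such as 'en-GBNI' from a language code, a country
    code and an optional region code (all assumed to be 2 letters)."""
    for name, code in (("language", language), ("country", country)):
        if not (len(code) == 2 and code.isalpha()):
            raise ValueError(f"{name} code must be 2 letters: {code!r}")
    if region and not (len(region) == 2 and region.isalpha()):
        raise ValueError(f"region code must be 2 letters: {region!r}")
    # The region code is appended directly to the country code.
    return f"{language.lower()}-{country.upper()}{region.upper()}"


def fallback_chain(tag):
    """List a tag followed by its progressively more general forms,
    e.g. 'en-GBNI' -> 'en-GB' -> 'en'."""
    language, _, area = tag.partition("-")
    chain = [tag]
    if len(area) > 2:
        # Drop the region part and keep the 2-letter country code.
        chain.append(f"{language}-{area[:2]}")
    if area:
        chain.append(language)
    return chain


if __name__ == "__main__":
    print(make_tag("en", "GB", "NI"))
    print(fallback_chain("en-IEUL"))
```

The fallback list is only one possible reading of "general" versus "specific" variants: a reader of text tagged "en-GBNI" that has no data for that variant could fall back to "en-GB" and then to plain "en".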

Note that I did not verify any of the region codes within countries: the relevant codes come from official national codes. However, there are possible sources of confusion: the main ISO 3166-2 codes for France are those derived from the 2-digit or 3-digit numeric department codes (as used in postal codes or on vehicle registration plates), even though departments are grouped into administrative regions. So FR75 designates the department of Paris, which is part of the region named Ile-de-France, generally coded IDF; FRIDF would then designate the whole region, which also includes surrounding departments such as FR92.

I am not sure why this discussion is taking place on the Unicode list. It should be discussed in the forums or newsgroups of the language coding working groups. All that relates to Unicode here is the already existing LANGUAGE TAG characters. They are just the characters needed for language tagging; Unicode defines no semantics for the codes themselves and lets readers refer to other ISO standards for the language codes and country/region codes.
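As a rough illustration of how little Unicode itself is involved: a language tag in plain text is U+E0001 LANGUAGE TAG followed by tag characters, each of which is the corresponding ASCII character shifted up by 0xE0000 (U+E0020 to U+E007E). The Python sketch below only performs that mapping; the helper names are invented for this example, and nothing in it checks whether the tag string means anything.

```python
LANGUAGE_TAG = "\U000E0001"   # U+E0001 LANGUAGE TAG
TAG_OFFSET = 0xE0000          # tag character = ASCII code point + 0xE0000


def to_tag_characters(tag):
    """Encode an ASCII tag string such as 'en-IE' as LANGUAGE TAG
    followed by the matching tag characters."""
    if not all(0x20 <= ord(ch) <= 0x7E for ch in tag):
        raise ValueError(f"tag must be printable ASCII: {tag!r}")
    return LANGUAGE_TAG + "".join(chr(TAG_OFFSET + ord(ch)) for ch in tag)


def from_tag_characters(text):
    """Decode a LANGUAGE TAG sequence back to its ASCII tag string."""
    if not text.startswith(LANGUAGE_TAG):
        raise ValueError("sequence does not start with U+E0001")
    body = text[len(LANGUAGE_TAG):]
    if not all(0xE0020 <= ord(ch) <= 0xE007E for ch in body):
        raise ValueError("sequence contains a non-tag character")
    return "".join(chr(ord(ch) - TAG_OFFSET) for ch in body)


if __name__ == "__main__":
    encoded = to_tag_characters("en-IE")
    print([f"U+{ord(ch):05X}" for ch in encoded])
    print(from_tag_characters(encoded))
```

Whether "en-IE" or any other string is a valid tag is decided entirely by the language and country code standards, not by this encoding step.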



This archive was generated by hypermail 2.1.5 : Sat May 31 2003 - 19:29:21 EDT