    I am impressed with the data collected but have problems with the structure and some of the actual data values.

    For example, if I want to handle date/time data I need time zone info. I may also need country information to parse and format the date, and language info for things like month and day-of-week names.

    To me, mixing country-dependent and sub-language-dependent data together makes no sense. I have this problem with ICU as well.

    A language identifier should consist of: language, script, country sub-language, and variant.

    The country values should be stored differently. It is a very bad idea to replicate the same country values in every locale. It violates the principles of normalization. Besides, some variants, such as the EURO variant, apply to the country and not to the language.

    Locales are most commonly passed as strings. So if we use a lowercase country code to specify a sub-language, as distinct from the uppercase country, we can have a locale like: "es_mx_US#America/Los_Angeles". The locale "en_US#America/Los_Angeles" would be the same as: "en_us_US#America/Los_Angeles".
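
    A minimal sketch of how such a string could be split, in Python. The names ProposedLocale and parse_locale are made up for illustration, and script and variant are left out because the examples above do not show where they would go in the string.

```python
from typing import NamedTuple, Optional


class ProposedLocale(NamedTuple):
    """Hypothetical container for the parts of the proposed locale string."""
    language: str
    sub_language: str          # lowercase country code qualifying the language
    country: str               # uppercase country code
    time_zone: Optional[str]   # text after '#', if any


def parse_locale(text: str) -> ProposedLocale:
    # Everything after '#' is the time zone ID; it may be absent.
    locale_part, _, zone = text.partition("#")
    fields = locale_part.split("_")
    if len(fields) == 2:
        # "en_US": the sub-language defaults to the lowercased country.
        language, country = fields
        sub_language = country.lower()
    elif len(fields) == 3:
        # "es_mx_US": language, sub-language, country.
        language, sub_language, country = fields
    else:
        raise ValueError(f"unsupported locale string: {text!r}")
    if not (language.islower() and sub_language.islower() and country.isupper()):
        raise ValueError(f"unexpected letter case in: {text!r}")
    return ProposedLocale(language, sub_language, country, zone or None)
```

    With this, parse_locale("en_US#America/Los_Angeles") and parse_locale("en_us_US#America/Los_Angeles") give the same result.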

    If you want to maintain compatibility with systems like Windows that use LCIDs, you can use separate LCIDs for the language and country values when you have a mixed environment like "es_mx_US".

    "es_mx_US#America/Los_Angeles" is easy to implement: if the value is language dependent you look under "es_mx", country-dependent data is under "US", and the time zone is "America/Los_Angeles". This greatly reduces normalization problems, not only in the locale data but also in user databases, and provides automatic support for locale combinations with much less effort.
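
    The lookup described above could be sketched roughly like this in Python. The table names, keys, and sample values are only illustrative assumptions, not taken from the actual data; the point is that language data and country data live in separate tables and each is stored once.

```python
# Illustrative tables (assumed contents): language data keyed by
# "language_sublanguage", country data keyed by the uppercase country code.
LANGUAGE_DATA = {
    "es_mx": {"month_names": ["enero", "febrero"]},
    "en_us": {"month_names": ["January", "February"]},
}
COUNTRY_DATA = {
    "US": {"currency": "USD"},
}


def split_locale(text):
    """Return (language key, country, time zone) for the proposed string form."""
    locale_part, _, zone = text.partition("#")
    fields = locale_part.split("_")
    language, country = fields[0], fields[-1]
    # With only two fields, the sub-language defaults to the lowercased country.
    sub_language = fields[1] if len(fields) == 3 else country.lower()
    return f"{language}_{sub_language}", country, zone or None


def lookup(text, key):
    """Find a value in whichever table owns it for this locale string."""
    language_key, country, zone = split_locale(text)
    if key == "time_zone":
        return zone
    if key in LANGUAGE_DATA.get(language_key, {}):
        return LANGUAGE_DATA[language_key][key]
    if key in COUNTRY_DATA.get(country, {}):
        return COUNTRY_DATA[country][key]
    raise KeyError(key)
```

    A mixed locale such as "es_mx_US" then needs no entry of its own: the month names come from "es_mx" and the currency comes from "US".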

    The short time zone names should be common only to the specific country that uses them. Even for the US locale they are a mess. Both America/Anchorage and America/Halifax use "AST" (Alaska Standard Time / Atlantic Standard Time).
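
    One way to picture scoping the short names by country, as a rough Python sketch. The table is hypothetical and holds only the two "AST" entries mentioned above, taken as given; the country assignments (US for Anchorage, CA for Halifax) are my own assumption.

```python
# Hypothetical table: short names are keyed by (country, abbreviation),
# so the same abbreviation can mean different zones in different countries.
SHORT_ZONE_NAMES = {
    ("US", "AST"): "America/Anchorage",  # Alaska Standard Time
    ("CA", "AST"): "America/Halifax",    # Atlantic Standard Time
}


def resolve_short_zone(country, abbreviation):
    """Resolve a short zone name within one country; fail if it is not defined there."""
    try:
        return SHORT_ZONE_NAMES[(country, abbreviation)]
    except KeyError:
        raise LookupError(f"{abbreviation!r} is not a short zone name in {country!r}")
```

    Keyed this way, "AST" is no longer ambiguous once the country part of the locale is known.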


