Dialects and orthographies in BCP 47 (was: Re: Draft Proposal to add Variation Sequences for Latin and Cyrillic letters)

From: Doug Ewell (doug@ewellic.org)
Date: Wed Aug 04 2010 - 14:29:39 CDT

Next message: Karl Pentzlin: "Re: Draft Proposal to add Variation Sequences for Latin and Cyrillic letters"

Previous message: verdy_p: "Re: Draft Proposal to add Variation Sequences for Latin and Cyrillic letters"
Next in thread: verdy_p: "re: Dialects and orthographies in BCP 47 (was: Re: Draft Proposal to add Variation Sequences for Latin and Cyrillic letters)"
Reply: verdy_p: "re: Dialects and orthographies in BCP 47 (was: Re: Draft Proposal to add Variation Sequences for Latin and Cyrillic letters)"

verdy_p <verdy underscore p at wanadoo dot fr> wrote:

> Really, "Hans", "Hant", "Latf", "Latg" could have been avoided in ISO 15924, if orthographic variants of the same
> languages had been encoded in the IANA database for BCP 47, independently of the effective font style.

Actually it was the opposite; the ability to use standardized ISO 15924
code elements to express concepts like "Simplified Han" was one of the
driving forces behind RFC 4646 and its shift in focus from whole tags to
subtags.

In any case, the bibliographers and others who use ISO 15924 but not BCP
47 might need to make these distinctions as well.

> But for now there's still no formal model for encoding language dialects, so BCP 47 language tags still need to use
> tags for ISO 3166-1 region codes and for the script variant, when it should just qualify the generic script code (or
> it could even drop this ISO 15924 code if there was a formal code for the dialect written in a specific orthography:
> we would also deprecate "Jpan", "Hrkt" in ISO 15924).

There is no "formal model" in the sense of a standard N-letter subtag
for dialects, because the concept of a dialect is too open-ended and
unsystematic. The word means different things to different people.
What may be a dialect to one person might be a full-blown National
Language to another, or just a funny accent to a third.

BCP 47 tags never *need* to use either the region subtag or the script
subtag, unless they are necessary to convey the intended meaning. A tag
like "ja-Jpan-JP" is almost never needed, because almost all written
Japanese is "using the Japanese writing system" ('Jpan') and "as used in
Japan" ('JP').

I'm not sure what dialect is being posited here that would make the
difference between having to specify a script subtag and not having to.

> Orthographic variants would include also:
> - the various romanization systems (for example Pinyin) and phonetic transcriptions (IPA phonetic, simplified IPA
> phonology),

'pinyin', 'fonipa'

> - the simplified orthographies (e.g. orthographic reforms in French and German),

'1606nict', '1694acad', '1901', '1996'

> - and some other minor variants (like the vertical presentation for East-Asian scripts, or Boustrophedon
> presentation for Ancient Greek, if this alters the orientation of characters that had to be encoded differently, and
> the default mirroring properties are not applicable to the encoded characters in the basic language).
>
> For now these dialectal/orthographic variants of written languages can be registered in the IANA database for BCP
> 47, using codes with at least 5 letters (or with at least 4 letters or digits if there's at least one digit),

A 4-character variant subtag must *begin* with a digit.
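
The length rule can be sketched as a small well-formedness check in Python.
The pattern follows the RFC 5646 ABNF for variants (5 to 8 alphanumerics, or
a digit followed by 3 alphanumerics); the names VARIANT_RE and
is_wellformed_variant are made up for this example. It checks syntax only,
not whether a subtag is actually registered.

```python
import re

# variant = 5*8alphanum / (DIGIT 3alphanum), per the RFC 5646 ABNF.
VARIANT_RE = re.compile(r"(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3})")


def is_wellformed_variant(subtag: str) -> bool:
    """Return True if subtag has the syntactic shape of a variant subtag."""
    return VARIANT_RE.fullmatch(subtag) is not None


if __name__ == "__main__":
    samples = ("pinyin", "fonipa", "1606nict", "1694acad", "1901", "1996", "ab1c")
    for subtag in samples:
        print(subtag, is_wellformed_variant(subtag))
```

All of the variants mentioned above pass, while a 4-character string such as
"ab1c" fails: it contains a digit, but not in the first position.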

> but
> ideally the dialectal variant should be encoded as a tag BEFORE the orthographic variant.

Why is this important?

--
Doug Ewell | Thornton, Colorado, USA | http://www.ewellic.org
RFC 5645, 4645, UTN #14 | ietf-languages @ is dot gd slash 2kf0s


This archive was generated by hypermail 2.1.5 : Wed Aug 04 2010 - 14:33:16 CDT