Mark,
I never submitted this as a proposal to Unicode before, but if I were 
to, I would suggest the language id scheme that I helped invent when I 
worked for MS. [It is documented in Nadine Kano's book, as well as in 
the header files for any Win32 development environment.]
The proposed language id is a numeric id packed into a single short: 
10 bits (the low bits) of primary language id and 6 bits (the high bits) 
of secondary language id. The rule is that a matching primary language 
id identifies a language family whose members can be freely substituted 
for one another in user interface, documentation or help files and 
still be understood.
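A minimal sketch of that packing in C, assuming only the layout described 
above (low 10 bits primary, high 6 bits secondary). The names lang_id, 
make_lang_id, primary_of, secondary_of and substitutable are made up for 
this illustration; the Win32 headers expose the same layout through 
macros such as MAKELANGID.

```c
#include <stdbool.h>
#include <stdint.h>

/* Layout assumed from the description: bits 0-9 primary, bits 10-15 secondary. */
#define PRIMARY_BITS   10u
#define PRIMARY_MASK   0x03FFu
#define SECONDARY_MASK 0x003Fu

typedef uint16_t lang_id;   /* hypothetical type name */

/* Pack a primary and a secondary id into one 16-bit value. */
static lang_id make_lang_id(unsigned primary, unsigned secondary)
{
    return (lang_id)(((secondary & SECONDARY_MASK) << PRIMARY_BITS)
                     | (primary & PRIMARY_MASK));
}

/* Extract the two fields again. */
static unsigned primary_of(lang_id id)   { return id & PRIMARY_MASK; }
static unsigned secondary_of(lang_id id) { return (unsigned)id >> PRIMARY_BITS; }

/* Two ids are substitutable when their primary language ids match. */
static bool substitutable(lang_id a, lang_id b)
{
    return primary_of(a) == primary_of(b);
}
```

With illustrative values, make_lang_id(0x09, 0x01) gives 0x0409 and 
make_lang_id(0x09, 0x02) gives 0x0809; both share primary id 0x09, so 
substitutable() treats them as interchangeable.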
The 10th and 16th bits designate an ID as 'user defined'. This allows 
researchers or vendors to define IDs for obscure dialects or other 
linguistic variations of languages, or to define IDs for rare or 
historic languages while preserving the substitution abilities and 
allowing data thus tagged to coexist with data using 'standard' (i.e. 
predefined) tags.
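A sketch of how the 'user defined' test might look in C. It assumes that 
bits are counted from 1 starting at the least significant bit, so the 
10th bit is the top bit of the primary field and the 16th bit is the top 
bit of the secondary field. That reading, and all the names below, are 
assumptions for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions (1-based counting from the low end). */
#define USER_PRIMARY_FLAG   0x0200u  /* 10th bit: top of the 10-bit primary field  */
#define USER_SECONDARY_FLAG 0x8000u  /* 16th bit: top of the 6-bit secondary field */

/* True when the primary language id is in the user-defined range. */
static bool is_user_primary(uint16_t id)
{
    return (id & USER_PRIMARY_FLAG) != 0;
}

/* True when the secondary language id is in the user-defined range. */
static bool is_user_secondary(uint16_t id)
{
    return (id & USER_SECONDARY_FLAG) != 0;
}

/* True when either field marks the id as user defined. */
static bool is_user_defined(uint16_t id)
{
    return is_user_primary(id) || is_user_secondary(id);
}
```

Under this reading, a user-defined secondary id such as 0x8009 keeps the 
same primary field as the predefined 0x0409, so the substitution rule 
still applies to it.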
Since the tags thus fit into 16 bits, one can play all sorts of games
with how to insert them into a stream of Unicode characters. For a pseudo 
plain-text approach you could insert ESC <xxx> <yyy> where <xxx> is a
code that designates that this is a language id escape and <yyy> can
immediately be the language id.
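The ESC <xxx> <yyy> sequence could be sketched in C as below, treating 
the text as an array of 16-bit code units. ESC is U+001B; the value used 
for <xxx> (LANG_TAG_CODE) is a placeholder chosen for this example, not 
an assigned code, and the function names are likewise hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define ESC_CODE      0x001Bu  /* U+001B ESCAPE */
#define LANG_TAG_CODE 0x004Cu  /* hypothetical <xxx>; 'L' picked only for illustration */

/* Write ESC <xxx> <yyy> into out; return units written, or 0 if there is no room. */
static size_t put_lang_tag(uint16_t *out, size_t cap, uint16_t lang_id)
{
    if (cap < 3)
        return 0;
    out[0] = ESC_CODE;
    out[1] = LANG_TAG_CODE;
    out[2] = lang_id;          /* the 16-bit id follows immediately */
    return 3;
}

/* If a language tag starts at text[pos], store its id and return 3; else return 0. */
static size_t get_lang_tag(const uint16_t *text, size_t len, size_t pos,
                           uint16_t *lang_id)
{
    if (pos > len || len - pos < 3)
        return 0;
    if (text[pos] != ESC_CODE || text[pos + 1] != LANG_TAG_CODE)
        return 0;
    *lang_id = text[pos + 2];
    return 3;
}
```

A reader would call get_lang_tag() at each position and skip the returned 
number of units when a tag is found. A process that does not know the 
escape would see the raw id as an ordinary code unit.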
[Other suggestions I have heard use the private use space, typically by
reserving 2 sets of 256 codes each of which carries one byte of the 
language id in its lower byte. These shave some string length at a cost
of splitting the ids and risking overlap with other uses of P.U. Area.]
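The private-use variant might look like this in C. The two base values 
(0xE000 and 0xE100) are arbitrary picks inside the Private Use Area for 
illustration; any real use would have to agree on the two ranges, and the 
function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bases: two blocks of 256 private-use codes each. */
#define PUA_HIGH_BASE 0xE000u  /* low byte carries the high byte of the id */
#define PUA_LOW_BASE  0xE100u  /* low byte carries the low byte of the id  */

/* Split a 16-bit language id into two private-use code units. */
static void split_lang_id(uint16_t id, uint16_t pair[2])
{
    pair[0] = (uint16_t)(PUA_HIGH_BASE | (id >> 8));
    pair[1] = (uint16_t)(PUA_LOW_BASE  | (id & 0xFFu));
}

/* Rejoin the pair; return false if either unit is outside its block. */
static bool join_lang_id(const uint16_t pair[2], uint16_t *id)
{
    if ((pair[0] & 0xFF00u) != PUA_HIGH_BASE)
        return false;
    if ((pair[1] & 0xFF00u) != PUA_LOW_BASE)
        return false;
    *id = (uint16_t)(((pair[0] & 0xFFu) << 8) | (pair[1] & 0xFFu));
    return true;
}
```

This takes two code units per tag instead of three, at the cost noted 
above: the id is split in half, and the 512 reserved codes may collide 
with someone else's private-use assignments.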
Another advantage of the 16-bit key is that it is conveniently usable 
as a numeric constant in an API call, without the padding or pointer 
dereferencing that strings of 3-letter abbreviations or similar schemes 
would require.
To summarize: any proposal needs to address these issues:
- how the ID is designed (numeric, string, etc.)
- how one can tell from the ID that two languages are substitutable
- how the ID is incorporated into a data stream (default protocol)
- suggested initial assignments of ID values 
A./
You wrote: 
>
>We are interested in any previous proposals to the Unicode Technical
>Committee with regard to language identifiers.
>
>If you can provide a copy of any of these types of proposals, we would
>be grateful.
>
>We aren't looking for a language identifier approach, we are checking
>for previous proposals that might overlap or encompass one we might
>submit.
>-----------------------------------------------------------------------------
>mleisher@crl.nmsu.edu
>Mark Leisher                         "The trick is not gaining the knowledge,
>Computing Research Lab                    but surviving the lessons."
>New Mexico State University                  -- "Svaha," Charles de Lint
>Box 30001, Dept. 3CRL
>Las Cruces, NM  88003
>