Re: UNESCO standard keyboards? (Re: Tamazight/berber language : ....)

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Fri Jun 06 2003 - 04:27:04 EDT


    From: <Peter_Constable@sil.org>
    > Don Osborn wrote on 06/05/2003 07:34:29 PM:
    > > > There are probably some existing standards for keyboard mappings,
    > > > promoted by UNESCO and published in an ISO standard.
    > >
    > > If there were such a thing (for Tamazight or any other African
    > > language) I'd be very interested to know about it. My impression is
    > > that there are no such standards for African languages that use
    > > extended Latin characters. In fact SIL is apparently working with
    > > UNESCO on a proposed keyboard layout for African languages precisely
    > > because there is not yet any such standardization.
    >
    > Just for clarification, what we are doing is *not* part of a
    > standardization process; note that UNESCO is not a standards body.
    > Rather, UNESCO is involved in policy recommendations, and we are
    > assisting them with some documents providing recommendations related
    > to support of the world's languages (emphasis on those languages on
    > the other side of the digital divide) in ICTs. The keyboard layout in
    > question is merely for a prototype implementation intended to
    > demonstrate the ability to create keyboard layouts that meet the needs
    > of languages which are less well supported in ICTs.

    Thanks for this information. However, I don't think I stated that UNESCO was a standards body; still, an important part of its activity is to promote education and the preservation of world cultures by helping languages to be written and used with modern technologies.

    So I do think that languages which are not officially supported by a country will, in practice, mostly be written using the keyboard layouts already in use in each country. Unicode could help those who want to create keyboard layouts and input methods by describing, somewhere, the character subset appropriate for each language (which would facilitate interchange) and the correct encoding of digraphs/trigraphs/polygraphs as sequences of characters.

    It would be interesting to add some informative appendices to Unicode, and later make them normative, to clearly state the subset of characters that MUST be supported for each written language, together with a list of legacy equivalents that should be interpreted the same as their recommended encoding in the context of that language.
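
    To make this concrete, here is a minimal sketch in Python of what such a per-language record and conformance test could look like. The character list, the legacy equivalence, and the function names ("LANGUAGE_PROFILES", "conforms") are purely illustrative assumptions of mine, not data or APIs from Unicode or from any standards body:

        # Purely illustrative: the character set and equivalence below are
        # examples I made up, not normative data from Unicode or any standard.
        LANGUAGE_PROFILES = {
            "br": {   # Breton, reduced to a handful of characters for the example
                "required": set("abcdefghijklmnoprstuvwyz \u02BC\u00F1\u00F9\u00EA"),
                # Legacy spellings mapped to their recommended encoding in the
                # context of this language.
                "legacy_equivalents": {"'": "\u02BC"},  # ASCII apostrophe -> MODIFIER LETTER APOSTROPHE
            },
        }

        def conforms(text, lang):
            """True if the text only uses characters recommended for the language."""
            profile = LANGUAGE_PROFILES[lang]
            folded = text.lower()
            for legacy, preferred in profile["legacy_equivalents"].items():
                folded = folded.replace(legacy, preferred)
            return all(ch in profile["required"] for ch in folded)

        print(conforms("c'hoari", "br"))   # -> True, after folding the apostrophe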

    Of course the recommended characters should exclude compatibility characters (which would be listed among the legacy equivalences).
    After this step, there could be statistical studies, based on many types of publications and published outside the Unicode standard, listing the combination properties of each letter or digraph/trigraph/polygraph, with statistical indicators allowing further identification of the language.
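
    As a toy illustration of the kind of statistical study I mean (nothing here is standardized; the corpus, the function names "ngram_frequencies" and "score_language", and the similarity measure are all my own assumptions):

        from collections import Counter

        def ngram_frequencies(text, n=2):
            """Relative frequencies of letter n-grams in a sample text."""
            text = "".join(ch for ch in text.lower() if ch.isalpha())
            counts = Counter(text[i:i + n] for i in range(len(text) - n + 1))
            total = sum(counts.values())
            return {gram: c / total for gram, c in counts.items()}

        def score_language(sample, profile):
            """Crude similarity between a sample and a per-language n-gram profile."""
            freqs = ngram_frequencies(sample)
            return sum(min(freqs.get(g, 0.0), p) for g, p in profile.items())

        # A real profile would be built from a large corpus of publications;
        # this tiny example only shows the shape of the data.
        french_profile = ngram_frequencies("le coeur et l'esprit de la langue")
        print(score_language("au coeur de la ville", french_profile))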

    As Unicode has nearly finished its work on all major modern languages, such a specification could already exist for them; but for rarer languages, it would help if they were documented more explicitly with encoding guides or recommendations, in order to facilitate their interchange. When I look in Unicode, there are often many candidate ways of encoding a written language with existing Unicode "abstract" characters.

    We spoke about the case of Breton <c'h> (which is encoded using the same set of characters as French, and so uses the APOSTROPHE and not the MODIFIER LETTER APOSTROPHE, simply because the input methods in use are based on the French keyboard), or of the Tifinagh <gamma> (which may have been encoded in various texts with a Greek gamma, with a Latin gamma, or with additional variants such as the LATIN LETTER SMALL CAPITAL GAMMA...).
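
    To make the gamma case concrete: a language-aware tool could fold the various candidate encodings seen in existing texts back to a single preferred character. The choice of LATIN SMALL LETTER GAMMA as "preferred" below, and the function name "fold_gamma", are purely my own illustration, not a Unicode recommendation:

        # Illustrative only: which gamma is "preferred" is precisely the kind of
        # decision a per-language recommendation would have to make.
        GAMMA_CANDIDATES = {
            "\u03B3": "\u0263",   # GREEK SMALL LETTER GAMMA         -> LATIN SMALL LETTER GAMMA
            "\u0262": "\u0263",   # LATIN LETTER SMALL CAPITAL GAMMA -> LATIN SMALL LETTER GAMMA
        }

        def fold_gamma(text):
            return "".join(GAMMA_CANDIDATES.get(ch, ch) for ch in text)

        print(fold_gamma("amazi\u03B3"))   # -> "amazi\u0263", i.e. the word with the Latin gamma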

    The origin of some "compatibility characters" is not clear. They were certainly added because of legacy encodings, but if their use is not recommended for newly supported languages, this usage should be better described: these characters will continue to be recommended for supporting national standards of specific languages (which already define these distinct variants).

    For many remaining minority languages, there will certainly not be a single keyboard layout or input method. A model based on completely new keyboard layouts that do not correspond to any layout in wide use in the countries where these languages are spoken would probably fail (people won't be able, for example, to find such a keyboard, or will be limited in their choice, notably for notebooks).

    In my opinion, Tifinagh or Breton will most often be written using an extended French keyboard rather than a completely new layout (simply because people also need to use their national language, and won't use a distinct computer or keyboard for each).
    The simplest way is then to modify or extend a national keyboard, even if this implies that a minority language will be supported by several input methods, one for each country.

    So I do think that a single multinational Tifinagh keyboard will not exist; instead there will be keyboards for French+Tifinagh+Arabic (possibly distinct in Morocco, Tunisia, Algeria and Chad), English+Tifinagh+Arabic in Libya and Egypt, English+French+Tifinagh in Canada, and so on...

    Keyboard layouts were initially created to support a single language in a single country. The way I see it evolving is that OSes will become more open and will allow each user to adapt their keyboard and input methods to the languages they want to support, using a national base layout which is simply extended to support more languages.
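
    One way to model that is to describe the user's layout as a national base layout plus a small set of overrides on free positions. The key positions, the characters chosen, and the "extended_layout" function below are only an example of the idea, not a proposed standard layout:

        # A layout modelled as a mapping from (key, modifier) to the character
        # produced. The base would come from the OS; it is reduced here to a
        # couple of French AZERTY positions for the sake of the example.
        FRENCH_BASE = {
            ("A", "plain"): "a",
            ("A", "shift"): "A",
            ("E", "altgr"): "\u20AC",        # euro sign, already on French keyboards
        }

        # Hypothetical extension putting missing letters on unused AltGr positions.
        EXTENSION = {
            ("O", "altgr"): "\u0153",        # oe ligature
            ("O", "altgr+shift"): "\u0152",  # OE ligature
            ("A", "altgr"): "\u00E6",        # ae ligature
            ("Q", "altgr"): "\u02BC",        # modifier letter apostrophe, for Breton c'h
        }

        def extended_layout(base, extension):
            """The user's layout is the national base with extra positions added."""
            layout = dict(base)
            layout.update(extension)
            return layout

        layout = extended_layout(FRENCH_BASE, EXTENSION)
        print(layout[("O", "altgr")])   # -> the oe ligature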

    Or even just the base language: look at the standard French keyboard, on which two "characters" are still missing, the AE and OE ligatures (I say ligature and not letter for AE because this is how it is interpreted in French, but it is still needed because of orthographic, phonetic and grammatical rules which differentiate the ligature from the pair of separate letters, notably for syllable breaks).

    Because of this absence, the original orthography is not respected, and this creates confusion: most texts are now encoded with separate letters, requiring additional ligation for correct rendering, using dictionaries or morphological analysis, so that a word like "coeur" gets the ligature but "coexister" does not. If the keyboards (created at a time when the French character set needed to fit in the 7-bit ISO 646 model) had contained those characters, more people would use the ligatures and interpret them as distinct letters. It is strange that French keyboards were slowly adapted to add new characters like the micro sign that nearly nobody uses, or the newer euro sign, but still lack a standard position for OE and AE, which are full members of the French alphabet...
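
    As long as the keyboard gap remains, the dictionary-based repair I describe looks roughly like this. The word lists and the "restore_oe" function are my own tiny illustration; a real tool would use a full dictionary or morphological analysis:

        # Words where "oe" really is the ligature, versus words where the two
        # letters belong to different syllables. Far too small to be realistic.
        LIGATURE_WORDS = {"coeur", "soeur", "oeuf", "oeuvre", "noeud", "voeu"}
        SEPARATE_WORDS = {"coexister", "moelle", "coefficient"}

        def restore_oe(word):
            """Reintroduce the oe ligature where the orthography requires it."""
            if word.lower() in LIGATURE_WORDS:
                return (word.replace("oe", "\u0153")
                            .replace("Oe", "\u0152")
                            .replace("OE", "\u0152"))
            return word

        print(restore_oe("coeur"))       # -> the word with the ligature
        print(restore_oe("coexister"))   # -> unchanged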

    So if input methods and layouts must be developed, it would be useful to recommend how this should be done, avoiding the same errors as in the past: a language should be encoded with a preferred set of characters, and this should be reflected in the keyboard layouts, and standardized.

    The publication of recommended alphabets (possibly with several "conformance levels") for each language would really help software and OS vendors to provide a richer set of input methods that can be learned and reused by people. Part of this work fits within Unicode, because it defines character properties (the conformance of a Unicode character to a language is such a property), and other parts would fall to other ISO working groups and to UNESCO recommendations.

    -- Philippe.


