Re: Unicode & space in programming & l10n

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Mon Sep 18 2006 - 21:36:26 CDT


    From: "Doug Ewell" <dewell@adelphia.net>
    > To the extent that C and database
    > development tools exhibit a "bias" (which the passage does not prove),
    > it is a bias in favor of 8-bit legacy encodings and not the English
    > language.

    Given that most new technology terms are created and documented first in English, the bias still exists: English is the language of the technology itself. Most programming languages are built around very uniform English terminology in their reserved keywords, and those keywords are nearly universal across languages that share the same concepts. The only well-known exception seems to be APL, which uses many symbols; but most of those symbols come from actual mathematical notation, and because APL uses Greek letters to distinguish them from variable names, identifiers can then hardly be written in native Greek. So here again there is a bias, this time in favor of the Latin script plus mathematical symbols.

    Attempts to translate programming languages have mostly failed, either for interoperability reasons (consider the attempt to translate Excel function names in worksheets) or for lack of documentation and users (there are plenty of computing languages created with reserved keywords in languages other than English, but then there is the problem of inputting these programs on international systems). There is apparently still no computer language that can translate its keywords through a symbolic declaration area, one that could be included as part of the program source so that it does not conflict with national identifiers.

    It should be possible to have such a short declaration of the reserved keyword set in a simple leading source line like #[fr], indicating that the French set of keywords is used here (it is then up to the compiler or interpreter to keep a list of supported languages, identified by standard language tags; see the sketch after this list). But then:
    * Who will maintain the per-language lists of reserved keywords?
    * How can the mapping be performed without developing a language for these associations? And can that meta-language really be international and not culturally oriented?
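
    As a minimal sketch of that idea, assuming the hypothetical #[fr] directive above and an invented French keyword table (neither is an existing tool or standard), a preprocessor could rewrite localized keywords back to the canonical English ones before handing the source to an ordinary compiler:

        import re

        # One table per supported language tag; who maintains these
        # lists is exactly the first open question above.
        KEYWORD_TABLES = {
            "fr": {"si": "if", "sinon": "else", "tantque": "while",
                   "pour": "for", "retourne": "return"},
        }

        def localize_to_english(source):
            first, _, rest = source.partition("\n")
            m = re.fullmatch(r"#\[(\w+)\]", first.strip())
            if not m:
                return source      # no directive: leave the source untouched
            table = KEYWORD_TABLES.get(m.group(1))
            if not table:
                return rest        # unknown language tag: nothing to map
            # Replace whole words only, so identifiers that merely
            # contain a keyword are left alone.
            pattern = re.compile(r"\b(" + "|".join(map(re.escape, table)) + r")\b")
            return pattern.sub(lambda mo: table[mo.group(1)], rest)

        print(localize_to_english("#[fr]\nsi x > 0:\n    retourne x\nsinon:\n    retourne -x"))
        # -> if x > 0:
        #        return x
        #    else:
        #        return -x

    A real implementation would of course need a proper tokenizer (this naive regex would also rewrite keywords inside string literals and comments), and it still leaves both questions above unanswered.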

    With Unicode-encoded texts (using one of the UTFs, and with the compiler treating the Unicode normalization forms as equivalent, to avoid problems created by input methods), we could just as well support international languages in sources, including in identifiers, without creating conflicts with reserved keywords. But is it really needed, given that most programmers are trained to read and understand English-language technical documentation for the related computing concepts?
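
    To make the normalization point concrete, here is a small demonstration (in Python, used here only as a convenient illustration language): two input methods can produce the "same" identifier either as a precomposed character (NFC) or as a base letter plus a combining accent (NFD), and a compiler comparing identifiers code point by code point would see two different names unless it normalizes first:

        import unicodedata

        nfc = "caf\u00e9"       # 'café' with precomposed U+00E9
        nfd = "cafe\u0301"      # 'café' with 'e' + combining acute U+0301

        print(nfc == nfd)       # False: the raw code point sequences differ
        print(unicodedata.normalize("NFC", nfc) ==
              unicodedata.normalize("NFC", nfd))    # True: same identifier

    Python 3 actually does this for its own sources: identifiers are normalized to NFKC, so both spellings would name the same variable.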

    And finally, how can you avoid the English orientation when every language has to be integrated on systems using more and more libraries written by others, most often documented in only a few languages (English, and sometimes one other)?

    So as long as the many new English terms for technologies are not adapted for almost all concepts, programming languages will be written using only Basic Latin letters for keywords; this tends to fix the encoding to the Basic Latin range for English, and it then persists in the way variable names (and function names from libraries) and the whole technical syntax use only the Basic Latin alphabet. In the interim, attempts to make artificially constructed technical languages will fail, and ASCII will remain the only alternative.


