Re: Unicode & space in programming & l10n

From: Jefsey_Morfin (jefsey@jefsey.com)
Date: Tue Sep 26 2006 - 20:49:54 CST

    Philippe,
    computers have only been "speaking" for 50 years. Please give them
    time. Semantic processing is what will let them speak in words and
    languages. Several systems translate texts into concepts, process
    the concepts, and then come back to a language. The "English bias"
    is not there. It appears when you must use an English layer between
    the binary/universalised and lingualised layers, e.g. when you have
    to translate a French text into English and then back into French
    for us to dialogue (as now).

    When you write a program in C you use declared formulas. You can
    easily transform a C program into another script or change its
    semantics (though not as easily its syntax).
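    A minimal Python sketch of that ease (the French keyword spellings
    and the naive word-level mapping are my own illustration, not any
    standard; string literals, comments and colliding identifiers are
    not handled):

        import re

        # Illustrative French renderings of a few C keywords.
        FR = {"if": "si", "else": "sinon", "while": "tantque",
              "return": "retourne", "int": "entier"}

        def translate_c(source: str) -> str:
            # Naive sketch: replaces whole words only; it does not skip
            # string literals, comments, or colliding identifiers.
            pattern = re.compile(r"\b(" + "|".join(FR) + r")\b")
            return pattern.sub(lambda m: FR[m.group(1)], source)

        print(translate_c("int main(void) { if (x) return 0; else return 1; }"))
        # -> entier main(void) { si (x) retourne 0; sinon retourne 1; }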

    The problem is when you send a mail. The protocol will not work if
    you do not use "From:" and "To:". We have the same problem with
    domain names. This is why Punycode is interesting for the DNS alone,
    as it transparently transforms most ISO 10646 strings into the
    hexatridecimal (0-Z) strings the DNS uses. Unfortunately the e-mail
    LHS (left-hand side) does not use hexatridecimal.
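    A quick illustration with Python's built-in codecs, which implement
    RFC 3492 Punycode and the IDNA label form (the sample word is mine):

        # Python's standard codecs implement RFC 3492 Punycode
        # and the ASCII-compatible IDNA label form used in the DNS.
        print("bücher".encode("punycode"))      # b'bcher-kva' (bare Punycode)
        print("bücher".encode("idna"))          # b'xn--bcher-kva' (DNS label)
        print(b"xn--bcher-kva".decode("idna"))  # bücher (round-trips)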

    I agree that an important English bias is standardisation and
    documentation in English. Even when a standard is bilingual, as ISO
    639-3 is, a separated publication makes the French standard look
    like a translation, whereas each language should be there to clarify
    the ambiguities resulting from the specifics of the other language
    (with the confusion that may result). This has several negative
    impacts. A person's IQ is higher in his mother tongue, so this
    deprives most of the world of an equal opportunity in accessing ICTs
    (their IETF standardisation, design, usage, and management are under
    the "influence" [RFC 3935] of core values and concepts resulting
    solely from English-based cultures). Another problem is that English
    is a complex language for a machine to understand; this increases
    the cost of English-based globalisation or reduces its
    possibilities.

    To start addressing languages technically (computers, networks), one
    must first identify languages as brain-to-brain interintelligibility
    protocols, correctly understand the lingual planes (universalisation,
    lingualisation, globalisation, multilingualisation), describe their
    channelisation through their various modes (vocal, signed, iconic,
    written, typed, computable, networked), and consider the extended
    services that assist them, their conceptualisation capabilities,
    their standardisation, their localisation, their grapheme/phoneme
    elements, etc. Each time there is a rigidity due to the obligation
    to use a given language in this process (whatever the language), you
    have a linguistic bias.

    jfc

    At 04:36 19/09/2006, Philippe Verdy wrote:

    >From: "Doug Ewell" <dewell@adelphia.net>
    > > To the extent that C and database
    > > development tools exhibit a "bias" (which the passage does not prove),
    > > it is a bias in favor of 8-bit legacy encodings and not the English
    > > language.
    >
    >Given that most new technology terms are created and documented
    >first in English, the bias still exists, as English is a
    >technology-related language; most programming languages are
    >developed with very uniform English terminology in their reserved
    >keywords, which are nearly universal in languages using these
    >concepts. It seems that the only well-known exception is APL, which
    >uses many symbols, but many of those symbols come from an actual
    >mathematical notation used in writing formulas, and there is a bias
    >because it uses Greek letters to differentiate them from variable
    >names, which can then hardly be written in native Greek (so here
    >again there is a bias in favor of Latin script plus maths symbols).
    >
    >Attempts to translate programming languages have failed mostly for
    >interoperability reasons (consider for example the attempt to
    >translate the Excel function names in worksheets), or because of
    >insufficient documentation and users (there are tons of computing
    >languages created with reserved keywords in languages other than
    >English, but then there is the problem of inputting these programs
    >on international systems); there is apparently still no computer
    >language that can translate its keywords with a symbolic
    >declaration area (which might be included as part of the program
    >source, so that it does not conflict with national identifiers).
    >
    >It should be possible to have such a short declaration of the
    >reserved set of keywords in a simple leading source line like
    >#[fr], which would indicate that the reserved keywords used here
    >are the French set (it is then up to the compiler or interpreter to
    >have a list of supported languages, identified by a standard
    >language tag), but then:
    >* who will maintain the language list for the reserved identifiers?
    >* how can a mapping be performed without developing a language for
    >these associations? Can this meta-language really be international
    >and not culturally oriented?
    >
    >With Unicode-encoded texts (using one of the UTFs, and optionally
    >some Unicode normalization forms that the compiler should treat as
    >equivalent, to avoid problems created by input methods), we could
    >just as well support international languages in sources, including
    >in identifiers, without creating conflicts with reserved keywords.
    >But is it really needed, given that most programmers are trained to
    >read and understand English-written technical documentation for the
    >related computing concepts?
    >
    >And finally, how can you avoid the English orientation, when all
    >languages have to be integrated on systems using more and more
    >libraries written by others, and most often documented in only a
    >few languages (English and sometimes one other)?
    >
    >So as long as the many new English terms for technologies are not
    >adapted for almost all concepts, programming languages will be
    >written using only Basic Latin letters for identifiers; this tends
    >to fix the language encoding to the Basic Latin range for English,
    >and it then pushes the other variable names (and the function names
    >from libraries) and the whole technical syntax to use only the
    >Basic Latin alphabet. In the interim, attempts to make artificially
    >constructed technical languages will fail, and ASCII will remain
    >the only alternative.
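
    Philippe's #[fr] idea above can be sketched quickly. The following
    Python toy (the directive syntax, the language table, and the French
    spellings are all hypothetical) rewrites a source that declares its
    keyword language back into the canonical English-keyword form,
    normalizing the text first so that input-method variants of the same
    keyword compare equal:

        import re
        import unicodedata

        # Hypothetical keyword tables keyed by a standard language tag;
        # the #[fr] directive and the French spellings are illustrative.
        KEYWORD_TABLES = {
            "fr": {"si": "if", "sinon": "else", "tantque": "while",
                   "retourne": "return"},
        }

        def to_canonical(source: str) -> str:
            # If the first line is a directive like '#[fr]', map that
            # language's reserved words back to the canonical English set.
            first, _, rest = source.partition("\n")
            match = re.fullmatch(r"#\[([A-Za-z-]+)\]", first.strip())
            if not match:
                return source  # no directive: source is already canonical
            table = KEYWORD_TABLES.get(match.group(1))
            if table is None:
                raise ValueError("unsupported keyword language: " + match.group(1))
            # Normalize so input-method variants of a keyword compare
            # equal, as the quoted message suggests.
            rest = unicodedata.normalize("NFC", rest)
            pattern = re.compile(r"\b(" + "|".join(table) + r")\b")
            return pattern.sub(lambda m: table[m.group(1)], rest)

        print(to_canonical("#[fr]\nsi (x) retourne 0; sinon retourne 1;"))
        # -> if (x) return 0; else return 1;

    Python itself already behaves this way for identifiers: PEP 3131
    NFKC-normalizes them at parse time, so non-ASCII identifiers work
    today even though the keywords stay English; the missing piece is
    exactly a keyword table like the one above.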


