Fw: Unicode & space in programming & l10n

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Thu Sep 21 2006 - 16:38:54 CDT

  • Next message: Philippe Verdy: "Re: Unicode & space in programming & l10n"

    From: "William J Poser" <wjposer@ldc.upenn.edu>
    > I'm confused as to the sense in which C and C++
    > "don't support the Unicode character model". It is
    > very easy to manipulate objects of type wchar_t,
    > arrays thereof, linked lists thereof, and so forth.
    > I've done a fair amount of work using Unicode in C
    > and not found it a problem. There are some nice libraries
    > for handling Unicode in C, such as Ville Laurikari's
    > TRE regular expression library.

    I draw a BIG distinction between a language that SUPPORTS the model and one that merely HAS defined strong enough semantics. C/C++ datatypes were based on the physical host architecture, not on the effective data model that people want when developing portable applications. This is the major cause of the complexity of porting C/C++ applications across platforms, and even the most "portable" libraries are in fact "ported" by adding lots of modifications and conditional macros that are alien to the effective semantics of the language.
    C/C++ was good, for a time, at raising the level of abstraction so that we no longer had to write assembly programs that were completely impossible to port.

    Still, the language does not benefit from recent progress in dynamic compilation according to usage patterns and the effective platform on which the code runs. It severely lacks strong semantics, suffers from lots of caveats due to its inherited compatibility, and cannot prove any part of the correctness of the code.

    That's why I think that languages backed by strong datatype models, independent of the platform, and compiled to an intermediate code that is finally optimized later, dynamically, on the final target platform, are much more productive (and also give much better performance, as the VM compiles the code only for the host on which it runs, not for the many platforms on which the program might be deployed).

    Supporting the Unicode character model effectively requires strong datatypes with fixed-size integers, and even today there's still no universally implemented standard for such datatypes; this becomes a real nightmare when implementing protocols, and a source of bugs, due to forgotten cases and many subtle assumptions in lots of programs.

    Reread the ANSI specs about "wchar_t": this weak datatype is definitely not the solution. Unfortunately, the <types.h> datatypes are not supported natively as truly distinct strong types, but only as weak emulations. C/C++ were designed assuming that programmers know which platform they are programming for, never in the spirit that these programs should be ported later to ever-evolving platforms. The consequences are tremendous and in fact very costly in terms of maintenance, security, stability and documentation, because the implementation hides the intended design too much.

    The absence of a strong datatype for handling text in C/C++ is now an inherited weight that is becoming hard to carry, given that most of the data we spend so much time entering needs such strong datatypes. Even for multimedia content, the absence of strong binary datatypes is a problem.

    When C/C++ was designed, computers were expensive and memory was scarce. Programs could live as long as the platform. This is no longer the case: computers and memory are cheap, but programs are much more complex and handle more data, with many more interactions with other systems. Programs take longer to develop, and the code must survive longer than the platforms on which it will run.

    That's the meaning I give to the question: "Time to deprecate C/C++ ?" Here I mean that I strongly support the evolution toward platform-independent programming based on a conceptual machine whose code will be adapted dynamically to the hardware platform on which it will run, by a small piece of code designed specifically for the platform itself. Programmers should no longer have to worry about the underlying architecture. So welcome to the JVM (Java) and .Net (C#), including for most services offered today in our operating systems!

    This archive was generated by hypermail 2.1.5 : Thu Sep 21 2006 - 16:41:44 CDT