Re: Unicode & space in programming & l10n

From: William J Poser (wjposer@ldc.upenn.edu)
Date: Wed Sep 20 2006 - 17:17:01 CDT


    I'm confused as to the sense in which C and C++
    "don't support the Unicode character model". It is
    very easy to manipulate objects of type wchar_t,
    arrays thereof, linked lists thereof, and so forth.
    I've done a fair amount of work using Unicode in C
    and have not found it a problem. There are some nice libraries
    for handling Unicode in C, such as Ville Laurikari's
    TRE regular expression library.
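
    To illustrate how ordinary this is, here is a minimal
    sketch of a wchar_t array and a linked list of wchar_t.
    It assumes a platform where one wchar_t holds one whole
    code point (e.g. the 32-bit wchar_t of glibc). The names
    wnode, wlist_from_wcs, wlist_length, wlist_free and
    wcs_reverse are hypothetical helpers for this example,
    not part of TRE or any other library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical singly linked list of wide characters. */
struct wnode {
    wchar_t ch;
    struct wnode *next;
};

/* Free every node of a list (NULL is allowed). */
void wlist_free(struct wnode *p)
{
    while (p != NULL) {
        struct wnode *next = p->next;
        free(p);
        p = next;
    }
}

/* Build a list holding the characters of s in order.
   Returns NULL for an empty string or if malloc fails. */
struct wnode *wlist_from_wcs(const wchar_t *s)
{
    struct wnode *head = NULL;
    struct wnode **tail = &head;   /* where the next node goes */

    assert(s != NULL);
    for (; *s != L'\0'; ++s) {
        struct wnode *n = malloc(sizeof *n);
        if (n == NULL) {
            wlist_free(head);
            return NULL;
        }
        n->ch = *s;
        n->next = NULL;
        *tail = n;
        tail = &n->next;
    }
    return head;
}

/* Number of nodes in a list. */
size_t wlist_length(const struct wnode *p)
{
    size_t n = 0;
    for (; p != NULL; p = p->next)
        ++n;
    return n;
}

/* Reverse a NUL-terminated wide string in place, one wchar_t at a
   time (combining sequences are not kept together). */
void wcs_reverse(wchar_t *s)
{
    size_t i, j, n;

    assert(s != NULL);
    n = wcslen(s);
    if (n < 2)
        return;
    for (i = 0, j = n - 1; i < j; ++i, --j) {
        wchar_t tmp = s[i];
        s[i] = s[j];
        s[j] = tmp;
    }
}
```

    Apart from the element type, this is exactly the code one
    would write for char strings.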

    It is true that having to do your own storage allocation
    can be a pain, but this is independent of Unicode: you
    face exactly the same issues with plain ASCII.
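
    As a sketch of that point: duplicating a wide string takes
    the same malloc-and-copy steps as duplicating a char
    string; only the element size changes. str_dup and wcs_dup
    are hypothetical helpers written out for illustration
    (some C libraries provide a similar wcsdup).

```c
#include <assert.h>
#include <stdint.h>   /* SIZE_MAX */
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Plain ASCII: allocate strlen + 1 bytes and copy. */
char *str_dup(const char *s)
{
    size_t n;
    char *d;

    assert(s != NULL);
    n = strlen(s) + 1;                 /* room for the final '\0' */
    d = malloc(n);
    if (d != NULL)
        memcpy(d, s, n);
    return d;
}

/* Wide string: the same steps, except that the length is counted
   in wchar_t elements and must be scaled by sizeof(wchar_t). */
wchar_t *wcs_dup(const wchar_t *s)
{
    size_t n;
    wchar_t *d;

    assert(s != NULL);
    n = wcslen(s) + 1;                 /* room for the final L'\0' */
    if (n > SIZE_MAX / sizeof *d)      /* guard the multiplication */
        return NULL;
    d = malloc(n * sizeof *d);
    if (d != NULL)
        wmemcpy(d, s, n);
    return d;
}
```

    Either result is released with free(); the bookkeeping is
    identical in both cases.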

    The main theoretical difficulty that I see with Unicode
    processing in C is that you can't be sure that a wchar_t
    is at least 21 bits wide, which is what it takes to hold
    every code point up to U+10FFFF. This is of course a
    general defect of the C standard, which specifies only
    minimum ranges for the basic integer types and leaves
    the width of wchar_t to the implementation. In practice,
    however, I haven't myself encountered problems with this
    or heard of them.
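
    The width can at least be checked. The sketch below is one
    way to do it, assuming a C99 compiler: it compares
    WCHAR_MAX with 0x10FFFF, the largest Unicode code point,
    at preprocessing time, and reports __STDC_ISO_10646__ if
    the implementation defines it. MAX_UNICODE_CP,
    WCHAR_HOLDS_ALL_UNICODE, wchar_fits and report_wchar are
    hypothetical names chosen for this example.

```c
#include <assert.h>
#include <limits.h>   /* CHAR_BIT */
#include <stdint.h>   /* WCHAR_MAX, usable in #if per C99 */
#include <stdio.h>
#include <wchar.h>

/* Largest Unicode code point; representing it needs 21 bits. */
#define MAX_UNICODE_CP 0x10FFFFL

#if WCHAR_MAX >= MAX_UNICODE_CP
#  define WCHAR_HOLDS_ALL_UNICODE 1
#else
#  define WCHAR_HOLDS_ALL_UNICODE 0   /* e.g. a 16-bit wchar_t */
#endif

/* Nonzero if code point cp fits in a single wchar_t. */
int wchar_fits(unsigned long cp)
{
    assert(cp <= (unsigned long)MAX_UNICODE_CP);
    return cp <= (unsigned long)WCHAR_MAX;
}

/* Print what this implementation provides. */
void report_wchar(void)
{
    printf("wchar_t occupies %u bits; WCHAR_MAX = %lu\n",
           (unsigned)(sizeof(wchar_t) * CHAR_BIT),
           (unsigned long)WCHAR_MAX);
#ifdef __STDC_ISO_10646__
    /* C99: wchar_t values are ISO 10646 code points. */
    printf("__STDC_ISO_10646__ = %ld\n", (long)__STDC_ISO_10646__);
#endif
}
```

    Where WCHAR_HOLDS_ALL_UNICODE is 0 (Windows, for instance,
    uses a 16-bit wchar_t), characters outside the BMP take two
    wchar_t units as UTF-16 surrogate pairs.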

    For some applications, at least for the present, there
    is also a good reason to use C IN PREFERENCE to
    high-level languages for processing Unicode. The
    high-level languages that I know of all limit Unicode
    support to the BMP; that is true of Python and Tcl, for
    example. In C, by contrast, there is no such limitation.
    Which high-level languages currently handle the full
    Unicode range?
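
    A small sketch of handling characters beyond the BMP
    directly in C, assuming wchar_t holds whole code points
    (as with glibc's 32-bit wchar_t). count_supplementary and
    plane_of are hypothetical helpers; the former counts
    characters in planes 1-16, such as U+1D11E MUSICAL SYMBOL
    G CLEF.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Unicode plane of a code point: 0 is the BMP, 1-16 are the
   supplementary planes. */
static unsigned plane_of(unsigned long cp)
{
    return (unsigned)(cp >> 16);
}

/* Count the characters outside the BMP in a NUL-terminated wide
   string. This assumes each wchar_t holds a whole code point; with a
   16-bit wchar_t such characters are stored as two surrogate units,
   both below 0x10000, so the function would return 0 for them. */
size_t count_supplementary(const wchar_t *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != L'\0'; ++s)
        if (plane_of((unsigned long)*s) > 0)
            ++n;
    return n;
}
```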
     



    This archive was generated by hypermail 2.1.5 : Wed Sep 20 2006 - 17:21:39 CDT