Re: Unicode & space in programming & l10n

From: William J Poser (wjposer@ldc.upenn.edu)
Date: Wed Sep 20 2006 - 17:17:01 CDT

Next message: Kenneth Whistler: "Re: Unicode & space in programming & l10n"

Previous message: Addison Phillips: "Re: Question about formatting numerals"
Maybe in reply to: Don Osborn: "Unicode & space in programming & l10n"
Next in thread: Steve Summit: "Re: Unicode & space in programming & l10n"
Reply: Steve Summit: "Re: Unicode & space in programming & l10n"

I'm confused as to the sense in which C and C++
"don't support the Unicode character model". It is
very easy to manipulate objects of type wchar_t,
arrays thereof, linked lists thereof, and so forth.
I've done a fair amount of work using Unicode in C
and not found it a problem. There are some nice libraries
for handling Unicode in C, such as Ville Laurikari's
TRE regular expression library.
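
To make "arrays thereof" concrete, here is a minimal sketch of two
routines over wchar_t strings. The names wreverse and wcount are
made up for illustration and do not come from any library. They
work on array elements, not on user-perceived characters, so
combining sequences would need extra care.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Reverse a wide string in place, one wchar_t element at a time.
   This reverses array elements, not user-perceived characters. */
void wreverse(wchar_t *s)
{
    assert(s != NULL);
    size_t n = wcslen(s);
    for (size_t i = 0; i < n / 2; i++) {
        wchar_t tmp = s[i];
        s[i] = s[n - 1 - i];
        s[n - 1 - i] = tmp;
    }
}

/* Count how many elements of the wide string s are equal to c. */
size_t wcount(const wchar_t *s, wchar_t c)
{
    assert(s != NULL);
    size_t hits = 0;
    for (; *s != L'\0'; s++) {
        if (*s == c)
            hits++;
    }
    return hits;
}
```

Apart from the L prefix on literals and the wcs* functions in place
of the str* ones, this is the same code one would write for char.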

It is true that having to do your own storage allocation
can be a pain, but this is independent of the Unicode
issue - you have to deal with the same issues in plain
ASCII.

The main theoretical difficulty that I see with Unicode
processing in C is that you can't be sure that a wchar_t
is at least 21 bits wide. This is of course a general
defect of the C standard, which does not specify exact
object sizes. In practice, however, I haven't myself
encountered problems with this or heard of them.
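
If the width question is a worry, it can be checked instead of
assumed. The following is only a sketch: the helper names and the
UNICODE_MAX macro are hypothetical, while WCHAR_MAX and CHAR_BIT are
the standard macros from <wchar.h> and <limits.h>.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <wchar.h>

#define UNICODE_MAX 0x10FFFFUL  /* highest Unicode code point */

/* Storage width of wchar_t in bits on this implementation. */
unsigned wchar_bits(void)
{
    return (unsigned)(sizeof(wchar_t) * CHAR_BIT);
}

/* Nonzero if every Unicode code point fits in a single wchar_t. */
int wchar_covers_unicode(void)
{
    return WCHAR_MAX >= 0x10FFFF;
}

/* Convert a code point to wchar_t, asserting that it is in range
   and that it fits on this implementation. */
wchar_t wchar_from_code_point(unsigned long cp)
{
    assert(cp <= UNICODE_MAX);
    assert(cp <= (unsigned long)WCHAR_MAX);
    return (wchar_t)cp;
}
```

A program that depends on a wide wchar_t could call
wchar_covers_unicode() at startup and refuse to run if it returns 0.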

For the present, at least, there is also a good reason
to use C IN PREFERENCE to high-level languages for
processing Unicode, for some applications. The
high-level languages that I know of all limit
Unicode support to the BMP. That is true of Python
and Tcl, for example. In contrast, in C there
is no such limitation. Which high-level languages
currently handle the full Unicode range?
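
As an illustration of the "no such limitation" point, here is a
sketch of a UTF-8 encoder that handles code points above the BMP
with nothing more than integer arithmetic. The function name
utf8_encode is hypothetical, and I use uint32_t rather than wchar_t
so that the code does not depend on how wide wchar_t happens to be.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one code point as UTF-8 into out, which must have room for
   4 bytes. Returns the number of bytes written, or 0 if cp is a
   surrogate or lies above U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char *out)
{
    assert(out != NULL);
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;               /* surrogates are not characters */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {           /* supplementary planes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

For example, U+1D11E (MUSICAL SYMBOL G CLEF), which lies outside the
BMP, comes out as the four bytes F0 9D 84 9E.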


This archive was generated by hypermail 2.1.5 : Wed Sep 20 2006 - 17:21:39 CDT