Re: Non-ascii string processing? - count display units

From: Markus Scherer
Date: Tue Oct 07 2003 - 17:29:45 CST

You might want to look at the East Asian Width property (UAX #11) for an
approximation of the green-screen width of a string.

To be absolutely precise, you need feedback from your green-screen layout engine and its font, of
course, like you do for a graphical display.
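
As a rough illustration of the East Asian Width approach, here is a minimal sketch in Python using the standard unicodedata module. The function name display_width is hypothetical, and the rules it applies are assumptions rather than anything a particular terminal guarantees: Wide and Fullwidth characters count as two cells, nonspacing and enclosing marks, format characters and control characters count as zero, and everything else (including Ambiguous-width characters) counts as one. It makes no attempt at Arabic or Indic shaping.

```python
import unicodedata


def display_width(text: str) -> int:
    """Approximate the number of fixed-width cells needed to show text.

    Assumption: the terminal follows East Asian Width, drawing Wide (W)
    and Fullwidth (F) characters in two cells and all others in one.
    """
    width = 0
    for ch in text:
        category = unicodedata.category(ch)
        # Nonspacing marks (Mn), enclosing marks (Me), format characters (Cf)
        # and control characters (Cc) are treated as taking no cell.
        # This is a simplification: a tab, for example, is a control
        # character that a real terminal expands to several cells.
        if category in ("Mn", "Me", "Cf", "Cc"):
            continue
        # Wide and Fullwidth characters take two cells.
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            width += 2
        else:
            # Narrow, Halfwidth, Neutral and Ambiguous: one cell (assumed).
            width += 1
    return width


def pad_to_width(text: str, cells: int) -> str:
    """Left-align text in a column of the given number of cells."""
    return text + " " * max(0, cells - display_width(text))
```

With these rules, display_width("abc") is 3, a three-character CJK string such as "日本語" comes out as 6, and "e" followed by U+0301 COMBINING ACUTE ACCENT comes out as 1. For table formatting, pad_to_width pads by cell count instead of character count.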


Edward H. Trager wrote:
>>What you really need for such a thing is a function which computes the
>>"width" of a string in terms of display units, rather than its length in
>>terms of characters.
> Yes, I agree. I also need such a function. Do you, Marco, or anyone else, know which function(s)
> provide this service? (In my case, something Open Source or GPLed would be ideal, but ICU
> would be too heavy). My application started out life in a sheltered ASCII-only
> childhood, and now needs to move to the bigger UTF-8 world out there. Fortunately,
> it is quite capable of succeeding in that world, but I haven't even started working
> on the on-screen table formatting issue yet for exactly this reason.
> Actually I believe that if I have to write something myself, making it work for the
> Latin-with-combining-diacritics and CJK cases would not be too hard. After that however,
> it seems that one would have to work on a script-by-script basis to get it to really
> work properly. If it was only a case of Arabic, that would be one thing, but when one
> looks at the Indic and Indic-derived scripts ... well, there are a lot of Indic and Indic-derived
> scripts! Not that it is hard, but it would certainly take time, and I haven't done an ounce
> of research yet to find out whether somebody has done it already or not ...
