Re: Non-ascii string processing?

From: Edward H. Trager (ehtrager@umich.edu)
Date: Mon Oct 06 2003 - 11:11:07 CST


On Monday 2003.10.06 17:15:25 +0200, Marco Cimarosti wrote:
> Stephane Bortzmeyer wrote:
> > > OK. But the length in "characters" of a string is not
> > > "character semantics":
> > > it's plain nonsense, IMHO.
> >
> > I disagree.
>
> Feel free.
>
> But I still don't see any use in knowing how many characters are in a UTF-8
> string, apart from the use that I already mentioned: allocating a buffer for a
> UTF-8 to UTF-32 conversion.
>
> _ Marco

Well, I know a good use for it: a console or terminal-based application that
displays information in tabular form using fixed-width fonts, such as a subset
of records from a database table. To decide how wide to make each column,
knowing the maximum number of characters in the strings for that column is a
useful starting place.
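As a rough illustration of that starting place, here is a minimal sketch that
counts characters (code points) directly in UTF-8 bytes and takes the maximum
per column. It assumes well-formed UTF-8 and does no validation; the function
names utf8_char_count and column_widths are hypothetical, chosen for this
example only.

```python
def utf8_char_count(data: bytes) -> int:
    """Count code points in a UTF-8 byte string.

    Each code point starts with exactly one non-continuation byte;
    continuation bytes have the bit pattern 10xxxxxx. Assumes the
    input is well-formed UTF-8 (nothing is validated here).
    """
    return sum(1 for b in data if (b & 0xC0) != 0x80)


def column_widths(rows: list[list[bytes]]) -> list[int]:
    """Maximum character count per column, used as a starting width."""
    widths = [0] * len(rows[0]) if rows else []
    for row in rows:
        for i, cell in enumerate(row):
            widths[i] = max(widths[i], utf8_char_count(cell))
    return widths


rows = [
    ["Name".encode("utf-8"), "City".encode("utf-8")],
    ["José".encode("utf-8"), "Zürich".encode("utf-8")],
]
print(column_widths(rows))  # "José" is 5 bytes but 4 characters
```

Note that byte length would overestimate the width here ("Zürich" is 7 bytes
but 6 characters), which is why the character count is the better first guess.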

Of course, that might not be enough by itself if, for example, (1) one has
to display Hanzi or Kanji, which occupy twice the width of Latin characters
on a terminal, or (2) one has to display scripts where ligatures (as in
Arabic) or other features of the script, such as vowel signs written above or
below the letter in Indic and Indic-derived scripts, make the display width of
a string differ from its character count. But it is still a good place to
start.
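One way to refine the character count for cases (1) and (2) is a
wcwidth-style heuristic, sketched below under stated assumptions: nonspacing
and enclosing marks (Unicode categories Mn, Me) count as zero columns, East
Asian Wide and Fullwidth characters count as two, and everything else counts
as one. The name display_width is hypothetical; the code uses only Python's
standard unicodedata module. It does not handle Arabic ligatures or Indic
conjuncts, and actual terminal rendering varies.

```python
import unicodedata


def display_width(text: str) -> int:
    """Approximate terminal-column width of a string (heuristic only)."""
    width = 0
    for ch in text:
        # Marks drawn above/below a base letter take no column of their own.
        if unicodedata.category(ch) in ("Mn", "Me"):
            continue
        # Wide/Fullwidth (e.g. Hanzi, Kanji) take two columns.
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            width += 2
        else:
            # Includes "Ambiguous" width, assumed narrow here.
            width += 1
    return width


for s in ["Tokyo", "東京", "e\u0301", "\u0915\u0941"]:
    print(repr(s), len(s), display_width(s))
```

For example, "東京" is 2 characters but 4 columns, while Devanagari KA plus
the below-the-letter vowel sign U ("\u0915\u0941") is 2 characters but 1
column.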



This archive was generated by hypermail 2.1.5 : Thu Jan 18 2007 - 15:54:24 CST