RE: Non-ascii string processing?

From: Jill Ramonsky (Jill.Ramonsky@aculab.com)
Date: Tue Oct 07 2003 - 05:35:56 CST


No. What you have demonstrated below is that given an API based on
characters, one can write an API based on default grapheme clusters.
Nonetheless, it is only the /resulting/ default-grapheme-cluster-based
API which would actually be of any use to end-users.

...and anyone who even /thinks/ of writing an API based on default
grapheme clusters is surely competent enough to write that (almost
trivial) character-based middle layer themselves.

I have yet to see an APPLICATION which needs a character-based API.
Jill

> -----Original Message-----
> From: Peter Kirk [mailto:peterkirk@qaya.org]
> Sent: Tuesday, October 07, 2003 12:20 PM
> To: Jill Ramonsky
> Cc: unicode@unicode.org
> Subject: Re: Non-ascii string processing?
>
>
> On 07/10/2003 02:35, Jill Ramonsky wrote:
>
> >
> > Knowing the number of characters won't help you one iota. What you
> > need to know here is the number of default grapheme clusters.
> > I still have yet to hear a useful purpose for counting the number of
> > /characters/.
> >
> > Jill
> >
> Suppose I have a UTF-8 string and want to know how many default grapheme
> clusters it contains. How do I do so? Well, I step through the string
> character by character, combining successive characters into grapheme
> clusters. To do this without having to decode the UTF-8 myself, I need
> to be able to get at the string character by character, and very likely
> use a loop based on the number of characters in the string, e.g. the
> following Basic (horrid language but good for making my point here):
>
> ' Assumes Len and Mid count characters, not UTF-8 bytes
> For i% = 1 To Len(utf8string$)
>     c$ = Mid(utf8string$, i%, 1)
>     Process c$
> Next i%
>
> Such a loop would be more efficient in UTF-32 of course, but this is
> still a real need for working with character counts.
>
> --
> Peter Kirk
> peter@qaya.org (personal)
> peterkirk@qaya.org (work)
> http://www.qaya.org/
>
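
To make the layering discussed in this thread concrete, here is a minimal
Python sketch: a character-based layer that decodes UTF-8 into code points,
and a cluster layer built on top of it. The function names are hypothetical,
and the clustering rule is a deliberate simplification (a base character plus
any following combining marks), not the full default grapheme cluster
algorithm of UAX #29.

```python
import unicodedata
from typing import List


def code_points(utf8_bytes: bytes) -> str:
    """Character-based layer: decode UTF-8 into a sequence of code points."""
    return utf8_bytes.decode("utf-8")


def approx_grapheme_clusters(text: str) -> List[str]:
    """Group code points into approximate grapheme clusters.

    Simplifying assumption: a cluster is one character followed by any run
    of combining marks (general category Mn, Mc or Me). The real default
    grapheme cluster rules cover more cases (Hangul jamo, CR LF, and so on)
    that are ignored here.
    """
    clusters: List[str] = []
    for ch in text:  # step through the string character by character
        if clusters and unicodedata.category(ch) in ("Mn", "Mc", "Me"):
            clusters[-1] += ch  # combining mark: attach to previous cluster
        else:
            clusters.append(ch)  # anything else starts a new cluster
    return clusters


# "résumé" written with combining acute accents (U+0301)
data = "re\u0301sume\u0301".encode("utf-8")
text = code_points(data)
print(len(data), len(text), len(approx_grapheme_clusters(text)))
# prints: 10 8 6
```

The three numbers are the byte count, the character (code point) count, and
the approximate cluster count. Only the last matches what an end-user would
call the length of the word, yet the loop that produces it walks the string
one character at a time.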


