RE: Non-ascii string processing?

From: Marco Cimarosti (marco.cimarosti@essetre.it)
Date: Tue Oct 07 2003 - 06:29:01 CST


Peter Kirk wrote:
> For i% = 1 to Len(utf8string$)
> c$ = Mid(utf8string$, i%, 1)
> Process c$
> Next i%
>
> Such a loop would be more efficient in UTF-32 of course, but this is
> still a real need for working with character counts.

If the string type and functions of this Basic dialect are not Unicode-aware,
then:

- Len(s$) returns the number of *bytes* in the string;

- Mid(s$, i%, 1) returns a single *byte*;

- Your Process() subroutine won't work, because it is handed one byte at a
  time and never sees a complete multi-byte character.
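
To make the three points above concrete, here is a minimal sketch in Python
rather than in the Basic dialect under discussion. It assumes that Python's
bytes type behaves like a non-Unicode-aware Basic string; the sample word and
the helper is_complete_utf8() are only illustrative and come from neither
message.

```python
# Assumption: Python's bytes type stands in for a byte-oriented Basic string.
text = "na\u00efve"                 # 5 characters; U+00EF is i with diaeresis
utf8_bytes = text.encode("utf-8")   # U+00EF takes two bytes in UTF-8

byte_count = len(utf8_bytes)        # what a byte-oriented Len() would report
char_count = len(text)              # the number of characters actually meant

# What a byte-oriented Mid(s$, i%, 1) would pass to Process(): one byte each.
pieces = [utf8_bytes[i:i + 1] for i in range(len(utf8_bytes))]


def is_complete_utf8(piece: bytes) -> bool:
    """Return True if the piece decodes as UTF-8 on its own."""
    try:
        piece.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The pieces that are only fragments of a multi-byte character.
broken = [p for p in pieces if not is_complete_utf8(p)]

print(byte_count, char_count, broken)
```

The ASCII letters survive the byte-wise loop, but the one non-ASCII character
arrives as two separate fragments, neither of which is a character.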

If the string type and functions are Unicode-aware (as, e.g., in Visual
Basic or VBScript), then I'd expect that the actual internal representation
is hidden from the programmer, hence it makes no sense to talk about an
"UTF-8 string".
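
As a rough illustration of that second case, again in Python and under the
assumption that its str type is a fair stand-in for a Unicode-aware Basic
string: the loop works on characters (code points, to be precise), and an
encoding such as UTF-8 only comes into play when the string is converted to
bytes for storage or transmission.

```python
# Assumption: Python's str stands in for a Unicode-aware string type.
text = "na\u00efve"

# The loop sees one character per step; no encoding is visible here.
processed = [c for c in text]

# Encodings appear only at the boundary, when bytes are explicitly requested.
as_utf8 = text.encode("utf-8")
as_utf16 = text.encode("utf-16-le")

print(len(processed), len(as_utf8), len(as_utf16))
```

The same string yields different byte lengths depending on the encoding
chosen at the boundary, while the character loop is unaffected.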

_ Marco


