Non-ascii string processing?

From: Theodore H. Smith (delete@elfdata.com)
Date: Sat Oct 04 2003 - 12:31:20 CST


Hi lists,

I'm wondering how people tend to do their non-ASCII string processing.

Does anyone really need anything other than byte-oriented code? I'm
using UTF-8 as my character format, and UTF-8 is variable width, of
course. However, I also offer the option of processing UTF-8 with
byte functions.

E.g.:

Start = MyString.InStr( "<" )
End = MyString.InStr( Start + 1, ">" )

For things like this, it really doesn't matter that your data is
UTF-8; you can still process it as bytes, which leads to faster and
simpler code.
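
As a rough illustration of the point above, here is a minimal sketch in
Python (not the library's own syntax; find_tag is a hypothetical helper
name). It relies on one property of UTF-8: every byte of a multi-byte
sequence is 0x80 or higher, so an ASCII byte such as "<" or ">" can
never turn up inside an encoded non-ASCII character. The offsets it
returns are byte offsets, which differ from character offsets as soon
as non-ASCII text precedes the match.

```python
# Sketch only: byte-oriented search for a <...> pair in UTF-8 data.
# find_tag is a hypothetical helper, not part of any library named here.

def find_tag(data: bytes, start: int = 0):
    """Return (open, close) byte offsets of the next <...> pair, or None."""
    open_idx = data.find(b"<", start)
    if open_idx == -1:
        return None
    close_idx = data.find(b">", open_idx + 1)
    if close_idx == -1:
        return None
    return open_idx, close_idx


text = "å <naïve> ü"
data = text.encode("utf-8")

# Byte-level search never looks at character boundaries.
open_idx, close_idx = find_tag(data)

# Slicing on those byte offsets still yields valid UTF-8, because
# ASCII delimiters always sit on character boundaries.
tag = data[open_idx:close_idx + 1].decode("utf-8")

# The same search on decoded text gives character offsets instead.
char_open = text.find("<")
```

Here the byte offset of "<" is 3 while its character offset is 2, since
"å" takes two bytes. That is harmless as long as the offsets are only
ever fed back into byte-level functions.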

So I'm wondering: is there, in fact, ANY code that needs explicit UTF-8
processing? Here are a few cases I've thought of.

1) Spell checking - needs UTF-8 character-based iteration
2) Lexical processing - needs UTF-8 mode to be able to match "å" to "a".
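
To make the two cases above concrete, here is a small Python sketch
(fold_accents is a hypothetical helper name). It shows that byte counts
and character counts diverge, and one possible way to match "å" to "a":
decompose the text to Unicode NFD form and drop the combining marks.
This is only one approach, and it assumes that stripping accents is the
kind of matching wanted; letters with no decomposition, such as "æ",
pass through unchanged.

```python
import unicodedata

# Sketch only: fold_accents is a hypothetical helper, shown as one
# possible way to compare accented and unaccented letters.

def fold_accents(text: str) -> str:
    """Decompose to NFD and drop combining marks, so 'å' becomes 'a'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


word = "ångström"
data = word.encode("utf-8")

# Character-based iteration needs decoding; byte length is not enough.
byte_count = len(data)
char_count = len(data.decode("utf-8"))

# Accent-insensitive matching works on characters, not raw bytes.
folded = fold_accents(word)
```

In this example the word is 8 characters but 10 bytes, so a spell
checker stepping one byte at a time would split "å" and "ö" in half.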

Can anyone tell me any more? Please feel free to go into great detail
in your answers. The more detail the better.

Thanks a lot!

I'm just wondering whether I can simplify my string processing library,
and whether anyone really needs anything beyond byte-level processing
for most functions, apart from maybe a few for the two cases I
mentioned above!


