RE: Non-ascii string processing?

From: jon@spin.ie
Date: Mon Oct 06 2003 - 14:09:11 CST


> But I still don't see any use in knowing how many characters are in an UTF-8
> string, apart the use that I already mentioned: allocating a buffer for a
> UTF-8 to UTF-32 conversion.

I wouldn't use it for that at all. I'd assume a worst case of one 32-bit word in the UTF-32 per octet in the UTF-8, or else stream it out and so avoid allocating a buffer for the entire string at all.
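
As a rough sketch of both options (in Python for brevity; the function names here are hypothetical, and I'm assuming the input arrives as an iterable of byte chunks): the worst-case bound is four bytes of UTF-32 per octet of UTF-8, and the streaming version converts chunk by chunk using the standard library's incremental decoder, so a multi-byte sequence split across two chunks is still handled.

```python
import codecs


def worst_case_utf32_bytes(utf8: bytes) -> int:
    # Every code point takes at least one octet in UTF-8 and exactly
    # four bytes in UTF-32, so 4 * len(utf8) is always enough.
    return 4 * len(utf8)


def stream_utf8_to_utf32(chunks):
    """Yield UTF-32LE bytes (no BOM) for an iterable of UTF-8 byte chunks."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    for chunk in chunks:
        # The decoder holds back any incomplete trailing sequence
        # until the next chunk supplies the rest of it.
        text = decoder.decode(chunk)
        if text:
            yield text.encode("utf-32-le")
    # final=True raises UnicodeDecodeError if the input ended mid-sequence.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail.encode("utf-32-le")
```

Neither approach needs the character count up front: the first over-allocates by at most a factor of four relative to the input size, and the second never holds more than one chunk at a time.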

You would need to be able to count UTF-8 characters if you were implementing a spec defined in terms of characters rather than bytes. Notably, since XML is defined in terms of characters, any mention of string lengths or indices into strings is defined in terms of characters (e.g. in XSLT, XPointer and elsewhere).
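
For what it's worth, counting is cheap if you can assume the UTF-8 is well-formed: every byte that is not a continuation byte (bit pattern 10xxxxxx) starts a new character. A minimal sketch, again in Python with hypothetical function names, treating "character" as a Unicode code point and not validating the input:

```python
def count_utf8_code_points(data: bytes) -> int:
    # Continuation bytes look like 10xxxxxx; every other byte
    # begins a new code point. Assumes well-formed UTF-8.
    return sum(1 for b in data if (b & 0xC0) != 0x80)


def byte_offset_of_char(data: bytes, index: int) -> int:
    """Return the byte offset at which the 0-based character `index` starts.

    An index equal to the character count returns len(data),
    i.e. the position just past the last character.
    """
    seen = 0
    for offset, b in enumerate(data):
        if (b & 0xC0) != 0x80:
            if seen == index:
                return offset
            seen += 1
    if seen == index:
        return len(data)
    raise IndexError("character index out of range")
```

The second function is the sort of thing a character-based index into a UTF-8 buffer would need: it turns a character position into the byte position where that character begins.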


