RE: Non-ascii string processing?

From: jon@spin.ie
Date: Mon Oct 06 2003 - 14:09:11 CST

Next message: Edward H. Trager: "Re: Non-ascii string processing?"
Previous message: Marco Cimarosti: "RE: Non-ascii string processing?"
Maybe in reply to: Theodore H. Smith: "Non-ascii string processing?"
Next in thread: Jill Ramonsky: "RE: Non-ascii string processing?"

> But I still don't see any use in knowing how many characters are in a UTF-8
> string, apart from the use that I already mentioned: allocating a buffer for a
> UTF-8 to UTF-32 conversion.

I wouldn't use it for that at all. I'd either assume the worst case of one 32-bit UTF-32 code unit per octet of UTF-8 (every character takes at least one octet, so that is always enough), or else stream the output and avoid allocating a buffer for the entire string at all.
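A minimal Python sketch of the worst-case sizing described above: since every code point occupies at least one UTF-8 octet, n octets of UTF-8 decode to at most n UTF-32 code units, so a buffer of n 32-bit slots always suffices and no character count is needed beforehand. The function name `utf8_to_utf32` is hypothetical, and the decoder assumes well-formed input: it rejects invalid lead bytes but does not validate continuation bytes, overlong forms, or surrogates.

```python
from array import array

# Pick an array typecode whose items are at least 32 bits wide.
WORD = "I" if array("I").itemsize >= 4 else "L"

def utf8_to_utf32(data: bytes) -> array:
    """Decode (assumed well-formed) UTF-8 into an array of code points."""
    # Worst case: every octet starts a new code point (e.g. pure ASCII),
    # so len(data) slots are always enough.
    out = array(WORD, [0]) * len(data)
    n = 0  # code points written so far
    i = 0  # current byte offset
    while i < len(data):
        b = data[i]
        if b < 0x80:             # 0xxxxxxx: 1-byte sequence
            cp, extra = b, 0
        elif b >> 5 == 0b110:    # 110xxxxx: 2-byte sequence
            cp, extra = b & 0x1F, 1
        elif b >> 4 == 0b1110:   # 1110xxxx: 3-byte sequence
            cp, extra = b & 0x0F, 2
        elif b >> 3 == 0b11110:  # 11110xxx: 4-byte sequence
            cp, extra = b & 0x07, 3
        else:
            raise ValueError(f"invalid lead byte 0x{b:02X} at offset {i}")
        # Append 6 payload bits from each continuation byte (not validated).
        for k in range(1, extra + 1):
            cp = (cp << 6) | (data[i + k] & 0x3F)
        out[n] = cp
        n += 1
        i += 1 + extra
    del out[n:]  # trim the unused tail of the worst-case buffer
    return out
```

The over-allocation is at most 4x the input size, which is usually cheaper than a separate counting pass; a generator yielding code points one at a time would be the streaming alternative.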

You would need to be able to count UTF-8 characters if you were implementing a spec defined in terms of characters rather than bytes. Notably, since XML is defined in terms of characters, any mention of string lengths or indices into strings is also defined in terms of characters (e.g. in XSLT, XPointer and elsewhere).
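As an illustration, here is a minimal Python sketch of counting characters and mapping a character index to a byte offset directly on UTF-8 data, which is what a character-based length or substring operation needs when strings are stored as UTF-8. It relies on the fact that every code point has exactly one byte that is not a continuation byte (continuation bytes match 10xxxxxx). The function names are hypothetical, the input is assumed to be well-formed UTF-8, and "character" here means Unicode code point.

```python
def utf8_char_count(data: bytes) -> int:
    """Count code points by counting non-continuation bytes."""
    return sum(1 for b in data if (b & 0xC0) != 0x80)

def utf8_offset_of(data: bytes, char_index: int) -> int:
    """Byte offset where code point number `char_index` (0-based) starts.

    Returns len(data) when char_index equals the character count, so the
    result can be used as an exclusive end offset for slicing.
    """
    seen = 0
    for offset, b in enumerate(data):
        if (b & 0xC0) != 0x80:  # lead byte: a new code point starts here
            if seen == char_index:
                return offset
            seen += 1
    if seen == char_index:
        return len(data)
    raise IndexError("character index out of range")

# Character-based substring of raw UTF-8 bytes, e.g. characters 2..4:
data = "na\u00efve \u20ac".encode("utf-8")
part = data[utf8_offset_of(data, 2):utf8_offset_of(data, 5)]
```

Both operations are a linear scan over the bytes, unlike UTF-32, where the character index is simply the array index.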


This archive was generated by hypermail 2.1.5 : Thu Jan 18 2007 - 15:54:24 CST