RE: Non-ascii string processing?

From: Francois Yergeau (FYergeau@alis.com)
Date: Tue Oct 07 2003 - 12:23:09 CST


Marco Cimarosti wrote:
> As far as I understand, xsd:string is a list of "Character"-s, and a
> "Character" is an integer which can hold any valid Unicode code point.

Not quite. XML Schema points to XML for its definition of character, and
XML in turn says "A character is an atomic unit of text as specified by
ISO/IEC 10646". It's not a number, it's a piece of text that cannot be
further divided ("atomic").

> In other terms, xsd:string is necessarily in UTF-32 (or
> something close to it): it cannot be in UTF-8 or UTF-16.

xsd:string is encoding-form-independent, you can represent it in UTF:-)336
if you want.
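
As a rough illustration (Python is my choice here, and the sample string is
made up), the same sequence of characters comes back unchanged from a round
trip through UTF-8, UTF-16 or UTF-32. The helper name round_trip is
hypothetical:

```python
# Hypothetical sample: "a", e-acute (U+00E9), and U+10400 (outside the BMP).
text = "a\u00e9\U00010400"

def round_trip(s: str, encoding: str) -> str:
    """Encode s in the given encoding form, then decode it back."""
    return s.encode(encoding).decode(encoding)

# The abstract character sequence is the same whichever form carried it.
results = {enc: round_trip(text, enc)
           for enc in ("utf-8", "utf-16-le", "utf-32-le")}
same_everywhere = all(decoded == text for decoded in results.values())
print(same_everywhere)
```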

> The numbers returned by length, minLength and maxLength are
> the actual, minimum and maximum number of *list elements*,
> contained in the list.

Yep, the number of characters in the "finite-length sequence of characters"
(XML Schema's definition of xsd:string).

> I.e., in the case of xsd:string, the *size* of the string in
> *encoding units*.

Nope. In characters.
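
A small sketch of the difference, assuming "character" is counted as a
Unicode code point (which is what len() does on a Python 3 str). The sample
string and the helper code_units are made up for illustration:

```python
# Hypothetical sample: "a", e-acute (U+00E9), and U+10400 (outside the BMP).
text = "a\u00e9\U00010400"

def code_units(s: str, encoding: str, unit_bytes: int) -> int:
    """Number of code units s occupies in the given encoding form."""
    return len(s.encode(encoding)) // unit_bytes

# What length/minLength/maxLength measure: characters.
characters = len(text)

# What they do NOT measure: code units, which vary with the encoding form.
utf8_units = code_units(text, "utf-8", 1)       # 1 + 2 + 4 bytes
utf16_units = code_units(text, "utf-16-le", 2)  # 1 + 1 + 2 (surrogate pair)
utf32_units = code_units(text, "utf-32-le", 4)  # one unit per character

print(characters, utf8_units, utf16_units, utf32_units)
```

Only in UTF-32 do the two counts happen to coincide; the length facets stay
at 3 for this string no matter how it is serialized.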

-- 
François

