It's not just surrogates that make random access a problem.
A function like str.getAt(7) will still return something fairly useless if,
for example, location 7 happens to hold a combining mark: you get a lone
accent with no base character.
Probably what you want for "real" text processing is a function that returns
something more like a text element or combining character sequence, so that
you can walk the string in chunks of text elements.
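A minimal Python sketch of walking a string in combining-sequence chunks. It
assumes "combining mark" means any character whose Unicode general category is
Mn, Mc, or Me, and the name combining_sequences is hypothetical. It is a
simplification of real text elements: extended grapheme clusters (UAX #29) also
group things like Hangul jamo sequences and CR LF, which this loop does not.
Python strings index by code point, so surrogate pairs never show up here.

```python
import unicodedata

def is_combining_mark(ch: str) -> bool:
    # Assumption: general categories Mn, Mc and Me count as combining marks.
    return unicodedata.category(ch).startswith("M")

def combining_sequences(text: str) -> list[str]:
    """Hypothetical helper: split text into base + trailing-marks chunks."""
    chunks: list[str] = []
    for ch in text:
        if chunks and is_combining_mark(ch):
            chunks[-1] += ch  # attach the mark to the preceding chunk
        else:
            chunks.append(ch)  # a base (or a leading stray mark) starts a chunk
    return chunks

# X + acute + umlaut, then two plain letters.
print([len(c) for c in combining_sequences("X\u0301\u0308yz")])  # [3, 1, 1]
```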
Around here we have functions which, given an index into a string, return
the range that contains that location and covers the entire combining sequence
(base + combining marks). E.g., if you have X + acute + umlaut, and you give
it the location of the acute, it returns the range of length 3 including the
X, the acute, and the umlaut. Similarly, if you point at a member of a
surrogate pair, it returns the range containing the pair.
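A sketch of such a range function in Python, written against a list of UTF-16
code units so that surrogate pairs appear the way they would in a UTF-16
string. The names (utf16_units, combining_range) are hypothetical and are not
the functions the post refers to; "combining mark" again means general category
M*, and the result is (start, length) in code units. Unpaired surrogates are
simply treated as one-unit characters.

```python
import unicodedata

def utf16_units(text: str) -> list[int]:
    """Encode text as a list of UTF-16 code units (no BOM)."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]

def _is_high(u: int) -> bool:
    return 0xD800 <= u <= 0xDBFF

def _is_low(u: int) -> bool:
    return 0xDC00 <= u <= 0xDFFF

def _is_mark(cp: int) -> bool:
    # Assumption: a combining mark is any code point in general category M*.
    return unicodedata.category(chr(cp)).startswith("M")

def _code_point_at(units: list[int], i: int) -> tuple[int, int, int]:
    """Return (code_point, start, end) for the code point covering units[i]."""
    u = units[i]
    # Pointing at the low half of a pair: back up to the high half.
    if _is_low(u) and i > 0 and _is_high(units[i - 1]):
        i -= 1
        u = units[i]
    if _is_high(u) and i + 1 < len(units) and _is_low(units[i + 1]):
        cp = 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
        return cp, i, i + 2
    # BMP character, or an unpaired surrogate treated as a unit of its own.
    return u, i, i + 1

def combining_range(units: list[int], index: int) -> tuple[int, int]:
    """Hypothetical helper: (start, length) of the base + marks covering index."""
    cp, start, end = _code_point_at(units, index)
    # Walk back over combining marks to the base character (if there is one).
    while _is_mark(cp) and start > 0:
        cp, start, _ = _code_point_at(units, start - 1)
    # Walk forward over the combining marks that follow.
    while end < len(units):
        next_cp, _, next_end = _code_point_at(units, end)
        if not _is_mark(next_cp):
            break
        end = next_end
    return start, end - start

print(combining_range(utf16_units("X\u0301\u0308y"), 1))  # (0, 3)
```

With the X + acute + umlaut example, pointing at the acute (index 1) gives
(0, 3). Pointing at either half of U+1F600 in "a\U0001F600b" (index 1 or 2)
gives (1, 2), the range of the whole pair.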
This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:46 EDT