RE: FAQ

From: Yves Arrouye (Yves@centraal.com)
Date: Fri May 21 1999 - 00:46:14 EDT


> Yes, that is what I said.
>
> "- If the storage is UTF-16, then UTF-16 indices are direct. To compute
> UCS-4 indices you parse from the start of the text."
>
> Your example is UTF-16 text, so the UCS-4 indices are *not* direct--accessing
> a random UCS-4 index requires scanning from the start of the text. Here are
> the direct UTF-16 indices, plus the UCS-4 indices computed by parsing from
> the start.
>
> text:   s  o  m  e  <s1> <s2> t  e  x  t  <s1> <s2>
> UTF-16: 0  1  2  3  4    5    6  7  8  9  10   11   12
> UCS4:   0  1  2  3  4         5  6  7  8  9         10
>
> So the 8th UCS-4 code value is "x", while the 8th UTF-16 code value is "e".
>
> Does that answer your question?

It does. I took the indexes as being machine-word-based (2 bytes for UTF-16,
4 for UCS-4), not character-based. If they're character-based, then yes, the
access is direct, though the mapping from the index to the actual range of
bytes representing the character is not.
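
To make the scan concrete, here is a minimal Python sketch of mapping a
character (UCS-4) index to a UTF-16 code unit index by parsing from the start
of the text. The function names are made up for illustration, and U+10000 is
an arbitrary stand-in for the <s1> <s2> pair in the example above.

```python
def utf16_units(text):
    """Return the UTF-16 code units of text as a list of integers."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def is_high_surrogate(unit):
    return 0xD800 <= unit <= 0xDBFF


def is_low_surrogate(unit):
    return 0xDC00 <= unit <= 0xDFFF


def char_index_to_unit_index(units, char_index):
    """Scan from the start, stepping over char_index characters.

    A well-formed surrogate pair counts as one character and advances
    two code units; anything else advances one code unit.
    """
    unit_index = 0
    for _ in range(char_index):
        if unit_index >= len(units):
            raise IndexError("character index past end of text")
        if (is_high_surrogate(units[unit_index])
                and unit_index + 1 < len(units)
                and is_low_surrogate(units[unit_index + 1])):
            unit_index += 2
        else:
            unit_index += 1
    return unit_index


# Hypothetical sample mirroring "some <s1> <s2> text <s1> <s2>".
sample = "some\U00010000text\U00010000"
units = utf16_units(sample)

# Zero-based index 7 is the 8th value in either numbering.
eighth_utf16 = chr(units[7])
eighth_ucs4 = chr(units[char_index_to_unit_index(units, 7)])
```

With this sample, eighth_utf16 is "e" and eighth_ucs4 is "x". The UTF-16
lookup is a single array access, while the character lookup walks the text
from the start, so its cost grows with the index.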

Yves.


